cschin / Peregrine

Peregrine: Fast Genome Assembler Using SHIMMER Index

Add peregrine 'hello world' runbook #5

Open a-khalak opened 5 years ago

a-khalak commented 5 years ago

For users who are primarily interested in running Peregrine and tweaking hyperparameters, it would be helpful to have a 'hello world' runbook that walks through setting up the required inputs and running a test case on sample data (e.g. E. coli K-12).

Ideally, this would involve confirming system dependencies, obtaining some canned reads (.fasta and .lst files), running Peregrine from the latest stable Docker Hub build, and confirming the resulting assembly.

a-khalak commented 5 years ago

@cschin

Ideally, the hello world case would involve:

  1. the user getting a set of sample reads onto their own system
  2. inspecting how the reads are set up (from a .fasta or a .lst file), so they can compare them to their own data
  3. running the Peregrine docker, similarly to the production example but on the sample reads
  4. inspecting the results to confirm that they get what they would expect

The idea is that users should be able to run their own data largely by copying the hello world example.
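Step 2 above (inspecting the reads) could be sketched roughly as follows. This is only an illustration: `fasta_stats` and `write_read_list` are hypothetical helper names, not part of Peregrine, and the `.lst` layout (one FASTA path per line) is an assumption that should be checked against the Peregrine README before use.

```python
from pathlib import Path


def fasta_stats(path):
    """Return (record_count, total_bases) for a plain-text FASTA file."""
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records += 1  # header line starts a new record
            else:
                bases += len(line)  # sequence line
    return records, bases


def write_read_list(fasta_paths, lst_path):
    """Write one absolute FASTA path per line (assumed .lst layout)."""
    lines = [str(Path(p).resolve()) for p in fasta_paths]
    Path(lst_path).write_text("\n".join(lines) + "\n")
    return lines


# Example usage (file names are placeholders):
# n, total = fasta_stats("reads.fasta")
# print(f"{n} records, {total} bases")
# write_read_list(["reads.fasta"], "reads.lst")
```

Comparing the record count and total bases of the sample reads with the same numbers for your own data is a quick sanity check before launching the assembler.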

a-khalak commented 5 years ago

I just put the reads in a public location on S3.

Can you see the E. coli K-12 FASTA file at the following location (supposedly public)?

asif@dockerbox:~ $ aws s3 ls s3://biologicaldatascience.org/data/ecoli-k12/
2019-06-02 23:22:38          0
2019-06-02 23:23:21    4706062 K12MG1655.fa

a-khalak commented 5 years ago

The command above only works if you have an AWS account and the CLI tools installed. The following URL should work without anything installed:

https://s3.amazonaws.com//biologicaldatascience.org/data/ecoli-k12/K12MG1655.fa
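For anyone scripting this step, a minimal sketch of fetching the file with only the Python standard library might look like the following. The URL is copied verbatim from this thread and has not been verified; `fetch` is a hypothetical helper name. The `aws s3 ls` listing earlier in the thread reports 4706062 bytes for this file, which gives a simple size check after downloading.

```python
import shutil
import urllib.request
from pathlib import Path

# URL as posted in this thread (not verified here).
URL = "https://s3.amazonaws.com//biologicaldatascience.org/data/ecoli-k12/K12MG1655.fa"


def fetch(url, dest):
    """Stream the content at url into dest and return the bytes written."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest.stat().st_size


# Example usage:
# size = fetch(URL, "K12MG1655.fa")
# print(f"downloaded {size} bytes")
```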

cschin commented 5 years ago

@sifta I agree that it would be good to have a demonstration script. It will be tricky to package that within the docker image right now. Let me think about it. One challenge is that users' data can be quite arbitrary, so the script may not work every time.

huangl07 commented 4 years ago

@cschin

Could you show some downstream best practices?

For example, polishing with Illumina reads or connecting contigs into scaffolds?

cschin commented 4 years ago

@huangl07 I will move your question to a new issue, as it is off topic for the current one.