chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads

Sample command for assembly of reads generated by Guppy v5 #261

apredeus closed this issue 3 years ago

apredeus commented 3 years ago

Hello all,

Just wanted to check whether the following command looks reasonable for assembling reads called by Guppy version 5 with "super high accuracy" (Bonito-style) models. The pre-built 0.7.0 binary didn't run with --Assembly.consensusCaller Bayesian:guppy5.0.7-a; it threw an error about an unknown caller. So I cloned this repo and ended up with something like this for the assembly command:

```sh
# <conf-dir> is a placeholder for the conf directory; $FASTQ holds the input reads
CFG=<conf-dir>/Nanopore-Jun2020.conf
GPY=<conf-dir>/SimpleBayesianConsensusCaller-10.csv
shasta-Linux-0.7.0 --threads 64 --input "$FASTQ" --config "$CFG" --Reads.minReadLength 2000 --Assembly.consensusCaller "Bayesian:$GPY" &> shasta.log
```

Can you comment on whether this makes sense? I lowered the minimum read length to 2000 because, with about 50x coverage and a median read length of 13 kb, quite a few reads are shorter than the default 10 kb cutoff.
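Before lowering --Reads.minReadLength, it can help to measure how much data each cutoff would discard. Below is a minimal POSIX sh/awk sketch, assuming an uncompressed FASTQ with standard 4-line records (sequence on the second line of each record). read_length_cutoff_stats is a hypothetical helper, not part of Shasta.

```sh
#!/bin/sh
# Hypothetical helper: count reads and bases shorter than a length cutoff.
# Assumes an uncompressed FASTQ with 4-line records (no wrapped sequence lines).
# Output: total_reads reads_below_cutoff total_bases bases_below_cutoff
read_length_cutoff_stats() {
    fastq=$1
    cutoff=$2
    awk -v cutoff="$cutoff" '
        NR % 4 == 2 {                    # sequence line of each record
            len = length($0)
            reads++
            bases += len
            if (len < cutoff + 0) {      # shorter than the cutoff
                short_reads++
                short_bases += len
            }
        }
        END {
            # unset counters print as 0 with %d
            printf "%d %d %d %d\n", reads, short_reads, bases, short_bases
        }' "$fastq"
}
```

For example, comparing read_length_cutoff_stats reads.fastq 10000 with read_length_cutoff_stats reads.fastq 2000 shows how many bases the default cutoff would drop relative to 2000.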

paoloczi commented 3 years ago

This is a good start, but Nanopore-Sep2020.conf should give better results than Nanopore-Jun2020.conf. Or, if you have a plant genome, try Nanopore-Plants-Apr2021.conf. And if you have the latest code from GitHub, it should also accept --Assembly.consensusCaller Bayesian:guppy5.0.7-a, which is equivalent to using SimpleBayesianConsensusCaller-10.csv.
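The advice above can be sketched as a small shell wrapper. It assumes a Shasta build newer than the 0.7.0 release that recognizes the built-in caller name guppy5.0.7-a; with the 0.7.0 binary, pass the Bayesian:<path to SimpleBayesianConsensusCaller-10.csv> form as the third argument instead. The names shasta_config_for, run_shasta, CONF_DIR and SHASTA are hypothetical, and the thread count and read length cutoff are simply copied from the command in this thread.

```sh
#!/bin/sh
# Hypothetical helper: map a genome type to the config file suggested in this thread.
shasta_config_for() {
    case "$1" in
        plant) echo "Nanopore-Plants-Apr2021.conf" ;;
        *)     echo "Nanopore-Sep2020.conf" ;;
    esac
}

# Hypothetical wrapper (defining it does not run anything).
# $1 = reads file, $2 = genome type ("plant" or anything else),
# $3 = consensus caller; the default assumes a build newer than 0.7.0.
run_shasta() {
    caller=${3:-Bayesian:guppy5.0.7-a}
    conf_dir=${CONF_DIR:?set CONF_DIR to the Shasta conf directory}
    "${SHASTA:-shasta}" --threads 64 --input "$1" \
        --config "$conf_dir/$(shasta_config_for "$2")" \
        --Reads.minReadLength 2000 \
        --Assembly.consensusCaller "$caller" > shasta.log 2>&1
}
```

Using plain sh and 2>&1 instead of &> keeps the sketch portable to shells other than bash.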

If you don't get a satisfactory assembly, please post AssemblySummary.html here along with the expected genome size, and I can help tweak the assembly parameters.

apredeus commented 3 years ago

Actually, it turns out I got the best result of all the assemblers I've tried so far, so I'm quite happy with this :) Thank you for the hints; I'll try the newer config file as well.