chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads

What config should be used for reads basecalled with a mix of guppy versions? #243

Closed gitcruz closed 3 years ago

gitcruz commented 3 years ago

Hi Paolo,

Reading the output logs and the docs, I noticed that it is necessary to use a config file that matches the basecaller version. For a project with standard reads, I have Guppy 4.2.3, so I am just rerunning the assembly with Nanopore-Sep2020.conf. This is what you recommend for Guppy 3.6.0 or greater.

However, for another project I do have a mix of flowcells sequenced at different times:

What would you recommend in such a case? Is it reasonable to try assembling the genome with --config Nanopore-Sep2020.conf?

I know the best thing to do would be to re-basecall all of them with Guppy 4.0.11, but this is a bit complicated without modern GPU nodes and it will slow things down.

Thanks, Fernando

paoloczi commented 3 years ago

My recommendation would be to use Nanopore-Dec2019.conf for the flowcells basecalled with Guppy 3.2.6 and 3.2.10, and Nanopore-Sep2020.conf for the flowcells basecalled with Guppy 4.0.11.

There was a huge improvement in Guppy at version 3.6, so your data quality would improve substantially if you were able to redo the base calling for those 5 older flowcells. But I do understand the complexities of that.

gitcruz commented 3 years ago

Hi Paolo, So is it possible to specify a different configuration file for each read file? What would be the syntax?

Do you think something like this would work?

shasta --input reads/FC1_guppy3.2.6.fasta reads/FC2_guppy3.2.10.fasta reads/FC3_guppy4.0.11.fasta --assemblyDirectory out --config /absolute-path-to/Nanopore-Dec2019.conf /absolute-path-to/Nanopore-Dec2019.conf /absolute-path-to/Nanopore-Sep2020.conf

I definitely think we should re-basecall.

Thanks, F

paoloczi commented 3 years ago

Sorry, I misread your question. No, you can't do that. You will have to pick one. Given that, I suggest using Nanopore-Dec2019.conf until you can redo the base calling.
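For readers landing here later, the advice above can be sketched as a small shell helper. This is only an illustration, not something Shasta provides: `pick_config` is a hypothetical function, and the rule it encodes (let the oldest Guppy version in the mix decide, with 3.6.0 as the cutoff between Nanopore-Dec2019.conf and Nanopore-Sep2020.conf) is an assumption generalized from the versions discussed in this thread. The read file names and the `/absolute-path-to/` prefix are the placeholders from the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of Shasta): choose ONE config for a whole assembly.
# Assumption: the oldest Guppy version in the mix decides. Versions below 3.6.0
# map to Nanopore-Dec2019.conf, 3.6.0 or greater to Nanopore-Sep2020.conf.
pick_config() {
    # Numeric sort on major.minor.patch, so 3.2.6 sorts before 3.2.10.
    oldest=$(printf '%s\n' "$@" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    # If 3.6.0 is still the smaller of the two, the oldest version is >= 3.6.0.
    lowest=$(printf '%s\n' "$oldest" 3.6.0 | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    if [ "$lowest" = "3.6.0" ]; then
        echo "Nanopore-Sep2020.conf"
    else
        echo "Nanopore-Dec2019.conf"
    fi
}

CONFIG=$(pick_config 3.2.6 3.2.10 4.0.11)

# Print the single-config command instead of running it (paths are placeholders).
echo shasta --input reads/FC1_guppy3.2.6.fasta reads/FC2_guppy3.2.10.fasta \
    reads/FC3_guppy4.0.11.fasta --assemblyDirectory out \
    --config "/absolute-path-to/$CONFIG"
```

The key point is that all read files go to one `--input` option and exactly one file goes to `--config`. Drop the `echo` to actually launch the assembly once the paths are real.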

gitcruz commented 3 years ago

Sure, that makes more sense. Thanks, Fernando

paoloczi commented 3 years ago

I am closing this due to lack of discussion, but feel free to reopen it or create a new issue if additional discussion topics emerge.