GaetanBenoitDev / metaMDBG

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
MIT License

Minimum long read accuracy needed for metaMDBG #3

Closed by jmtsuji 1 year ago

jmtsuji commented 1 year ago

Congratulations on your recent pre-print about metaMDBG! One thing that I was curious about was the definition of "accurate long reads". I could not find an accuracy cutoff in the pre-print. For example, would long reads over Q20 (1% error rate) be acceptable as input into metaMDBG? Or is a more stringent accuracy cutoff (e.g., Q30, 0.1% error rate) needed? I am curious if you have any insights or predictions here. Thanks very much.
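For reference, the Q-score figures above follow the standard Phred relation Q = -10 * log10(p), where p is the per-base error probability. A minimal sketch of that arithmetic is below. It is not part of metaMDBG; the function names are hypothetical, and the per-read helper assumes Phred+33 encoded FASTQ quality strings.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


def mean_read_quality(quality_string: str, offset: int = 33) -> float:
    """Mean quality of one read, assuming Phred+33 encoding (an assumption).

    Error probabilities are averaged rather than Q-scores, because
    averaging Q-scores directly overestimates read accuracy.
    """
    probs = [phred_to_error_rate(ord(c) - offset) for c in quality_string]
    return error_rate_to_phred(sum(probs) / len(probs))


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q}: {phred_to_error_rate(q):.2%} expected error rate")
```

A helper like `mean_read_quality` could be used to check what fraction of a read set clears a given cutoff before choosing an assembler.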

GaetanBenoitDev commented 1 year ago

Hi, thank you very much! The software has been designed and tested specifically on PacBio HiFi reads (so more like <0.1% error rate, without homopolymer errors). We say "accurate long reads" because sequencing technology is evolving quite fast, and Nanopore reads may become compatible with our graph structure soon. We did some tests on the Nanopore R10.4 kits (they will be included in the next version of the manuscript), but the results are not that good, and metaFlye is still the way to go for that data.

jmtsuji commented 1 year ago

Thanks for the quick reply! It's good to know that the software was designed and tested for PacBio HiFi reads and that Nanopore R10.4 data still works better with metaFlye. Indeed, long read technology is progressing quickly... it would be interesting to test the new Q30+ duplex data coming from Nanopore R10.4.1 flow cells (especially the "high duplex" R10.4.1 flow cells currently under testing), although I think homopolymer errors are still an issue. Anyway, thanks for your work on this interesting tool and for the informative response.