aquaskyline / SOAPdenovo2

Next generation sequencing reads de novo assembler.
GNU General Public License v3.0

clarify if megahit is usable for mammalian de novo assembly #37

Closed antonkulaga closed 6 years ago

antonkulaga commented 6 years ago

Could you clarify if Megahit is usable also for mammalian de novo assembly?

aquaskyline commented 6 years ago

Yes. Megahit is suitable for mammalian de novo assembly, though it generates only contigs.

antonkulaga commented 6 years ago

@aquaskyline I can do scaffolding with Sealer or something similar. I am choosing a de novo assembler for my project now. In the benchmark done by your competitor, Abyss (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411771/), I noticed that SOAPdenovo2 eats a huge amount of RAM but is more accurate than the others. I am curious whether Megahit can be treated as "SOAPdenovo2 that eats less memory and assembles mammalian genomes with the same or better accuracy", or is that not the case yet?

aquaskyline commented 6 years ago

Megahit consumes much less memory than SOAPdenovo2, but you cannot safely use it as a drop-in substitute for SOAPdenovo2. Both tools are suitable for mammalian de novo assembly, yet their results can differ vastly because they were designed with different goals. SOAPdenovo2 tends to create more conservative, and thus shorter, contigs to maximize the performance of scaffolding (because short contigs create less contiguity in scaffolding). Megahit produces contigs as its final output, so its contigs are much longer than those of SOAPdenovo2. How the two tools actually perform depends on the species and dataset, but more often than not SOAPdenovo2 yields a longer scaffold N50 than "Megahit + another scaffolder".
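For anyone comparing the two approaches on their own data, scaffold N50 is easy to compute from a list of sequence lengths. The snippet below is only a minimal sketch of the usual definition (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly length). The function name `n50` and the example lengths are made up for illustration and are not part of either tool.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; N50 is the length of
    the sequence at which the cumulative sum first reaches at least half
    of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Unreachable for non-empty input, kept for clarity.
    return ordered[-1]


# Hypothetical scaffold lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

Running the same function on the scaffold lengths from each pipeline gives a like-for-like number, although N50 alone says nothing about misassemblies.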

By the way, SOAPdenovo2 was known to consume as much memory as Abyss v1, or less. You might want to try the option -a to fix the memory size. If you know the peak memory consumption when -a is not used, the best value for -a would be peak * 0.66. This option helps you further decrease the memory needed. Note that there is some risk that peak * 0.66 is not enough, in which case SOAPdenovo2 runs into an out-of-memory error.
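The peak * 0.66 rule of thumb can be worked out with a few lines of Python. This is only a sketch under stated assumptions: the helper name `suggest_init_memory_gb` is made up, the value is taken to be a whole number of gigabytes, and rounding to the nearest gigabyte is a choice made here, not something stated in the thread.

```python
def suggest_init_memory_gb(peak_gb: float, factor: float = 0.66) -> int:
    """Suggest a value for -a from an observed peak memory usage.

    peak_gb: peak memory (in GB) of an earlier run made without -a.
    factor:  the 0.66 multiplier from the rule of thumb above.

    Assumption: the result is rounded to the nearest whole gigabyte,
    with a floor of 1 so that the suggestion is never zero.
    """
    if peak_gb <= 0:
        raise ValueError("peak_gb must be positive")
    return max(1, round(peak_gb * factor))


# Hypothetical example: an earlier run peaked at 150 GB.
value = suggest_init_memory_gb(150)
print(f"-a {value}")
```

If the run then fails with an out-of-memory error, the suggested value was too low for that dataset, so raise it or drop -a and rerun.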