ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Ideas for Assembling an Extremely Large Dataset #1373

Open howla1ke opened 2 months ago

howla1ke commented 2 months ago

Hello, I have NovaSeq 150 bp PE data that was sequenced across 2 separate runs to obtain the quantity of data we needed. I want to co-assemble both runs, but my dilemma is that I can only allocate 996 GB of RAM. My job was killed because it ran out of memory, and the SPAdes log noted that I need approximately 1118 GB of RAM to assemble. Would it be advisable to run the error-correction-only step separately on each run and then co-assemble both outputs in assembler-only mode? Is that possible? Do you have any ideas beyond normalizing the data? Thank you for your time.
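
For readers following along, the two-stage idea in the question could be sketched roughly as below. This is only an illustration and not a confirmed answer from the SPAdes developers: the helper names `error_correction_cmd` and `assembler_only_cmd` and all file paths are hypothetical, and the thread does not establish whether splitting the stages lowers peak memory. The sketch only builds argument lists using the `--only-error-correction`, `--only-assembler`, `--pe<#>-1`/`--pe<#>-2`, `-m`, and `-o` options; it does not run anything.

```python
from typing import List, Sequence, Tuple


def error_correction_cmd(r1: str, r2: str, out_dir: str, mem_gb: int) -> List[str]:
    """Build a read-error-correction-only command for one sequencing run."""
    return [
        "spades.py", "--only-error-correction",
        "-1", r1, "-2", r2,
        "-m", str(mem_gb),   # memory limit in GB
        "-o", out_dir,
    ]


def assembler_only_cmd(
    libraries: Sequence[Tuple[str, str]], out_dir: str, mem_gb: int
) -> List[str]:
    """Build an assembler-only command over several paired-end libraries.

    `libraries` holds (forward, reverse) paths to reads that are assumed to
    have been error-corrected already; the paths are placeholders.
    """
    cmd = ["spades.py", "--only-assembler"]
    for i, (r1, r2) in enumerate(libraries, start=1):
        cmd += [f"--pe{i}-1", r1, f"--pe{i}-2", r2]
    cmd += ["-m", str(mem_gb), "-o", out_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, one error-correction command per run.
    print(" ".join(error_correction_cmd("runA_R1.fastq.gz", "runA_R2.fastq.gz", "ec_runA", 996)))
    print(" ".join(error_correction_cmd("runB_R1.fastq.gz", "runB_R2.fastq.gz", "ec_runB", 996)))
```

Some modes may restrict how many libraries can be supplied, so the multi-library form above is an assumption to check against the manual for the installed version.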

yqy6611 commented 2 months ago

One approach is to use longer k-mers. I found that the default k-mer set for metagenomics is not enough. Longer k-mers reduce RAM consumption during tandem-repeat resolution. If more SSD space is available, I personally use 21,33,55,77 or 21,33,55,77,99,127.
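
As a rough sketch of how the k-mer lists suggested above could be checked before being passed to the `-k` option, the snippet below validates the constraints on `-k` as I understand them from the SPAdes manual (all values odd, less than 128, in ascending order). The function names `parse_kmers` and `kmer_args` are hypothetical helpers for illustration, not part of SPAdes, and the sketch says nothing about how much memory a given list will actually need.

```python
from typing import List


def parse_kmers(spec: str) -> List[int]:
    """Parse a comma-separated k-mer list and check it before use with -k."""
    values = [int(tok) for tok in spec.split(",")]
    for k in values:
        if k < 1 or k % 2 == 0:
            raise ValueError(f"k-mer size must be a positive odd number: {k}")
        if k >= 128:
            raise ValueError(f"k-mer size must be less than 128: {k}")
    # Each value must be strictly larger than the one before it.
    if any(a >= b for a, b in zip(values, values[1:])):
        raise ValueError(f"k-mer sizes must be in ascending order: {spec}")
    return values


def kmer_args(spec: str) -> List[str]:
    """Return the -k argument pair for a validated k-mer list."""
    return ["-k", ",".join(str(k) for k in parse_kmers(spec))]


if __name__ == "__main__":
    # The two lists mentioned in the comment above.
    print(kmer_args("21,33,55,77"))
    print(kmer_args("21,33,55,77,99,127"))
```

The resulting pair can be appended to whatever `spades.py` argument list is already in use.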