Closed rsuchecki closed 5 years ago
Dear Rad,
I'd indeed recommend using a ramdisk, or an SSD at least. We also find that some parallel network filesystems can be fast too (check the read/write speed yourself, perhaps using `dd` or `hdparm`).
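If `dd` or `hdparm` are not convenient (for example on a shared cluster without root access), a rough comparison of candidate working directories can be sketched in Python. This is only an illustration: the function name `write_read_throughput` and its parameters are made up for this example, and the read figure will usually be inflated by the operating system's page cache, so treat the write figure as the more meaningful one.

```python
import os
import tempfile
import time


def write_read_throughput(directory, size_mb=64, block_kb=1024):
    """Return (write_MB_per_s, read_MB_per_s) for a temporary file in `directory`.

    Hypothetical helper for a quick comparison between storage locations;
    it is not a replacement for a proper benchmark.
    """
    block = os.urandom(block_kb * 1024)
    n_blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Sequential write, flushed to the device before the clock stops.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(n_blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())
        write_seconds = time.perf_counter() - start

        # Sequential read of the same file (likely served from the page cache).
        start = time.perf_counter()
        with open(path, "rb") as handle:
            while handle.read(block_kb * 1024):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)

    total_mb = n_blocks * block_kb / 1024
    return total_mb / write_seconds, total_mb / read_seconds
```

Calling `write_read_throughput("/path/to/workdir")` once per candidate location (ramdisk, SSD, network filesystem) gives numbers that are comparable with each other, which is all that is needed to pick where to run.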
My second recommendation is to look at the k-mer histogram (given by any k-mer counter) to find your optimal cutoff threshold (`-min-abundance`) and get rid of most erroneous k-mers, since more distinct k-mers means a longer unitig generation time.
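One common heuristic for reading the cutoff off a histogram is to take the first valley, i.e. the abundance where the count of distinct k-mers stops falling (the error peak) and starts rising again (the genomic peak). The sketch below assumes a two-column, whitespace-separated histogram (abundance, number of distinct k-mers); the function names are hypothetical, and this is not necessarily how any particular tool chooses its threshold. Noisy histograms may need smoothing or a manual look.

```python
def parse_histogram(lines):
    """Parse 'abundance count' lines into a dict {abundance: count}.

    Assumes a two-column whitespace-separated format; adapt to the
    output of the k-mer counter actually used.
    """
    histogram = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        histogram[int(fields[0])] = int(fields[1])
    return histogram


def pick_abundance_cutoff(histogram):
    """Return the abundance at the first local minimum of the histogram.

    Falls back to the smallest abundance when the counts never rise,
    i.e. when there is no visible valley.
    """
    if not histogram:
        raise ValueError("empty histogram")
    abundances = sorted(histogram)
    for previous, current in zip(abundances, abundances[1:]):
        if histogram[current] > histogram[previous]:
            return previous  # counts start rising again: valley found
    return abundances[0]
```

For a histogram such as `1 1000000`, `2 200000`, `3 50000`, `4 30000`, `5 45000`, `6 80000`, the counts stop falling at abundance 4, so 4 would be the candidate value to try as the minimum abundance.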
The `-max-memory` parameter actually has less influence on disk usage than one would think. As far as I recall, it almost exclusively influences the number of passes in the k-mer counting step.
Minia won't be faster than BCALM (nor vice-versa), since, as you guessed, Minia actually calls BCALM as a submodule.
Thanks for your interest,
Rayan
Thank you for clarifying all that and for all the great tools Rayan!
Rad
This is not an issue but just a question: what would be the optimal settings for fast generation of unitigs? I expect running on an SSD or, even better, a RAM drive should help. What about any of the options? Increasing `-max-memory` to reduce disk use seems to be a no-brainer; what else could help? Given that `minia` uses `bcalm`, I guess it makes sense to use `bcalm` directly for this purpose, or could there be any advantage in using `minia` for fast generation of unitigs?