bioforensics / yeat

YEAT: Your Everyday Assembly Tool

Add `-m` to seqtk command when low on RAM #63

Open danejo3 opened 8 months ago

danejo3 commented 8 months ago

We had a case where we were trying to downsample 11 billion reads with 1 TB of RAM and seqtk said it needed more RAM.

The seqtk usage message below points to a fix, the two-pass mode (`-2`), though it runs slower.

```
Usage:   seqtk sample [-2] [-s seed=11] <in.fa> <frac>|<number>

Options: -s INT       RNG seed [11]
         -2           2-pass mode: twice as slow but with much reduced memory
```

standage commented 8 months ago

It may be easiest to just do the two-pass mode as a matter of course: 2x runtime shouldn't be too bad when x is small, and it looks like it's necessary when x is big. We could try to come up with some kind of threshold (as measured e.g. by file size) that separates the cases best suited for single pass versus the cases where we need two passes to reduce memory, but I'm skeptical about the cost/benefit of that approach vs the simpler approach.
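
To make the two options concrete, here is a minimal sketch of how the command could be assembled. It is not YEAT's actual code: the function names, the default of always passing `-2`, and the size cutoff are all hypothetical. Only the `-2` and `-s` flags come from the seqtk usage message quoted above.

```python
import os

# Hypothetical cutoff in bytes; an illustrative placeholder, not a measured value.
TWO_PASS_THRESHOLD_BYTES = 50 * 1024**3


def seqtk_sample_command(reads_path, num_reads, seed=11, two_pass=True):
    """Build an argument list for `seqtk sample`.

    The simple approach discussed above: pass `-2` by default so that
    large inputs do not exhaust memory, at the cost of a second pass.
    """
    cmd = ["seqtk", "sample"]
    if two_pass:
        cmd.append("-2")  # two-pass mode: slower, much lower memory
    cmd += ["-s", str(seed), str(reads_path), str(num_reads)]
    return cmd


def needs_two_pass(reads_path, threshold=TWO_PASS_THRESHOLD_BYTES):
    """Threshold alternative: use two-pass mode only for large input files."""
    return os.path.getsize(reads_path) >= threshold
```

With the defaults, `seqtk_sample_command("reads.fq", 1000000)` yields the arguments for `seqtk sample -2 -s 11 reads.fq 1000000`. The threshold variant would call `seqtk_sample_command(path, n, two_pass=needs_two_pass(path))`, but a sensible cutoff would have to be determined empirically, which is the cost/benefit concern raised above.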