Closed by phiweger 7 years ago
Same in spirit, but there are a few differences:
syrah splits reads at Ns/errors, while trim-low-abund truncates them, so trim-low-abund drops more data. This is fixable!
syrah is much less configurable, and so doesn't work for as many situations yet (while trim-low-abund is super flexible and therefore also very confusing).
trim-low-abund permits both streaming and semi-streaming (and the latter uses some amount of disk space for large/low-coverage data sets). syrah is pure streaming and uses no disk space.
syrah was built for this project (ivory.idyll.org/blog/2017-sourmash-sra-microbial-wgs.html), and it's not clear how general it is. t-l-a (trim-low-abund) is my current recommendation.
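To make the split-versus-truncate difference concrete, here is a toy sketch in plain Python. It is not the actual syrah or trim-low-abund code: the function names, the exact k-mer counting via `Counter`, and the `cutoff` parameter are all assumptions for illustration (the real tools use their own k-mer counting and option handling). It only shows why truncating at the first bad k-mer discards more sequence than splitting around it.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Exact k-mer counts (hypothetical stand-in for a real k-mer counter)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def is_solid(kmer, counts, cutoff):
    """A k-mer is 'solid' if it has no N and is seen at least cutoff times."""
    return "N" not in kmer and counts[kmer] >= cutoff


def truncate_read(read, counts, k, cutoff):
    """Truncation: keep only the prefix before the first non-solid k-mer."""
    for i, kmer in enumerate(kmers(read, k)):
        if not is_solid(kmer, counts, cutoff):
            # Solid k-mers 0..i-1 cover read[:i + k - 1]; nothing if i == 0.
            return read[:i + k - 1] if i > 0 else ""
    return read


def split_read(read, counts, k, cutoff):
    """Splitting: keep every maximal run of consecutive solid k-mers."""
    pieces, start = [], None
    for i, kmer in enumerate(kmers(read, k)):
        solid = is_solid(kmer, counts, cutoff)
        if solid and start is None:
            start = i
        elif not solid and start is not None:
            pieces.append(read[start:i + k - 1])
            start = None
    if start is not None:
        pieces.append(read[start:])
    return pieces


# Toy data: "ACGT" is well covered; the read has an N in the middle.
counts = count_kmers(["ACGT"] * 3, k=3)
read = "ACGTNACGT"
print(truncate_read(read, counts, k=3, cutoff=2))  # ACGT
print(split_read(read, counts, k=3, cutoff=2))     # ['ACGT', 'ACGT']
```

In this example truncation keeps 4 bases while splitting keeps 8, because the sequence after the N is retained as a second piece.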
very clear explanation, thanks
A recommendation in https://github.com/dib-lab/sourmash/issues/283 was to trim k-mers to avoid collecting sequencing errors. How are the above two approaches related? From a quick glance at the code, there seems to be some overlap. Are they basically doing the same thing?
Thx!