Closed bovee closed 2 years ago
(referencing back to https://github.com/openjournals/joss-reviews/issues/2991 so it shows up over there)
@bovee thanks a ton, yes the length filter seems to be the default in filtlong. I thought it might be clearer to separate those filters into two individual steps, but I can see how they should probably be implemented in one.
i will work on this tonight on a new branch, appreciate the review over the weekend :)
@bovee thanks so much for your patience with this, it has been a bit of a wild year.
I have removed the more complex (two-pass) filters to keep in line with the design philosophy of speed and minimal quality control. Not sure about your experience, but I have never really used the more interesting filtlong filters, and it seems to me they are geared more toward research than toward speed and stability in production.
Rewrote the code base (it was a bit of a mess before) + added better tests, documentation, benchmarks and continuous integration. It seems like needletail is around twice as fast as rust-bio-tools sequence-stats in fast mode, which ignores the quality scores :tada: Also confirmed output between all programs in the benchmarks --> #17 and #18
Closing this for now; addressed in the new version and the latest paper iteration.
I was comparing nanoq to filtlong and the timing/performance seem comparable to what you've documented, but I see some differences in the output files I get.
For example, I pulled the first four million reads of the Zymo even data and ran them through both filtlong and nanoq with -p 80 -b 500000000 settings and got somewhat different sets of output reads. Comparing the two, I see 42994 reads were only in filtlong's output, 20795 were shared, and 91994 were in nanoq's output only. This might be because filtlong imposes a 5 kbp minimum contig length filter and nanoq doesn't, but I'm not sure how to set the same length prefilter in nanoq to compare the results.
I think either the filtering algorithm should be explained a bit more beyond "extended two-pass filtering analogous to Filtlong", I should be able to set parameter combinations that get results closer to filtlong, or both?
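For anyone wanting to reproduce the read-set comparison above, here is a minimal sketch in Python. It is not how the numbers in this issue were produced; it only assumes that both tools write standard four-line FASTQ records (optionally gzipped) and that the read ID is the first whitespace-delimited token of the header. The function names and file names are hypothetical.

```python
import gzip


def read_ids(path):
    """Collect read IDs from a FASTQ file (assumes 4-line records)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # Every fourth line, starting at 0, is a header like "@read_id ..."
            if i % 4 == 0 and line.strip():
                ids.add(line[1:].split()[0])
    return ids


def compare_outputs(path_a, path_b):
    """Count reads unique to each file and reads shared by both."""
    a, b = read_ids(path_a), read_ids(path_b)
    return {
        "only_a": len(a - b),
        "shared": len(a & b),
        "only_b": len(b - a),
    }
```

Usage would look something like `compare_outputs("filtlong_out.fastq", "nanoq_out.fastq")`, where both paths are placeholders for the two filtered outputs.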