Open tomkinsc opened 5 years ago
Two thoughts:
I've added bbmap on an old branch ( https://github.com/broadinstitute/viral-ngs/tree/is-1808081312-add-bbmap ); I'll update it and make a PR.
@dpark01, maybe we want to de-dup before metagenomics classification as well, or at least have it as an option.
@notestaff: Great! I'll hold off on integrating clumpify until after your PR has been merged in.
In preliminary testing, clumpify is much faster than mvicuna, but it does not seem to remove as many reads with the settings I tried (subs=5 to match mvicuna, and passes=4). On a BAM file that mvicuna spends ~4 minutes on, clumpify spends 14 s. For the particular test file I used, the read count went from 954858 to 707870 for clumpify (~26% dups removed) and to 671022 for mvicuna (~30% dups removed).
see: https://www.biostars.org/p/225338/
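For anyone who wants to re-check the duplicate-removal percentages quoted above, here is a minimal sketch of the arithmetic in Python. The read counts are the ones reported for the test file; the helper name `fraction_removed` is hypothetical and not part of viral-ngs, clumpify, or mvicuna.

```python
def fraction_removed(reads_before: int, reads_after: int) -> float:
    """Return the fraction of reads dropped by a de-duplication step."""
    if reads_before <= 0:
        raise ValueError("reads_before must be positive")
    return (reads_before - reads_after) / reads_before


# Read counts reported in this thread for a single test BAM file.
READS_BEFORE = 954858
READS_AFTER = {"clumpify": 707870, "mvicuna": 671022}

for tool, remaining in READS_AFTER.items():
    removed = READS_BEFORE - remaining
    frac = fraction_removed(READS_BEFORE, remaining)
    print(f"{tool}: {removed} reads removed ({frac:.1%})")
```

This only compares before/after read counts; it says nothing about whether the two tools flag the same reads as duplicates.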