Zymo-Research / aladdin-shotgun

MIT License

Add khmer as one of the preprocessing steps for sourmash #14

Closed zxl124 closed 1 year ago

zxl124 commented 1 year ago

The sourmash guidelines recommend using khmer to trim and remove reads with low-abundance k-mers. Add that as one of the complexity filtering options.

zxl124 commented 1 year ago

Upon further investigation, khmer is fundamentally not a low-complexity filter, which makes it different from the other tools in the low-complexity filtering step, such as bbduk and fastp. Only the sourmash guidelines recommend it, as a step to trim and remove low-abundance reads. Therefore, it makes more sense to add it as a step only for sourmash.
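
To make the distinction concrete, here is a toy Python sketch of abundance-based trimming. It is only an illustration under simplifying assumptions, not khmer's implementation: it uses exact k-mer counts, while khmer uses a probabilistic counting structure and its own trimming rules. The function names and the `k` and `cutoff` values are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (exact counts, for illustration only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trim_low_abundance(read, counts, k, cutoff):
    """Truncate the read just before the first k-mer seen fewer than `cutoff` times."""
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] < cutoff:
            # Keep everything up to the end of the last trusted k-mer.
            return read[:i + k - 1]
    return read


# Three identical reads, one read with a likely error near its end,
# and three low-complexity (homopolymer) reads.
reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTTT"] + ["AAAAAAAAAA"] * 3
k, cutoff = 4, 2  # hypothetical values chosen for this toy example
counts = count_kmers(reads, k)
trimmed = [trim_low_abundance(r, counts, k, cutoff) for r in reads]
```

In this sketch the read ending in `TTT` is cut back because its last k-mers are rare, while the homopolymer reads pass through untouched because their k-mers are abundant. A low-complexity filter such as bbduk or fastp would target the homopolymer reads instead, which is why the two kinds of step are not interchangeable.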

zxl124 commented 1 year ago

khmer should be an optional step that is turned off by default, according to the sourmash paper. The paper states that when comparing to genomes, which is the use case here, turning on low-abundance trimming will increase false negatives but improve accuracy on proportions.