DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Initial commit of dustmasker for kraken2 #662

ch4rr0 closed 1 year ago

ch4rr0 commented 1 year ago

This commit adds a new nucleotide masker. The implementation is mostly a C++ rewrite, with multithreading support, of Heng Li's sdust masker from the minimap2 project. Unlike NCBI's masker, it ignores Ns entirely. This can produce slightly different masking output but should not affect Kraken2's database building. The masker also provides an option for replacing masked characters, so this pipeline from mask_low_complexity_regions.sh can be simplified (old => new):

$MASKER -in $file -outfmt fasta | sed -e '/^>/!s/[a-z]/x/g' => cat $file | $MASKER -r x
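
For readers unfamiliar with the sed expression, here is a minimal Python sketch of the post-processing step that the replacement option is meant to absorb: on every non-header line, each soft-masked (lowercase) base is replaced by a single character. The function name `replace_masked` is hypothetical and used only for illustration; this is not the masker's actual code, and it assumes ASCII FASTA input.

```python
def replace_masked(fasta_text: str, replacement: str = "x") -> str:
    """Mimic sed -e '/^>/!s/[a-z]/x/g' on FASTA text.

    Header lines (starting with '>') are passed through unchanged.
    On all other lines, every ASCII lowercase letter is replaced.
    `replace_masked` is a hypothetical helper, not part of Kraken 2.
    """
    out = []
    # keepends=True preserves the original line terminators.
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(
                "".join(replacement if "a" <= ch <= "z" else ch for ch in line)
            )
    return "".join(out)


if __name__ == "__main__":
    example = ">seq1 low complexity\nACGTacgtACGT\n"
    print(replace_masked(example), end="")
```

With the soft-masked input above, the header line is left alone and the sequence line becomes `ACGTxxxxACGT`.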

BenLangmead commented 1 year ago

This is superseded by #675