This commit adds a new nucleotide masker. The implementation is largely a C++ rewrite, with multithreading support, of the sdust masker by Heng Li, part of the minimap2 project. Unlike NCBI's masker, it ignores `N`s entirely. This can cause slight differences in the masking output but should not affect Kraken2's database building. The masker can also replace masked characters with a user-specified character, which shortens pipelines such as this one from `mask_low_complexity_regions.sh`:
`$MASKER -in $file -outfmt fasta | sed -e '/^>/!s/[a-z]/x/g'` becomes `cat $file | $MASKER -r x`
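To make the effect of the replacement option concrete, here is a minimal sketch of the post-processing step the `sed` expression performs. It is not the masker's code. It assumes soft-masked FASTA input (masked bases in lowercase) and rewrites every lowercase letter on non-header lines to the chosen character. The function name `hard_mask_fasta` is hypothetical.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: replace every lowercase (soft-masked) letter on a
// FASTA sequence line with `r`. Header lines starting with '>' are left
// untouched, mirroring sed's '/^>/!s/[a-z]/x/g' when r == 'x'.
std::string hard_mask_fasta(const std::string& fasta, char r) {
    std::istringstream in(fasta);
    std::string out, line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] != '>') {
            for (char& c : line) {
                // Cast avoids undefined behavior for negative char values.
                if (std::islower(static_cast<unsigned char>(c))) c = r;
            }
        }
        out += line;
        out += '\n';
    }
    return out;
}
```

For example, `hard_mask_fasta(">seq1 desc\nACgtnA\n", 'x')` yields `">seq1 desc\nACxxxA\n"`: the lowercase header text is preserved and only the masked bases are replaced. Doing this inside the masker avoids a second pass over the output.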
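For readers unfamiliar with sdust, here is a minimal sketch of the DUST score it builds on. For a window with `l` overlapping triplets, where `c_t` counts occurrences of triplet `t`, the score is the sum of `c_t * (c_t - 1) / 2` divided by `l - 1`. Low-complexity windows score high. This is a simplified illustration, not this masker's code. It scores one fixed window and omits the symmetric-DUST search over intervals and the threshold comparison. Skipping any triplet that contains a non-ACGT base is only our assumption of what ignoring `N`s could look like. `base_code` and `dust_score` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Map a nucleotide to 0..3, or -1 for anything else (e.g. N).
int base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Simplified DUST score of a single window: sum over the 64 possible
// triplets of c_t * (c_t - 1) / 2, divided by (l - 1), where l is the
// number of triplets counted. Triplets containing a non-ACGT base are
// skipped (an illustrative assumption, not necessarily the masker's rule).
double dust_score(const std::string& window) {
    std::array<int, 64> counts{};
    int l = 0;
    for (std::size_t i = 0; i + 2 < window.size(); ++i) {
        int a = base_code(window[i]);
        int b = base_code(window[i + 1]);
        int c = base_code(window[i + 2]);
        if (a < 0 || b < 0 || c < 0) continue;  // skip triplets touching N
        ++counts[a * 16 + b * 4 + c];
        ++l;
    }
    if (l < 2) return 0.0;  // score is undefined for fewer than two triplets
    double sum = 0.0;
    for (int ct : counts) sum += ct * (ct - 1) / 2.0;
    return sum / (l - 1);
}
```

A run of ten `A`s contains 8 identical triplets, so its score is 28 / 7 = 4.0. A repeat such as `ACGTACGTAC` spreads its 8 triplets over four distinct ones and scores only 4 / 7.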