Open a-h-b opened 4 years ago
Thx for the suggestion.
A fictitious example (real sequences would of course have to be longer):
>seq1
AATTCGATTAGaaaaaaaaaaaaaTGCCAGtctctctc
>seq2
tttttttttACGCGATAGATAGCAATTCCGGTTT
In this example, for seq1, aaaaaaaaaaaaa and tctctctc would have to be ignored and k-mers would only be computed for AATTCGATTAGTGCCAG.
For seq2, ttttttttt would have to be ignored and k-mers would only be computed for ACGCGATAGATAGCAATTCCGGTTT.
I'd like VizBin to recognize masked sequences, i.e. to ignore lowercase letters. This would be useful for excluding e.g. 16S regions or other regions that distort k-mer profiles.
Usually, the user would supply the already masked sequence, but if you're mega cool, you could include a module that recognizes highly conserved/structural regions and does the masking internally.
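To make the request concrete, here is a minimal Python sketch of the masking behaviour described above. It is not VizBin code; the function names `unmasked_segments` and `count_kmers` and the `join_segments` flag are hypothetical, and it assumes masked regions are marked purely by lowercase letters. With `join_segments=True` the unmasked pieces are concatenated before counting, as in the seq1 example; setting it to `False` counts each unmasked run separately, which avoids k-mers that span a removed region.

```python
import re
from collections import Counter


def unmasked_segments(seq):
    """Return the uppercase (unmasked) runs of a soft-masked sequence."""
    return re.findall(r"[A-Z]+", seq)


def count_kmers(seq, k, join_segments=True):
    """Count k-mers while ignoring lowercase (masked) letters.

    join_segments=True  : concatenate unmasked runs first (as in the example).
    join_segments=False : count within each unmasked run only, so no k-mer
                          bridges a masked region (an assumed alternative).
    """
    segments = unmasked_segments(seq)
    if join_segments:
        segments = ["".join(segments)]
    counts = Counter()
    for seg in segments:
        # Runs shorter than k contribute nothing (empty range).
        for i in range(len(seg) - k + 1):
            counts[seg[i:i + k]] += 1
    return counts


seq1 = "AATTCGATTAGaaaaaaaaaaaaaTGCCAGtctctctc"
seq2 = "tttttttttACGCGATAGATAGCAATTCCGGTTT"

print("".join(unmasked_segments(seq1)))
print("".join(unmasked_segments(seq2)))
print(count_kmers(seq1, 4))
```

Which of the two modes is preferable is an open design question: concatenation matches the example literally, while per-run counting avoids introducing k-mers that do not occur in the original sequence.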