claczny / VizBin

Repository of our application for human-augmented binning

Feature request: sequencing masking #49

Open a-h-b opened 4 years ago

a-h-b commented 4 years ago

I'd like VizBin to recognize masked sequences, i.e. to ignore lowercase letters. This would be useful for ignoring e.g. 16S regions or other regions that obscure k-mer profiles.

Usually, the user would supply the already masked sequence, but if you're mega cool, you could include a module that recognizes highly conserved/structural regions and does the masking internally.

claczny commented 4 years ago

Thx for the suggestion.

A fictitious example (real sequences would have to be longer, of course):

>seq1
AATTCGATTAGaaaaaaaaaaaaaTGCCAGtctctctc
>seq2
tttttttttACGCGATAGATAGCAATTCCGGTTT

In this example, for seq1, aaaaaaaaaaaaa and tctctctc would have to be ignored and k-mers would only be computed for AATTCGATTAGTGCCAG. For seq2, ttttttttt would have to be ignored and k-mers would only be computed for ACGCGATAGATAGCAATTCCGGTTT.
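
To make the expected behaviour concrete, here is a minimal Python sketch of how lowercase (soft-masked) regions could be skipped when counting k-mers. It is not VizBin's code; the function names `unmasked_segments` and `kmer_counts`, the `join_segments` flag, and the default `k=4` are all hypothetical and chosen only for illustration. The example above can be read two ways: either the unmasked pieces are joined before counting (as the string AATTCGATTAGTGCCAG suggests), or each piece is counted on its own. Joining produces k-mers that span the removed region and so do not occur in the real sequence, which is why the sketch supports both and defaults to counting per piece.

```python
import re
from collections import Counter


def unmasked_segments(seq):
    """Return the runs of uppercase (unmasked) characters, in order."""
    # Assumption: masking is expressed purely by lowercase letters.
    return re.findall(r"[A-Z]+", seq)


def kmer_counts(seq, k=4, join_segments=False):
    """Count k-mers over the unmasked parts of a soft-masked sequence.

    join_segments=False: count within each unmasked run separately.
    join_segments=True:  concatenate the runs first, which also yields
                         k-mers spanning the positions of masked regions.
    """
    segments = unmasked_segments(seq)
    if join_segments:
        segments = ["".join(segments)]
    counts = Counter()
    for seg in segments:
        # Runs shorter than k contribute nothing (empty range).
        for i in range(len(seg) - k + 1):
            counts[seg[i:i + k]] += 1
    return counts


seq1 = "AATTCGATTAGaaaaaaaaaaaaaTGCCAGtctctctc"
seq2 = "tttttttttACGCGATAGATAGCAATTCCGGTTT"

print(unmasked_segments(seq1))  # ['AATTCGATTAG', 'TGCCAG']
print(unmasked_segments(seq2))  # ['ACGCGATAGATAGCAATTCCGGTTT']
print(kmer_counts(seq1, k=4))
print(kmer_counts(seq1, k=4, join_segments=True))
```

With `join_segments=True`, seq1 is treated exactly as AATTCGATTAGTGCCAG, so k-mers such as TAGT, which straddle the masked poly-a stretch, are included; with the default they are not.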