edawson / rkmh

Classify sequencing reads using MinHash.
MIT License

N-containing kmers are unhandled #2

Closed edawson closed 8 years ago

edawson commented 8 years ago

Kmers containing ambiguous / alternate base characters are currently kept. These should be removed to fall in line with Mash/sourmash.

edawson commented 8 years ago

This is fixed by 70359 and later commits. We just exclude any kmers containing N, gaps, W, Y, etc. (anything other than A, C, T, or G).
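
For illustration only, the filtering described above can be sketched as follows. This is a minimal Python sketch, not the code from the rkmh commits; the function name `unambiguous_kmers`, the constant `VALID_BASES`, and the choice to uppercase the input before checking are all assumptions made for this example.

```python
# Hypothetical sketch: keep only kmers made up entirely of A, C, G, T.
# Names and the uppercasing step are assumptions, not rkmh's actual code.
VALID_BASES = frozenset("ACGT")


def unambiguous_kmers(seq, k):
    """Return every length-k window of seq that contains only A, C, G, or T."""
    seq = seq.upper()  # assumption: soft-masked (lowercase) bases count as valid
    kmers = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Drop the kmer if any character is N, a gap, or an IUPAC ambiguity code.
        if all(base in VALID_BASES for base in kmer):
            kmers.append(kmer)
    return kmers


if __name__ == "__main__":
    print(unambiguous_kmers("ACGTNACG", 3))
```

A single ambiguous base removes every kmer window that overlaps it, so one N in a read drops up to k kmers from the sketch input.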