Reference copy can overwrite ambiguous bases in some cases:
There is a reference sequence
Query A contains the sequence: TTTTATCTTGATTTTCT
Query B contains the sequences:
TTTTATCTTGATTTTCT
TTTATCTTAATTTTCTT
Hence Query A contains the split k-mer: AAGAAAAT AAGATAAA C
And Query B contains the split k-mer: AAGAAAAT AAGATAAA Y
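The derivation of these two split k-mers can be sketched as below. This is an illustration only, not ska.rust code. It assumes k = 17, that the canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement, and that the relevant 17-base window is the one shown in the mapped output (TTTATCTTGATTTTCTT for the G allele). The names `split_kmer` and `merge_middle_bases` are hypothetical.

```python
# IUPAC codes for one- and two-base sets (enough for this example).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_kmer(kmer):
    """Split an odd-length k-mer into ((left arm, right arm), middle base).

    Assumption: canonical orientation is the lexicographic minimum of the
    k-mer and its reverse complement.
    """
    canonical = min(kmer, reverse_complement(kmer))
    half = len(canonical) // 2
    return (canonical[:half], canonical[half + 1:]), canonical[half]


def merge_middle_bases(kmers):
    """Collapse middle bases that share the same arms into one IUPAC code."""
    seen = {}
    for kmer in kmers:
        arms, middle = split_kmer(kmer)
        seen.setdefault(arms, set()).add(middle)
    return {arms: IUPAC[frozenset(bases)] for arms, bases in seen.items()}


arms = ("AAGAAAAT", "AAGATAAA")
query_a = merge_middle_bases(["TTTATCTTGATTTTCTT"])
query_b = merge_middle_bases(["TTTATCTTGATTTTCTT", "TTTATCTTAATTTTCTT"])
print(query_a[arms])  # C
print(query_b[arms])  # Y
```

Under these assumptions Query A keeps a single middle base (C), while Query B sees both C and T between the same arms and so stores Y.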
However, when both queries are mapped to the reference, a SNP is called that distinguishes them:
>Query B
TTTTAGTTTTATCTTAATTTTCTTA
>Query A
-------TTTATCTTGATTTTCTT
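The expected ambiguous base in Query B (Y in the split k-mer, which corresponds to R on the strand shown above) is converted to A, because A is the base in the reference and the reference matches the split k-mer arms.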
Proposed fix here: https://github.com/bacpop/ska.rust/compare/master...nickjcroucher:ska.rust:master