bacpop / ska.rust

Split k-mer analysis – version 2
https://docs.rs/ska/latest/ska/
Apache License 2.0

Reference bias in map with repeats #28

Closed johnlees closed 1 year ago

johnlees commented 1 year ago

Proposed fix here: https://github.com/bacpop/ska.rust/compare/master...nickjcroucher:ska.rust:master

Reference copy can overwrite ambiguous bases in some cases:

There is a reference sequence.

Query A contains the sequence:

TTTTATCTTGATTTTCT

Query B contains the sequences:

TTTTATCTTGATTTTCT
TTTATCTTAATTTTCTT

Hence Query A contains the split k-mer:

AAGAAAAT AAGATAAA C

And Query B contains the split k-mer:

AAGAAAAT AAGATAAA Y

However, when mapped, a SNP is called distinguishing the sequences:

>Query B
TTTTAGTTTTATCTTAATTTTCTTA
>Query A
-------TTTATCTTGATTTTCTT

The expected Y in Query B is converted to A because the reference has A at this position, flanked by sequence that matches the split k-mer arms.
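For anyone wanting to reproduce the split k-mers quoted above, here is a minimal Python sketch of how the C and Y middle bases arise. It is not ska's implementation: the function names (`revcomp`, `canonical_split`, `split_kmer_dict`) are hypothetical, and the canonical-strand rule (keep whichever strand's concatenated arms sort first) is an assumption. The first sequence quoted in the issue is 17 bases long and offset by one base from the split k-mer, so the sketch also assumes one extra trailing T, as in the Query A alignment row.

```python
# Hypothetical sketch of split k-mer extraction (k = 17: two 8-base arms + middle base).
# Not ska's code; the canonical-strand rule below is an assumption.

# IUPAC code for each set of bases observed in the middle position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_split(kmer):
    """Return (left arm, right arm, middle base) on the assumed canonical strand."""
    half = len(kmer) // 2
    rc = revcomp(kmer)
    fwd = (kmer[:half], kmer[half + 1:], kmer[half])
    rev = (rc[:half], rc[half + 1:], rc[half])
    # Assumption: the canonical strand is the one whose joined arms sort first.
    return min(fwd, rev, key=lambda s: s[0] + s[1])


def split_kmer_dict(seqs, k=17):
    """Map (left arm, right arm) -> IUPAC code of all middle bases seen."""
    middles = {}
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            left, right, mid = canonical_split(seq[i:i + k])
            middles.setdefault((left, right), set()).add(mid)
    return {arms: IUPAC[frozenset(m)] for arms, m in middles.items()}


# Assumed inputs: the issue's sequences, with one trailing T added to the first.
query_a = ["TTTTATCTTGATTTTCTT"]
query_b = ["TTTTATCTTGATTTTCTT", "TTTATCTTAATTTTCTT"]

arms = ("AAGAAAAT", "AAGATAAA")
print(split_kmer_dict(query_a)[arms])  # C
print(split_kmer_dict(query_b)[arms])  # Y
```

Query B sees both C and T between the same pair of arms, so the middle base is stored as Y. The mapping step then needs to carry that ambiguity through rather than replace it with the reference base.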