biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

Deal with insertions and deletions wrt reference in CIGAR string #58

Closed lweasel closed 6 years ago

lweasel commented 6 years ago

It looks like insertions and deletions aren't recorded as mismatches in the NM tag. However, a mapping with an insertion or a deletion should be regarded as worse than one without.
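One way to make a mapping with an insertion or deletion count as worse is to add the indel bases from the CIGAR string to the mismatch count the aligner reports. The following is only a minimal sketch of that idea, not Sargasso's code: the names `indel_bases` and `mapping_penalty` and the choice to weight each inserted or deleted base like one mismatch are assumptions made for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_bases(cigar):
    """Return the total number of inserted plus deleted bases in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "ID")


def mapping_penalty(mismatches, cigar):
    """Hypothetical score: mismatch count plus indel bases (lower is better)."""
    return mismatches + indel_bases(cigar)


# Two mappings with the same mismatch count; the gapped one scores worse.
ungapped = mapping_penalty(0, "100M")
gapped = mapping_penalty(0, "50M2I48M")
print(ungapped, gapped)
```

With a score like this, a gapped mapping is ranked below an ungapped one rather than being thrown away outright.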

hxin commented 6 years ago

I implemented a check within each filter so that any hit containing an insertion or deletion is discarded. However, the test results on the 'trap' data show that this change increases the number of correct reads being rejected, as shown in the figure.

This refers to commit 696e94987bed8fb274d5478e355d79ed35e46c9e

Figure: cigarinsertiondeletion-primaryhit (attachment: cigarInsertionDeletion-primaryHit.txt)
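For readers unfamiliar with CIGAR strings, the kind of check described in this comment can be sketched roughly as below. This is an illustrative sketch only, not the code from the commit: `has_indel`, `discard_indel_hits` and the `(read_name, cigar)` tuple layout are hypothetical names and structures chosen for the example.

```python
import re

# Matches an insertion (I) or deletion (D) operation in a CIGAR string.
# Skipped regions (N, e.g. introns) and clipping (S, H) are not treated as indels.
INDEL_OP = re.compile(r"\d+[ID]")


def has_indel(cigar):
    """Return True if the CIGAR string contains any insertion or deletion."""
    return INDEL_OP.search(cigar) is not None


def discard_indel_hits(hits):
    """Keep only hits without indels; hits are (read_name, cigar) pairs (hypothetical layout)."""
    return [hit for hit in hits if not has_indel(hit[1])]


hits = [("read1", "100M"), ("read2", "60M1D40M"), ("read3", "5S95M")]
kept = discard_indel_hits(hits)
print(kept)
```

A hard discard like this is the strictest option, which may explain why more correct reads end up rejected: a read whose true alignment contains a small indel has no surviving hit at all.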

hxin commented 6 years ago

This also fixes #66.