biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

Bismark returns a different number of mismatches for reads in the same pair #103

Closed hxin closed 5 years ago

hxin commented 5 years ago

For a single pair of reads, Bismark reports a separate mismatch count for each read. This differs from STAR, where both reads in a pair carry the same number of mismatches.

The code currently uses the first read of a pair to get the number of mismatches. This works for STAR but not for Bismark.

We cannot use the sum of the mismatches either, because the two reads sometimes overlap, and a mismatch inside the overlapping region would then be counted twice.

Thus, we decided to use the average number of mismatches across the two reads as the mismatch count for the pair.
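To make the averaging concrete, here is a minimal Python sketch. The function name `pair_mismatches` and its plain-integer arguments are hypothetical and are not taken from the Sargasso code base or from the commit below; whether the actual implementation rounds the result is not stated in this issue, so the sketch leaves the average unrounded.

```python
def pair_mismatches(read1_mismatches, read2_mismatches):
    """Return a single mismatch count for a read pair.

    Hypothetical helper for illustration only: it averages the
    per-read mismatch counts instead of taking the first read's
    count or the sum of both.
    """
    return (read1_mismatches + read2_mismatches) / 2


# STAR-like case: both reads report the same count, so the
# average is identical to the first read's count.
star_like = pair_mismatches(2, 2)

# Bismark-like case: the reads report different counts, so the
# average falls between them.
bismark_like = pair_mismatches(1, 4)

print(star_like, bismark_like)
```

When both reads report the same count, as with STAR, the average equals that count, so averaging should leave STAR results unchanged while giving Bismark pairs one well-defined value.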

29d88874a1ddf06356e2b251f84eb05efde742c5

hxin commented 5 years ago

We ran tests to see how this affects the assignment rates.

[image]

[image]