FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
386 stars 101 forks source link

Incorrect values in XA:Z and XB:Z tags with --non_bs_mm option? #262

Closed lweasel closed 5 years ago

lweasel commented 5 years ago

Hi - thank you for the continued development of Bismark!

We think that there may be a bug in the function paired_end_SAM_output() (https://github.com/FelixKrueger/Bismark/blob/c7521709564ad3e9c8dc1ad8fbe5c94a9929f83e/bismark#L8173). Starting at line 8614 (https://github.com/FelixKrueger/Bismark/blob/c7521709564ad3e9c8dc1ad8fbe5c94a9929f83e/bismark#L8614) the code is adjusting the Bowtie2 alignment score to calculate the number of non-bisulfite mismatches in the case that the --non_bs_mm option has been specified.

However it appears that these lines are within the code block that is checking for the presence of a deletion or insertion in the second cigar string (i.e. the code block that starts on line 8573, https://github.com/FelixKrueger/Bismark/blob/c7521709564ad3e9c8dc1ad8fbe5c94a9929f83e/bismark#L8573), rather than after it. Thus it appears that the division of the alignment scores by 6 in order to calculate the number of mismatches only occurs if there is a deletion or insertion in the second cigar string?

We noticed this with a read containing a single non-bisulfite mismatch to the human genome which reported six mismatches in the XA:Z tag.

FelixKrueger commented 5 years ago

Dear Owen,

Many thanks for spotting this, you are absolutely right about the scoping. Can you test out the new dev version and let me know if there are still any problems? Best, Felix

lweasel commented 5 years ago

Dear Felix,

Yes the new dev version works perfectly now - thanks!

FelixKrueger commented 5 years ago

Brilliant, thanks for the feedback!