broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License
975 stars 369 forks source link

Logic of "If statements" for mismatch count is wrong #772

Closed renkyp closed 7 years ago

renkyp commented 7 years ago

Bug Report

Affected tool(s)

CollectAlignmentSummaryMetrics

Affected version(s)

picard-2.9.0-9-gb9978f5-SNAPSHOT-all

Description

The logic of

if (mismatch && isBisulfiteSequenced && record.getReadNegativeStrandFlag() && (refBases[refIndex + i] == 'G' || refBases[refIndex + i] == 'g') && (readBases[readBaseIndex] == 'A' || readBases[readBaseIndex] == 'a') || ((!record.getReadNegativeStrandFlag()) && (refBases[refIndex + i] == 'C' || refBases[refIndex + i] == 'c') && (readBases[readBaseIndex] == 'T') || readBases[readBaseIndex] == 't')) { bisulfiteBase = true; mismatch = false; }

is different from the original IF statement before [squid:S1066] merge:

if (mismatch && isBisulfiteSequenced) { ...}

where the condition (mismatch && isBisulfiteSequenced) should be firstly guaranteed.

Steps to reproduce

Expected behavior

Number of mismatch bases increase

Actual behavior

Actual number of mismatch bases should be less

yfarjoun commented 7 years ago

This is quite serious...the logic was wrong in that it used the bisulfite mismatch (c>t) even for non-bisulfite sequencing...so basically, c>t (on reverse reads) was not considered a mismatch for the purposes of ASM...

yfarjoun commented 7 years ago

Thanks again @renkyp. should be fixed on master now, and next release.