fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

Consensus calling after BQSR #623

Closed: kanika-arora closed this issue 4 years ago

kanika-arora commented 4 years ago

Hello,

From what I understand, the very first step of fgbio's consensus calling algorithms is the adjustment of input base qualities. The wiki page https://github.com/fulcrumgenomics/fgbio/wiki/Calling-Consensus-Reads says that "these adjustments should only be used if there is a reason to believe the input base qualities are systematically over-estimated, otherwise this step should be ignored." If I start with a BAM file that has undergone base quality score recalibration (BQSR), my understanding is that this adjustment step should be skipped. Is that correct? However, I don't see an option to skip it in CallDuplexConsensusReads.

I have to run BQSR on the BAM files anyway for another tool. So would you recommend I run fgbio consensus calling on the BAM file without BQSR? Or is there a way to skip input base quality adjustment in fgbio consensus calling steps?

Thank you! Kanika

fleharty commented 4 years ago

@kanika-arora I'm not an fgbio developer, but I have found that BQSR can be useful after consensus calling. In my experience, the consensus caller estimates quality scores that are too high.

BQSR brings the qualities down a fair amount, and in some cases, I suspect, by too much. The consensus calls tend to be very high quality, and often there aren't enough observed disagreements with the reference in a particular context to provide good statistics.
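As a rough illustration of why high-quality consensus bases leave so few mismatches to count, the sketch below converts a Phred score to an error probability and computes the expected number of errors in a given number of bases. The function names and the figure of one million bases are hypothetical and chosen only for illustration; this is not part of fgbio or BQSR.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def expected_errors(q: float, n_bases: int) -> float:
    """Expected number of erroneous bases among n_bases, all at quality q."""
    return phred_to_error_prob(q) * n_bases


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of bases observed in one context
    for q in (30, 45, 60):
        print(f"Q{q}: ~{expected_errors(q, n):.1f} expected errors in {n:,} bases")
```

At raw-read qualities such as Q30, a million bases yield on the order of a thousand expected errors, while at Q60 the same number of bases yields about one, which is too few to estimate an empirical error rate with any confidence.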

nh13 commented 4 years ago

@kanika-arora I agree with @fleharty that BQSR is a great tool for assessing and improving qualities after consensus building. However, because the error rate is so vastly reduced, we really need a lot of data to assess base quality accuracy.

I see the option to "shift" the base quality was removed way back in 0.1.2. If you don't believe there's an issue with the input base qualities, I think you are fine. Otherwise, I would take a look at the --error-rate-post-umi=PhredScore option and try that out. But again, it's very hard to recommend a good value, as it's very assay-dependent and requires a lot of data to measure accuracy.
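To get a feel for how a post-UMI error rate limits the final consensus quality, here is a minimal sketch that combines two Phred-scaled error sources under the assumption that they are independent. The function names and the simple "either error occurs" formula are illustrative assumptions; fgbio's actual calculation may differ in detail.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred-scaled quality."""
    return -10 * math.log10(p)


def combine_independent_errors(q_consensus: float, q_post_umi: float) -> float:
    """Phred quality when either of two independent error sources may occur.

    Simplified model (an assumption, not fgbio's code): an error is present
    if the consensus call is wrong or a post-UMI error happened.
    """
    p1 = phred_to_error_prob(q_consensus)
    p2 = phred_to_error_prob(q_post_umi)
    p_either = p1 + p2 - p1 * p2
    return error_prob_to_phred(p_either)


if __name__ == "__main__":
    # Hypothetical values: a raw consensus quality of Q80 and a post-UMI
    # error rate of Q40.
    print(round(combine_independent_errors(80, 40), 2))
```

Under this model the larger error probability dominates, so the post-UMI value effectively acts as a ceiling on the reported consensus quality. That is why the right setting depends so heavily on the assay.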

kanika-arora commented 4 years ago

@nh13 @fleharty thank you for your suggestions. I realized there was a typo in my earlier question, and it wasn't very clear. What I meant to ask was: what happens if I provide a BAM that has undergone base quality score recalibration as input to CallDuplexConsensusReads? Will it interfere with the input base quality adjustment that is part of the consensus calling algorithm?

fleharty commented 4 years ago

@kanika-arora It should not interfere with the base quality adjustments during consensus calling.