fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

Does CallMolecularConsensusReads handle overlapping fragments? #926

Open kcibul opened 1 year ago

kcibul commented 1 year ago

In the docs for CallMolecularConsensusReads it says:

Also, this tool calls each end of a pair independently, and does not jointly call bases that overlap within a pair

However, later in the usage there is a flag (true by default):

--consensus-call-overlapping-bases

What does this parameter enable, and how does that align with the statement in the docs?

nh13 commented 1 year ago

This feature was added in https://github.com/fulcrumgenomics/fgbio/pull/805. The goal was a tool that takes individual raw read pairs and, where a read and its mate overlap, consensus calls the overlapping bases so that they agree (or are masked), handling each read pair independently. The OverlappingBasesConsensusCaller tool gives fine-grained control over the strategy used when the read and its mate agree or disagree within the template, while CallMolecularConsensusReads and CallDuplexConsensusReads each expose a single opt-in option.
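To make the per-pair step concrete, here is a minimal Python sketch of consensus calling the overlapping bases of a single read pair. It is not fgbio's implementation (fgbio is written in Scala): `AlignedRead`, `call_overlap`, the strategy names `mask_both` and `mask_lower_qual`, summing qualities when the bases agree, and the quality constants are all hypothetical. It also assumes reads aligned without indels or clipping, with both reads' bases stored in reference orientation.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_QUAL = 93   # hypothetical cap for summed qualities
MASK_QUAL = 2   # hypothetical quality assigned to masked (N) bases


@dataclass
class AlignedRead:
    """A read aligned without indels; bases/quals are in reference (forward) order."""
    start: int          # 0-based reference position of the first base
    bases: str
    quals: List[int]

    @property
    def end(self) -> int:
        return self.start + len(self.bases)  # exclusive end


def call_overlap(r1: AlignedRead, r2: AlignedRead,
                 on_disagree: str = "mask_both") -> Tuple[AlignedRead, AlignedRead]:
    """Return new reads whose overlapping bases have been reconciled.

    Agreement: both reads keep the base and get the summed quality (capped).
    Disagreement: 'mask_both' sets both bases to N; 'mask_lower_qual' masks only
    the lower-quality base and falls back to masking both on a quality tie.
    """
    if on_disagree not in ("mask_both", "mask_lower_qual"):
        raise ValueError(f"unknown disagreement strategy: {on_disagree}")
    b1, q1 = list(r1.bases), list(r1.quals)   # copies: inputs stay untouched
    b2, q2 = list(r2.bases), list(r2.quals)
    # The range is empty when the read and its mate do not overlap.
    for pos in range(max(r1.start, r2.start), min(r1.end, r2.end)):
        i, j = pos - r1.start, pos - r2.start
        if b1[i] == "N" or b2[j] == "N":
            continue  # nothing to reconcile against a no-call
        if b1[i] == b2[j]:
            q1[i] = q2[j] = min(q1[i] + q2[j], MAX_QUAL)
        elif on_disagree == "mask_lower_qual" and q1[i] != q2[j]:
            if q1[i] < q2[j]:
                b1[i], q1[i] = "N", MASK_QUAL
            else:
                b2[j], q2[j] = "N", MASK_QUAL
        else:
            b1[i], q1[i] = "N", MASK_QUAL
            b2[j], q2[j] = "N", MASK_QUAL
    return (AlignedRead(r1.start, "".join(b1), q1),
            AlignedRead(r2.start, "".join(b2), q2))


# Reads overlap at reference positions 104..107 and disagree at 106 (G vs T).
r1 = AlignedRead(100, "ACGTACGT", [30] * 8)
r2 = AlignedRead(104, "ACTTGGCC", [20] * 8)
a, b = call_overlap(r1, r2)
print(a.bases, b.bases)  # ACGTACNT ACNTGGCC
```

Passing `"mask_lower_qual"` instead keeps R1's G (quality 30) and masks only R2's T (quality 20). A pair whose reads do not overlap comes back unchanged.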

In CallMolecularConsensusReads, each input raw read pair is examined independently to see whether the read and its mate overlap, and overlapping pairs are pre-processed as described above. The read pairs are then fed into the downstream (molecular or duplex) consensus calling step as before. So it is really a pre-processing step built into the tool for convenience, rather than part of the consensus calling itself. Conceptually, this is the same as piping the output of OverlappingBasesConsensusCaller into CallMolecularConsensusReads without the --consensus-call-overlapping-bases option enabled.
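The equivalence between the built-in option and a separate pre-processing pass can be sketched in Python too. Everything here is hypothetical and simplified. `preprocess_pair` only masks disagreeing overlap bases, and `column_consensus` uses a plain majority vote in place of fgbio's quality-aware consensus model. Reads are assumed to have no indels, to be in reference orientation, and, within a family, to share the same start and length. The sketch still calls the R1 and R2 consensuses separately, mirroring the docs' statement that each end of a pair is called independently.

```python
from collections import Counter
from typing import List, Tuple

# A read pair: (r1_start, r1_bases, r2_start, r2_bases), reference orientation.
Pair = Tuple[int, str, int, str]


def preprocess_pair(pair: Pair) -> Pair:
    """Step 1, per pair: mask overlapping bases where R1 and its mate disagree."""
    s1, b1, s2, b2 = pair
    l1, l2 = list(b1), list(b2)
    for pos in range(max(s1, s2), min(s1 + len(b1), s2 + len(b2))):
        i, j = pos - s1, pos - s2
        if l1[i] != l2[j]:
            l1[i] = l2[j] = "N"
    return s1, "".join(l1), s2, "".join(l2)


def column_consensus(reads: List[str]) -> str:
    """Majority vote per column ignoring Ns; ties and all-N columns become N."""
    out = []
    for column in zip(*reads):
        top = Counter(b for b in column if b != "N").most_common(2)
        if not top or (len(top) == 2 and top[0][1] == top[1][1]):
            out.append("N")
        else:
            out.append(top[0][0])
    return "".join(out)


def call_family(pairs: List[Pair], overlap_first: bool) -> Tuple[str, str]:
    """Step 2: call the R1 and R2 consensuses of one molecule independently."""
    if overlap_first:  # the built-in option: just a pre-processing pass
        pairs = [preprocess_pair(p) for p in pairs]
    return (column_consensus([p[1] for p in pairs]),
            column_consensus([p[3] for p in pairs]))


# Two read pairs from one molecule; the second pair disagrees at one overlap base.
family = [(100, "ACGTAC", 103, "TACGGA"), (100, "ACGTAC", 103, "TGCGGA")]
print(call_family(family, overlap_first=False))  # ('ACGTAC', 'TNCGGA')
print(call_family(family, overlap_first=True))   # ('ACGTAC', 'TACGGA')

# "Piping": pre-process separately, then call with the option off.
piped = call_family([preprocess_pair(p) for p in family], overlap_first=False)
print(piped == call_family(family, overlap_first=True))  # True
```

In this toy family, calling without pre-processing ties at one R2 column and emits N, whereas pre-processing masks the disagreeing base in the second pair so the first pair's base carries the column.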