fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

SetMateInformation should add the MQ tag to any supplementary alignments #960

Open msto opened 7 months ago

msto commented 7 months ago

Currently, SetMateInformation does not add the MQ tag to supplementary alignments.

I'm not sure whether this is intended behavior, since the docs suggest that supplementary alignments would have this tag added:

Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), 'MC' (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.

Supplementary records are handled correctly (updated with their mate's non-supplemental attributes). Secondary alignments are passed through but are not updated.

GroupReadsByUmi requires that all alignments, including supplementary alignments, have the MQ tag set, so it would be helpful if SetMateInformation produced a compatible BAM 🙂
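To make the gap concrete, here is a minimal sketch of the check I have in mind. It is not fgbio or htsjdk code: `Rec` is a hypothetical stand-in for a SAM record, and only the flag bits (0x1 paired, 0x100 secondary, 0x800 supplementary) come from the SAM spec. It counts paired records that lack `MQ`, broken down by alignment type.

```python
from collections import Counter
from dataclasses import dataclass, field

# SAM flag bits, as defined in the SAM specification.
PAIRED, SECONDARY, SUPPLEMENTARY = 0x1, 0x100, 0x800


@dataclass
class Rec:
    """Hypothetical stand-in for a SAM record (not an htsjdk/fgbio type)."""
    name: str
    flag: int
    tags: dict = field(default_factory=dict)


def kind(rec):
    """Classify a record as primary, secondary, or supplementary."""
    if rec.flag & SUPPLEMENTARY:
        return "supplementary"
    if rec.flag & SECONDARY:
        return "secondary"
    return "primary"


def count_missing_mq(records):
    """Count paired records without an MQ tag, keyed by alignment type."""
    return Counter(
        kind(r) for r in records if r.flag & PAIRED and "MQ" not in r.tags
    )
```

On output from SetMateInformation I would expect a check like this to report only supplementary (and secondary) records, which matches what I'm seeing.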

nh13 commented 7 months ago

We use the SetMateInfoIterator from htsjdk, which routes supplementary records to the setMateInformationOnSupplementalAlignment method.

It looks like that method does not set MQ on the supplementary records, so we'd likely need to open a PR against htsjdk, wait for a release, then update fgbio's htsjdk dependency.
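In the meantime, the missing step could probably be done as a post-processing pass. The sketch below only illustrates the logic and is not the htsjdk implementation: `Rec` and `set_mq_on_supplementaries` are hypothetical names, the input is assumed to be grouped by read name, and skipping templates whose mate primary is missing or unmapped is my assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from itertools import groupby

# SAM flag bits, as defined in the SAM specification.
PAIRED, UNMAPPED, READ1, READ2 = 0x1, 0x4, 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


@dataclass
class Rec:
    """Hypothetical stand-in for a SAM record (not an htsjdk/fgbio type)."""
    name: str
    flag: int
    mapq: int
    tags: dict = field(default_factory=dict)


def set_mq_on_supplementaries(records):
    """Copy the mate primary's MAPQ into MQ on each supplementary record.

    Assumes records for the same template are adjacent (grouped by name).
    Primary and secondary records are passed through unchanged.
    """
    out = []
    for _, group in groupby(records, key=lambda r: r.name):
        template = list(group)
        # Index the primary alignment of each read end (READ1 or READ2).
        primaries = {
            r.flag & (READ1 | READ2): r
            for r in template
            if not r.flag & (SECONDARY | SUPPLEMENTARY)
        }
        for r in template:
            is_supp = r.flag & SUPPLEMENTARY and not r.flag & SECONDARY
            if r.flag & PAIRED and is_supp:
                mate_end = READ2 if r.flag & READ1 else READ1
                mate = primaries.get(mate_end)
                # Assumption: leave MQ unset if the mate is absent or unmapped.
                if mate is not None and not mate.flag & UNMAPPED:
                    r.tags["MQ"] = mate.mapq
        out.extend(template)
    return out
```

The key point is that a supplementary alignment takes MQ from its mate's primary alignment, not from the mate's own supplementary records, which is in line with the "mate's non-supplemental attributes" wording in the tool docs.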