fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

GroupReadsByUmi may fail when marking duplicates including secondary/supplementary reads #964

Open nh13 opened 7 months ago

nh13 commented 7 months ago

There's an open issue in hts-specs about how we want to handle getting the primary alignment information when looking at a secondary or supplementary read: https://github.com/samtools/hts-specs/issues/755

This PR adds the read-primary "rp" tag, which stores the primary alignment for the read end of the current secondary/supplementary alignment, in the same format as the "SA" tag. The mate's primary alignment is stored in the "mp" tag. Both tags are currently lowercase as they are not reserved tags.
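Since the "rp" and "mp" values follow the "SA" tag layout, a reader of these tags could parse them the same way. Below is a minimal Python sketch (not fgbio's Scala implementation) that assumes the SAM specification's SA layout of semicolon-terminated entries, each holding `rname,pos,strand,CIGAR,mapQ,NM`. The names `PrimaryAlignment` and `parse_sa_style` and the example value are hypothetical, chosen for illustration only.

```python
from typing import List, NamedTuple


class PrimaryAlignment(NamedTuple):
    """One entry of an SA-style tag value (hypothetical helper type)."""
    ref_name: str
    pos: int            # 1-based leftmost mapping position
    strand: str         # "+" or "-"
    cigar: str
    mapq: int
    edit_distance: int  # the NM field


def parse_sa_style(value: str) -> List[PrimaryAlignment]:
    """Parse an SA-style string such as "chr1,1000,+,50M50S,60,0;".

    Assumption: "rp" and "mp" reuse the SA layout exactly, as the PR
    description states; this is not taken from the fgbio source.
    """
    entries = []
    for chunk in value.split(";"):
        if not chunk:
            continue  # skip the empty piece after the trailing ";"
        fields = chunk.split(",")
        if len(fields) != 6:
            raise ValueError(f"expected 6 comma-separated fields, got {len(fields)}: {chunk!r}")
        ref_name, pos, strand, cigar, mapq, nm = fields
        if strand not in ("+", "-"):
            raise ValueError(f"invalid strand {strand!r} in {chunk!r}")
        entries.append(
            PrimaryAlignment(ref_name, int(pos), strand, cigar, int(mapq), int(nm))
        )
    return entries
```

For example, `parse_sa_style("chr1,1000,+,50M50S,60,0;")` yields a single entry whose `ref_name` is `"chr1"` and whose `pos` is `1000`.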

I have tested that ZipperBams will now add these, that SortBam will correctly sort in template-coordinate, and finally that GroupReadsByUmi passes. I added tests for GroupReadsByUmi and SamOrder.

Also, in my hands, secondary and supplementary records will never be output by GroupReadsByUmi as currently only primary alignments are output.
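As a rough illustration of what "only primary alignments" means at the SAM flag level, the Python sketch below treats a record as primary when neither the secondary bit (0x100) nor the supplementary bit (0x800) is set. The bit values come from the SAM specification; the function name `is_primary` is hypothetical, and this is not a claim about how GroupReadsByUmi implements its filtering.

```python
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment


def is_primary(flag: int) -> bool:
    """Return True when neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


# Example flags: 99 is a typical primary paired read; adding 0x100 or 0x800
# marks the same record as secondary or supplementary, respectively.
flags = [99, 99 | SECONDARY, 99 | SUPPLEMENTARY]
primary_flags = [f for f in flags if is_primary(f)]
```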

codecov[bot] commented 7 months ago

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (371db03) 95.62% compared to head (1b5753c) 95.64%.

| Files | Patch % | Lines |
|---|---|---|
| src/main/scala/com/fulcrumgenomics/bam/Bams.scala | 90.00% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #964      +/-   ##
==========================================
+ Coverage   95.62%   95.64%   +0.01%
==========================================
  Files         126      126
  Lines        7360     7392      +32
  Branches      495      531      +36
==========================================
+ Hits         7038     7070      +32
  Misses        322      322
```

| [Flag](https://app.codecov.io/gh/fulcrumgenomics/fgbio/pull/964/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fulcrumgenomics) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/fulcrumgenomics/fgbio/pull/964/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fulcrumgenomics) | `95.64% <95.91%> (+0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fulcrumgenomics#carryforward-flags-in-the-pull-request-comment) to find out more.
