fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

Add optional validation of kept read ratio to CorrectUmis #917

Closed mjhipp closed 1 year ago

mjhipp commented 1 year ago

This PR proposes adding an optional minimum kept ratio input to CorrectUmis. If set, the value must be between 0 and 1, and the tool raises an error if the ratio of kept to total reads is below the provided minimum. The error is raised only after all output files and metrics have been written.
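For illustration, the check described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the object name `KeptRatioCheck`, the method names, and the parameter names are hypothetical, and it assumes the 0 to 1 bounds are inclusive and that an empty input counts as a ratio of 0.

```scala
// Hypothetical sketch: none of these names are taken from fgbio.
object KeptRatioCheck {
  /** Validates up front that the optional minimum ratio lies in [0, 1] (inclusive bounds are an assumption). */
  def validateMinRatio(minKeptRatio: Option[Double]): Unit =
    minKeptRatio.foreach { min =>
      require(min >= 0.0 && min <= 1.0, s"Minimum kept ratio must be between 0 and 1, found: $min")
    }

  /** Returns Some(error message) if kept/total is below the minimum, or None if the check passes or is disabled. */
  def checkKeptRatio(kept: Long, total: Long, minKeptRatio: Option[Double]): Option[String] =
    minKeptRatio.flatMap { min =>
      // Assumption: with no input reads the ratio is treated as 0.
      val ratio = if (total == 0) 0.0 else kept.toDouble / total.toDouble
      if (ratio < min) Some(f"Kept ratio $ratio%.4f ($kept of $total reads) is below the minimum of $min%.4f.")
      else None
    }
}
```

A tool would call `validateMinRatio` while parsing its arguments, then call `checkKeptRatio` after the outputs and metrics are written and fail on a `Some(message)`, which preserves the written files for inspection.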

The motivation is to catch cases where a user provides the wrong input UMI list, or runs CorrectUmis on a library preparation that did not use defined UMIs. In both cases, CorrectUmis filters out the majority of reads, leaves a very small output BAM file, and still completes successfully. In a long pipeline, this UMI incompatibility may not be caught until all other analyses have completed, if at all. With this change, setting a minimum kept ratio as low as 0.01 could catch the error in most cases.

mjhipp commented 1 year ago

@nh13 I added one more line to the error message

mjhipp commented 1 year ago

The failed checks come from 2 test suites being aborted because of the environment. One is log4j related, the other is Intel deflater related.

nh13 commented 1 year ago

@mjhipp can you rebase onto main now that #918 is merged?

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 40.00% and project coverage change: -0.04 :warning:

Comparison is base (84a99c6) 95.65% compared to head (4c72da0) 95.61%.


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #917      +/-   ##
==========================================
- Coverage   95.65%   95.61%   -0.04%
==========================================
  Files         126      126
  Lines        7296     7301       +5
  Branches      480      505      +25
==========================================
+ Hits         6979     6981       +2
- Misses        317      320       +3
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `95.61% <40.00%> (-0.04%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fulcrumgenomics#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://app.codecov.io/gh/fulcrumgenomics/fgbio/pull/917?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fulcrumgenomics) | Coverage Δ | |
|---|---|---|
| [...in/scala/com/fulcrumgenomics/umi/CorrectUmis.scala](https://app.codecov.io/gh/fulcrumgenomics/fgbio/pull/917?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fulcrumgenomics#diff-c3JjL21haW4vc2NhbGEvY29tL2Z1bGNydW1nZW5vbWljcy91bWkvQ29ycmVjdFVtaXMuc2NhbGE=) | `95.12% <40.00%> (-3.58%)` | :arrow_down: |


nh13 commented 1 year ago

Thanks @mjhipp !