broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data
MIT License

Fix for donor assignment negative infinity likelihoods. #477

Closed jamesnemesh closed 1 month ago

jamesnemesh commented 1 month ago

Fix for an edge-case bug in which multiple reads are summarized as a single UMI with a single phred score. Because the underlying error rate is quantized to a phred score, a very high error rate (roughly 0.9 or higher) is quantized to a phred score of 0. When the phred score is unpacked back to a probability, this is equivalent to an error rate of 1, which causes errors in the downstream likelihoods.
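As a worked illustration of the arithmetic, assuming the standard phred definition with rounding to the nearest integer, and assuming the downstream likelihood includes a term of the form $\log(1 - p)$ (neither detail is spelled out in this PR), an error rate of 0.9 collapses as follows:

$$
\begin{aligned}
Q &= \operatorname{round}\left(-10 \log_{10} 0.9\right) = \operatorname{round}(0.458) = 0 \\
p_{\text{unpacked}} &= 10^{-Q/10} = 10^{0} = 1 \\
\log\left(1 - p_{\text{unpacked}}\right) &= \log 0 = -\infty
\end{aligned}
$$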

To fix this, during summarization, any phred score less than 1 is set to 1. This avoids problems in the likelihood calculations, where a probability of 0 becomes negative infinity when logged. (Probabilities should also always include some uncertainty, which is lost if the phred score is 0.)
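The clamp described above can be sketched as follows. This is a minimal illustration, not the code in `SummarizeUMIBaseQualities.java`: the class name `PhredClampSketch`, the constant `MIN_PHRED`, and all method names are hypothetical, and the rounding-based quantization is an assumption about how the phred score is produced.

```java
// Hypothetical sketch; names and rounding behavior are assumptions, not the PR's code.
final class PhredClampSketch {

    // Assumed floor, mirroring the "less than 1 is set to 1" rule in the description.
    static final int MIN_PHRED = 1;

    // Quantize an error probability to an integer phred score: Q = round(-10 * log10(p)).
    public static int errorRateToPhred(double errorRate) {
        return (int) Math.round(-10.0 * Math.log10(errorRate));
    }

    // Unpack a phred score back to an error probability: p = 10^(-Q/10).
    public static double phredToErrorRate(int phred) {
        return Math.pow(10.0, -phred / 10.0);
    }

    // Raise any score below the floor to the floor, so the unpacked error rate stays below 1.
    public static int clampPhred(int phred) {
        return Math.max(phred, MIN_PHRED);
    }

    // Log-probability that the base is correct, given its phred score.
    public static double logProbCorrect(int phred) {
        return Math.log10(1.0 - phredToErrorRate(phred));
    }

    public static void main(String[] args) {
        int raw = errorRateToPhred(0.9);      // quantizes to 0
        int clamped = clampPhred(raw);        // raised to 1

        System.out.println(logProbCorrect(raw));                        // prints -Infinity
        System.out.println(Double.isFinite(logProbCorrect(clamped)));   // prints true
    }
}
```

With the floor in place, the largest error rate that can be unpacked is 10^(-0.1), about 0.79, so the probability of a correct base never reaches 0 and its logarithm stays finite.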

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.65%. Comparing base (825b982) to head (655fa4c). Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...digitalallelecounts/SummarizeUMIBaseQualities.java | 50.00% | 1 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             master     #477    +/-   ##
============================================
- Coverage     80.65%   80.65%   -0.01%
  Complexity     4690     4690
============================================
  Files           358      358
  Lines         20030    20032       +2
  Branches       3118     3119       +1
============================================
  Hits          16156    16156
- Misses         2608     2609       +1
- Partials       1266     1267       +1
```
