Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0

Don't crash when multiple beams have identical peptide scores #306

Closed · bittremieux closed this 4 months ago

bittremieux commented 4 months ago

Fixes #271.

The problem was that beam caching fails when different beams, with different predicted amino acid sequences (i.e. token arrays at this stage in the code), have an identical peptide score. This likely happens when multiple predictions can't be distinguished, and thus only arises when using multiple beams on ambiguous spectra.

The exact failure occurred because the information cached for each beam is a tuple of (peptide score, array of amino acid scores, array of amino acid tokens). When comparing tuples, Python compares the first elements; in case of a tie it moves on to the second elements, and so on. Here, the second element is a NumPy array, whose truth value is ambiguous, so the tie-breaking comparison raises a ValueError. However, we don't actually want to compare those arrays at all: beams should be ranked by peptide score only.
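The failure mode can be reproduced in isolation. This is a minimal sketch (the variable names and score values are illustrative, not taken from the Casanovo code): two cached beam tuples tie on the peptide score, so Python falls back to comparing the NumPy arrays, which raises a ValueError.

```python
import numpy as np

# Two cached beam entries with identical peptide scores (illustrative values):
# (peptide score, array of amino acid scores, array of amino acid tokens).
beam_a = (-0.5, np.array([-0.2, -0.3]), np.array([4, 7]))
beam_b = (-0.5, np.array([-0.3, -0.2]), np.array([7, 4]))

try:
    # The floats tie, so Python compares the arrays, whose truth value
    # is ambiguous for more than one element.
    beam_a < beam_b
except ValueError as err:
    print(err)
```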

This is now addressed by inserting a random float as the second element of those tuples, before the array of amino acid scores. It's vanishingly unlikely that two such random numbers would ever be equal, so in effect this arbitrarily breaks ties between equal peptide scores, and the arrays are never compared.

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.77%. Comparing base (cd29e4b) to head (666132d).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##              dev     #306   +/-   ##
=======================================
  Coverage   89.77%   89.77%
=======================================
  Files          12       12
  Lines         929      929
=======================================
  Hits          834      834
  Misses         95       95
```
