EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

Fix for Issue #198 #199

Closed traviswheeler closed 4 years ago

traviswheeler commented 4 years ago

The change in this PR is simple: in rescore_isolated_domain(), if p7_Decoding() returns eslERANGE, the bg model is now restored to its original form via a call to reparameterize_model() before rescore_isolated_domain() returns an eslFAIL status. Previously, the function returned eslFAIL without restoring bg, so all subsequent work was done with a wrong (and highly biased) bg.
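
For readers unfamiliar with the code, here is a minimal, self-contained sketch of the pattern (restore temporarily modified state before an early error return). It is not the HMMER source: the `toy_*` names, the `ST_*` status codes, and the four-frequency background struct are all hypothetical stand-ins for bg, p7_Decoding(), reparameterize_model(), and the esl status codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for eslOK / eslFAIL / eslERANGE. */
enum { ST_OK = 0, ST_FAIL = 1, ST_ERANGE = 2 };

/* Hypothetical, much-simplified background model: four residue frequencies. */
typedef struct { double f[4]; } toy_bg;

/* Stand-in for the temporary reparameterization: skews bg toward one residue. */
static void toy_bias_bg(toy_bg *bg)
{
    bg->f[0] = 0.7;
    bg->f[1] = bg->f[2] = bg->f[3] = 0.1;
}

/* Stand-in for restoring bg to its saved, original form. */
static void toy_restore_bg(toy_bg *bg, const toy_bg *saved)
{
    *bg = *saved;
}

/* Stand-in for the decoding step; reports a range error on request. */
static int toy_decode(int force_range_error)
{
    return force_range_error ? ST_ERANGE : ST_OK;
}

/* Toy analogue of the rescoring routine described above. */
int toy_rescore(toy_bg *bg, int force_range_error)
{
    toy_bg saved;
    int    status;

    assert(bg != NULL);
    saved = *bg;                      /* remember the original bg           */
    toy_bias_bg(bg);                  /* temporary, biased parameterization */

    status = toy_decode(force_range_error);
    if (status == ST_ERANGE) {
        toy_restore_bg(bg, &saved);   /* the fix: restore before bailing out */
        return ST_FAIL;
    }

    /* ... normal rescoring work would happen here ... */

    toy_restore_bg(bg, &saved);       /* normal path already restored bg */
    return ST_OK;
}
```

Without the restore call inside the `ST_ERANGE` branch, the caller would keep using the skewed frequencies after a failure, which is the toy analogue of the behavior this PR fixes.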

I've tested this on thousands of highly-repetitive models and sequences found in the raw data for a new Dfam release, and it works as expected: the rare eslERANGE result is handled gracefully (bg model is reset as it should be, subsequent work proceeds as normal).

I've also tested on a few hundred repetitive models/sequences against the unmasked human genome (i.e. tandem repeat regions not masked). In these runs, I have not seen any instances of the eslERANGE result from p7_Decoding(). Since even these highly repetitive queries did not trigger it, I expect this bug to occur only rarely in the wild.

cryptogenomicon commented 4 years ago

Terrific, thanks! Just in time for 3.3.1 release.