EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

hmmemit: whoops, HMM is bad! #284

Closed (chasemc closed this issue 1 year ago)

chasemc commented 1 year ago

I'm encountering what seems to be the same issue as #116:

hmmer version: bioconda/linux-64::hmmer-3.3.2-h87f3376_2

$ hmmemit -h
# hmmemit :: sample sequence(s) from a profile HMM
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute

Note: the HMM at the URL below can be downloaded but not redistributed.

$ curl -s https://magarveylab.ca/Skinnider_etal/models/hmm/thiotemplated/cMT.hmm > cMT.hmm
$ hmmemit cMT.hmm
whoops, HMM is bad!
$ hmmconvert cMT.hmm > temp.hmm
$ hmmemit temp.hmm
whoops, HMM is bad!

Originally posted by @chasemc in https://github.com/EddyRivasLab/hmmer/issues/116#issuecomment-1233060568

cryptogenomicon commented 1 year ago

Can you give me more information about where this HMM file came from?

The error message is correct: the HMM file is bad, at least one of the transition probability distributions is slightly unnormalized.
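To illustrate what "unnormalized" means here, the Python sketch below reads the state-transition lines of an HMMER3 ASCII save file and reports every distribution whose probabilities do not sum to 1. It assumes the HMMER3 text layout: values are stored as -ln(p), `*` means probability 0, and each node has one 7-field transition line in the order m->m, m->i, m->d, i->m, i->i, d->m, d->d. The function names and the 1e-4 tolerance are hypothetical choices made for this sketch. It is not HMMER's own validation code, and it does not check emission distributions.

```python
import math

# Positions of each state's outgoing transitions on an HMMER3 transition line:
# m->m m->i m->d | i->m i->i | d->m d->d
GROUPS = {"M": (0, 1, 2), "I": (3, 4), "D": (5, 6)}


def to_prob(field: str) -> float:
    """HMMER3 ASCII files store -ln(p); '*' encodes p = 0."""
    return 0.0 if field == "*" else math.exp(-float(field))


def bad_transitions(hmm_text: str, tol: float = 1e-4) -> list:
    """Return (node, state, total) for each transition distribution whose
    probabilities differ from 1 by more than tol (hypothetical tolerance)."""
    lines = hmm_text.splitlines()
    # The main model section starts at the line whose first token is "HMM";
    # the line after it is the "m->m m->i ..." column header.
    start = next((i for i, ln in enumerate(lines) if ln.split()[:1] == ["HMM"]), None)
    if start is None:
        raise ValueError("no HMM line found; not an HMMER3 ASCII file?")
    body = []
    for line in lines[start + 2:]:
        tokens = line.split()
        if not tokens:
            continue
        if tokens == ["//"]:
            break  # end of this model
        if tokens[0] == "COMPO":
            continue  # optional background-composition line
        body.append(tokens)
    # body[0]: node 0 insert emissions, body[1]: node 0 transitions,
    # then (match, insert, transition) triples for nodes 1..M.
    trans_lines = [body[1]] + body[4::3]
    problems = []
    for node, fields in enumerate(trans_lines):
        probs = [to_prob(f) for f in fields]
        for state, idx in GROUPS.items():
            total = sum(probs[i] for i in idx)
            if abs(total - 1.0) > tol:
                problems.append((node, state, total))
    return problems
```

For example, `print(bad_transitions(open("cMT.hmm").read()))` would list the offending nodes. Because the file stores values rounded to five decimal places, a small tolerance is needed even for a valid model.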

The date stamp on the file indicates it was created in June 2005 (before HMMER3), and the command log in the file indicates that hmmcalibrate was run on it... which suggests that this is originally a HMMER2 save file that was converted to H3 at some point in its life.
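The provenance clues mentioned above (the date stamp and the command log) are stored in the header of an HMMER3 ASCII file as `DATE` and `COM` lines. The sketch below collects them, assuming that header layout. The function names are hypothetical. The H2-origin check is only a heuristic based on the reasoning above: `hmmcalibrate` is a HMMER2 program, so its presence in the command log hints that the model started life as an H2 file.

```python
def provenance(hmm_text: str) -> dict:
    """Collect the format tag, DATE, and COM lines from an HMMER3 ASCII header."""
    info = {"format": None, "date": None, "commands": []}
    for line in hmm_text.splitlines():
        tokens = line.split(maxsplit=1)
        if not tokens:
            continue
        tag = tokens[0]
        rest = tokens[1].strip() if len(tokens) > 1 else ""
        if tag == "HMM":
            break  # the header ends where the main model section begins
        if tag.startswith("HMMER"):
            info["format"] = line.strip()  # e.g. "HMMER3/f [...]"
        elif tag == "DATE":
            info["date"] = rest
        elif tag == "COM":
            info["commands"].append(rest)  # e.g. "[1] hmmbuild ..."
    return info


def looks_like_h2_origin(info: dict) -> bool:
    """Heuristic only: a logged hmmcalibrate run suggests a HMMER2 ancestry."""
    return any("hmmcalibrate" in cmd for cmd in info["commands"])
```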

cryptogenomicon commented 1 year ago

At first glance I'm not sure this is something we're likely to fix. My guess is that this file was originally created with H2 and converted to an early H3 format at some point before 2015. As described in issue #116, we found a problem in 2014 with how H2 files are converted, and fixed it in HMMER 3.1b2 (March 2015); this file was converted before that fix.

The best thing to do, if you still have the alignment, is build a fresh H3 profile HMM from the alignment. H3's parameterization is far superior to H2's anyway; you'll get not just a valid model, but a better one.

chasemc commented 1 year ago

Thanks for the quick reply. I have no other information on the source of the model: it was published alongside a manuscript, but the data were subsumed into a company that won't provide any further information.

Knowing that running hmmconvert on a file converted before HMMER 3.1b2 won't fix the #116 problem is good enough, and I don't think it requires any fix on HMMER's end. Thanks again!

cryptogenomicon commented 1 year ago

Because of the fix for #116 (which made H2->H3 conversion more robust), it should work to use hmmconvert to convert the file to H2 format, then again to convert it back to H3 format. Conversions between H2 and H3 formats are lossy, though, so this workaround makes me nervous.

   % hmmconvert -2 cMT.hmm > foo.hmm
   % hmmconvert foo.hmm > foo2.hmm
   % hmmemit foo2.hmm
>cMT-sample1
EADGSDKDSLYGDVYRRILAEAVTNALRAAVTSQCGHRNAPRSILEVGAGTGAATEAIVR
ASGASFRSHYCFTDISHKFLEDAQERFARKNYEALTARAMDISKDPAEQSFSNARVDIII
ALDVIHATTDLPRTLDEIRMAWLLAPGGDLLLVSELDRKNRLQDFIFGPADDWWRFLDLQ
IPEGPLLFASQWRSYLKHAGFEDASLILGDCEYESPWDQSYSLAERP

chasemc commented 1 year ago

For the application I'm using it for, I think that will be okay even if the conversion is lossy. Thanks again!