althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License
128 stars 12 forks source link

PyHMMER 0.8.0 TMD should be 0 for last node #40

Closed jamaliki closed 1 year ago

jamaliki commented 1 year ago

Hi,

We use your excellent software in ModelAngelo! Ever since the upgrade to pyHMMER v0.8.0, we are getting the following error during HMMAlign:

ValueError: Invalid HMM: TMD should be 0 for last node

I have attached an example of the HMM files we write. Could you please advise how we should proceed so that it works with the new version? 0.hmm.txt

althonos commented 1 year ago

HI @jamaliki!

I think the error is related to #36: since hmmalign doesn't automatically verify whether the given HMM is valid, there was a chance for a segmentation fault on invalid input. To mitigate this, I added an extra step to check the HMM is valid before invoking the hmmalign pipeline, and it seems to be what you're getting here.

May I ask, how did you obtain the HMMs? The error is exactly what it says, i.e. that the transition probability from match state to deletion ($T_{M \to D}$) should be zero for the last node, otherwise your HMM may be "stuck" in a deletion loop and won't align properly (I think?).

jamaliki commented 1 year ago

I see, thank you very much. We are generating these HMM's by hand based on a neural network's output. I think I have fixed it in this commit, if you would like to take a look: commit.

Overall, I think this is closed. Thank you!

althonos commented 1 year ago

Yes, that should do it!

By the way, you can also create a HMM and fill the emission scores & transition probabilities from Python, and then call HMM.write to write the result, instead of manually formatting the HMMs in text format :wink:

jamaliki commented 1 year ago

Oh interesting ! Is there a tutorial on how to fill out the emission and transition probabilities?

althonos commented 1 year ago

Not exactly a tutorial, but the attributes are documented: you will want to create a HMM from scratch, and then fill the transition_probabilities, insert_emissions and match_emissions arrays.

I added some goodies to v0.8.1 that was released like 5 minutes ago to help with this:

For instance, you can use the following code to fill the transitions, I let you adapt the rest for the match and insert emissions -- the only thing to keep in mind is that you should put raw probabilities, not log-scaled ones, so:

from pyhmmer.easel import Alphabet
from pyhmmer.plan7 import HMM, Transitions

alphabet = Alphabet.amino()
hmm = HMM(alphabet, M=len(aa_log_probs), name=b"example")

for res_index in range(len(aa_log_probs)):
    mm = max(confidence[res_index] - delta, gamma)
    hmm.transitions_probabilities[res_index+1, Transitions.MM] = mm
    hmm.transitions_probabilities[res_index+1, Transitions.MD] = hmm.transitions_probabilities[res_index+1, Transitions.MI] = (1.0 - mm) / 2
    hmm.transitions_probabilities[res_index+1, Transitions.IM] = hmm.transitions_probabilities[res_index+1, Transitions.DM] = 1.0 - delta
    hmm.transitions_probabilities[res_index+1, Transitions.II] = hmm.transitions_probabilities[res_index+1, Transitions.DD] = delta

hmm.validate()

(the first row is for the entry probabilities, so you should start at row 1 for the remaining ones).