Closed jamaliki closed 1 year ago
HI @jamaliki!
I think the error is related to #36: since hmmalign
doesn't automatically verify whether the given HMM is valid, there was a chance for a segmentation fault on invalid input. To mitigate this, I added an extra step to check the HMM is valid before invoking the hmmalign
pipeline, and it seems to be what you're getting here.
May I ask, how did you obtain the HMMs? The error is exactly what it says, i.e. that the transition probability from match state to deletion ($T_{M \to D}$) should be zero for the last node, otherwise your HMM may be "stuck" in a deletion loop and won't align properly (I think?).
I see, thank you very much. We are generating these HMM's by hand based on a neural network's output. I think I have fixed it in this commit, if you would like to take a look: commit.
Overall, I think this is closed. Thank you!
Yes, that should do it!
By the way, you can also create a HMM and fill the emission scores & transition probabilities from Python, and then call HMM.write
to write the result, instead of manually formatting the HMMs in text format :wink:
Oh interesting ! Is there a tutorial on how to fill out the emission and transition probabilities?
Not exactly a tutorial, but the attributes are documented: you will want to create a HMM from scratch, and then fill the transition_probabilities
, insert_emissions
and match_emissions
arrays.
I added some goodies to v0.8.1
that was released like 5 minutes ago to help with this:
pyhmmer.plan7.Transitions
class, which can be used to index the transitions probabilities matrix instead of raw indices, makes the code more legibleHMM.validate
method, which does the same validation as hmmalign
, you can call it when you're done filling the HMM, it will raise a ValueError
if the contents are wrong.For instance, you can use the following code to fill the transitions, I let you adapt the rest for the match and insert emissions -- the only thing to keep in mind is that you should put raw probabilities, not log-scaled ones, so:
from pyhmmer.easel import Alphabet
from pyhmmer.plan7 import HMM, Transitions
alphabet = Alphabet.amino()
hmm = HMM(alphabet, M=len(aa_log_probs), name=b"example")
for res_index in range(len(aa_log_probs)):
mm = max(confidence[res_index] - delta, gamma)
hmm.transitions_probabilities[res_index+1, Transitions.MM] = mm
hmm.transitions_probabilities[res_index+1, Transitions.MD] = hmm.transitions_probabilities[res_index+1, Transitions.MI] = (1.0 - mm) / 2
hmm.transitions_probabilities[res_index+1, Transitions.IM] = hmm.transitions_probabilities[res_index+1, Transitions.DM] = 1.0 - delta
hmm.transitions_probabilities[res_index+1, Transitions.II] = hmm.transitions_probabilities[res_index+1, Transitions.DD] = delta
hmm.validate()
(the first row is for the entry probabilities, so you should start at row 1 for the remaining ones).
Hi,
We use your excellent software in ModelAngelo! Ever since the upgrade to
pyHMMER v0.8.0
, we are getting the following error duringHMMAlign
:I have attached an example of the HMM files we write. Could you please advise how we should proceed so that it works with the new version? 0.hmm.txt