dayuer2010 closed this issue 5 years ago.
Can you please copy-paste the error message you get? Thanks.
Similar to dayuer2010's comment, it does not handle 'Z', 'X', 'B', or '-' (gaps) well, though the website seems to handle all of these well enough. I'm using the code as an imported library in Python 3.6 (see the lines below for the error type), but the same errors for these characters also appear when it is run from the command line. Without these characters it runs and produces the expected output, although the "RuntimeWarning: divide by zero..." warning still appears. I hope this helps shed some light on things. Thanks very much for making it available.
In [18]: annotation, posterior = tmhmm.predict('MREXNNQSSTLEFILLGVTGQQEQEDFFYILFLFIYPITLIGNLLIVLAICSDVRLHNPMYFLLANLSLVDIFFSSVTIPKMLANHLLGSKSISFGGCLTQMYFMIALGNTDSYILAAMAYDRAVAISRPLHYTTIMSPRSCIWLIAGSWVIGNANALPHTLLTASLSFCGNQEVANFYCDITPLLKLSCSDIHFHVKMMYLGVGIFSVPLLCIIVSYIRVFSTVFQVPSTKGVLKAFSTCGSHLTVVSLYYGTVMGTYFRPLTNYSLKDAVITVMYTAVTPMLNPFIYSLRNRDMKAALRKLFNKRISS', '/foo/foo/foo/bar/TMHMM2.0.model')
/foo/foo/anaconda3/lib/python3.6/site-packages/tmhmm/__init__.py:21: RuntimeWarning: divide by zero encountered in log
_, path = viterbi(sequence, *model)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-18-e2aa6964c763> in <module>()
----> 1 annotation, posterior = tmhmm.predict('MREXNNQSSTLEFILLGVTGQQEQEDFFYILFLFIYPITLIGNLLIVLAICSDVRLHNPMYFLLANLSLVDIFFSSVTIPKMLANHLLGSKSISFGGCLTQMYFMIALGNTDSYILAAMAYDRAVAISRPLHYTTIMSPRSCIWLIAGSWVIGNANALPHTLLTASLSFCGNQEVANFYCDITPLLKLSCSDIHFHVKMMYLGVGIFSVPLLCIIVSYIRVFSTVFQVPSTKGVLKAFSTCGSHLTVVSLYYGTVMGTYFRPLTNYSLKDAVITVMYTAVTPMLNPFIYSLRNRDMKAALRKLFNKRISS', '/foo/foo/foo/bar/TMHMM2.0.model')
~/anaconda3/lib/python3.6/site-packages/tmhmm/__init__.py in predict(sequence, model_or_filelike, compute_posterior)
19 _, model = parse(open(model_or_filelike))
20
---> 21 _, path = viterbi(sequence, *model)
22 if compute_posterior:
23 forward_table, constants = forward(sequence, *model)
tmhmm/hmm.pyx in tmhmm.hmm.viterbi()
KeyError: 'X'
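The KeyError suggests that the model's character lookup simply has no entry for 'X'. As a minimal sketch (not part of the tmhmm package), the sequence could be checked up front so the problem is reported with positions instead of a bare KeyError from inside `viterbi`. This assumes the TMHMM 2.0 model only scores the 20 standard amino acids; `find_unsupported` and `check_sequence` are hypothetical helper names.

```python
# Assumption: the TMHMM 2.0 model only scores the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_unsupported(sequence, alphabet=STANDARD_AMINO_ACIDS):
    """Return 1-based (position, character) pairs for characters outside alphabet."""
    return [
        (pos, char)
        for pos, char in enumerate(sequence.upper(), start=1)
        if char not in alphabet
    ]


def check_sequence(sequence, alphabet=STANDARD_AMINO_ACIDS):
    """Raise a descriptive ValueError instead of letting the model lookup fail.

    Returns the upper-cased sequence when every character is supported.
    """
    bad = find_unsupported(sequence, alphabet)
    if bad:
        listing = ", ".join(f"{char!r} at {pos}" for pos, char in bad)
        raise ValueError(f"sequence contains characters the model cannot score: {listing}")
    return sequence.upper()
```

For example, `find_unsupported("MREXNNQ")` returns `[(4, 'X')]`, pointing at the same character that triggered the KeyError above.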
Thanks, @DavidVillalta! I'll have a look at it this week.
@DavidVillalta, it seems that TMHMM handles this in a pretty weird way (at least I can't figure out how they get their results). I tested with the sequence XXXBBBUUU---ZZZ.
TMHMM web server output:
# WEBSEQUENCE
# AA inside membr outside
1 X 0.52190 0.00000 0.4781
2 X 0.52190 0.00000 0.4781
3 X 0.52190 0.00000 0.4781
4 B 0.52190 0.00000 0.4781
5 B 0.52190 0.00000 0.4781
6 B 0.52190 0.00000 0.4781
7 X 0.52190 0.00000 0.4781
8 X 0.52190 0.00000 0.4781
9 X 0.52190 0.00000 0.4781
10 Z 0.52190 0.00000 0.4781
11 Z 0.52190 0.00000 0.4781
12 Z 0.52190 0.00000 0.4781
So it seems that they stripped the gaps (-) and turned U into X (positions 7 to 9 above), but kept everything else. However, it's very difficult to figure out how they handle this, as it's not documented anywhere. Would you be happy with a solution where the ambiguous characters are simply stripped?
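Stripping characters shifts the residue numbering, so any per-residue annotation would no longer line up with the original sequence. A minimal sketch of one way to strip while keeping track of the original positions, so the annotation can be mapped back onto the full-length sequence afterwards. `strip_unsupported`, `map_back` and the `?` placeholder are hypothetical; the alphabet is assumed to be the 20 standard amino acids, and the annotation is assumed to be a sequence of one-letter labels.

```python
# Hypothetical preprocessing step; not part of the tmhmm package.
# Assumption: the model only scores the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def strip_unsupported(sequence, alphabet=STANDARD_AMINO_ACIDS):
    """Drop gaps and ambiguous codes (X, B, Z, U, ...).

    Returns the cleaned sequence and the 0-based indices of the kept
    residues in the original sequence.
    """
    kept = [(i, c) for i, c in enumerate(sequence.upper()) if c in alphabet]
    cleaned = "".join(c for _, c in kept)
    indices = [i for i, _ in kept]
    return cleaned, indices


def map_back(labels, indices, length, fill="?"):
    """Spread one-letter labels for the cleaned sequence over the original length.

    Positions that were stripped receive the placeholder `fill`.
    """
    full = [fill] * length
    for label, i in zip(labels, indices):
        full[i] = label
    return "".join(full)
```

For example, `strip_unsupported("MRE-XNNQ")` gives `("MRENNQ", [0, 1, 2, 5, 6, 7])`, and `map_back("iiiooo", [0, 1, 2, 5, 6, 7], 8)` gives `"iii??ooo"`, so the stripped gap and X are visible as placeholders in their original positions.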
I had imagined that the ambiguous characters get assigned a score based on their neighbors' scores plus their proximity to the start or end of a predicted TMH (perhaps using an average TMH length, either specific to the protein or a general one). But indeed, they are all ambiguous and yet they still get assigned a score, which is curious. After putting in a request at http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?tmhmm, it appears that the standalone version handles them too (except gaps), though I do not speak Perl and cannot say how. Unfortunately, I will need to preserve at least two of the ambiguous characters, and I work primarily in Python, so it was nice to find your adaptation for the language.
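The neighbor-based part of that idea (leaving out the TMH-length component) could be sketched as follows. This only illustrates the suggestion above; it is not how TMHMM itself scores ambiguous residues, which remains unknown. Scores for ambiguous positions are given as `None` and are filled with the mean of the nearest known score on each side. For three-state output (inside, membrane, outside) it would be applied once per column. `fill_from_neighbors` is a hypothetical name.

```python
def fill_from_neighbors(scores):
    """Replace None entries with the mean of the nearest known score on each side.

    At the ends of the sequence only one neighbor exists, so its score is used.
    If no score is known at all, the list is returned unchanged.
    """
    n = len(scores)
    filled = list(scores)

    # Nearest known score at or to the left of each position.
    left = [None] * n
    last = None
    for i, s in enumerate(scores):
        if s is not None:
            last = s
        left[i] = last

    # Nearest known score at or to the right of each position.
    right = [None] * n
    last = None
    for i in range(n - 1, -1, -1):
        if scores[i] is not None:
            last = scores[i]
        right[i] = last

    # Only unknown positions are filled; known scores are kept as they are.
    for i, s in enumerate(scores):
        if s is None:
            neighbors = [v for v in (left[i], right[i]) if v is not None]
            if neighbors:
                filled[i] = sum(neighbors) / len(neighbors)
    return filled
```

For example, `fill_from_neighbors([0.25, None, None, 0.75])` returns `[0.25, 0.5, 0.5, 0.75]`. Note that a sequence made up entirely of ambiguous characters, like the XXXBBBUUU---ZZZ test above, gets no scores this way, whereas the web server still reports values for it.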
Could you e-mail me the Perl implementation? Maybe I can figure out how they handled it from the code, even though I'm not very familiar with Perl either.
Sorry for providing a broken link, but I think I have put it in correctly now. To download the script, just fill out the form with an academic e-mail address. The reply with a download link is automated and nearly instantaneous. I would e-mail it to you, but the license agreement is pretty explicit about not sharing it outside of my "research site".
@dansondergaard I think I am going to be able to use this as is, after all. Thanks for the help. I do have one more question, though: where is "compute_posterior=False" supposed to be inserted if I want to turn off these outputs? EDIT: Figured it out, thanks anyway.
Hi Dave, good to hear that it can be used anyway! It’s supposed to go in the call to predict:
annotation, posterior = tmhmm.predict(sequence, 'mymodel.model', compute_posterior=False)
Hi, dansondergaard,
Thank you for putting in the effort to create this Python package. Have you figured out the bug with U? I have also encountered a similar issue.
Thanks a lot!
@0yliu I don't have any plans to fix this at the moment since it's hard to figure out how TMHMM handles these cases. None of it is documented anywhere.
Dear dansondergaard, first, congratulations on developing such good software. I have installed it, and with test.fa it works successfully. But if I use a protein with a "U" in its sequence, it doesn't work and produces an error. If I submit the same protein with "U" to the TMHMM website, it works and produces a prediction. Can you help me solve this problem?