acg-team / tral

Tandem Repeat Annotation Library
http://acg-team.github.io/tral/
GNU General Public License v2.0

hmm.HMM.create() fails for DNA models #8

Open andrewparkermorgan opened 7 years ago

andrewparkermorgan commented 7 years ago

Hi,

Reading models built from DNA sequences appears to fail. Below is an example using a motif from the Dfam database (DF0000029, http://dfam.org/entry/DF0000029).

cp = hmm.HMM.create(input_format = "hmmer", file = "DF0000029.hmm")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/apm/anaconda3/lib/python3.4/site-packages/tral/hmm/hmm.py", line 261, in create
    hmmer_probabilities = next(HMM.read(file))
  File "/Users/apm/anaconda3/lib/python3.4/site-packages/tral/hmm/hmm_io.py", line 187, in read
    for i in string_emissions]
  File "/Users/apm/anaconda3/lib/python3.4/site-packages/tral/hmm/hmm_io.py", line 187, in <listcomp>
    for i in string_emissions]
ValueError: could not convert string to float: 'c'

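It seems that too many columns are read from the MATCH lines in the model: the number of columns is currently hard-coded as 20 (i.e. the amino acids) in hmm_io.py at line 104. For a DNA model only the first four columns are emissions, so the parser runs into the annotation fields that follow them, such as the consensus residue 'c'.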
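A minimal sketch of the idea, not tral's actual code: the alphabet can be read from the `HMM` header line of a HMMER3 file, and its length then decides how many columns of each match line are emissions. The function names `alphabet_from_header` and `match_emissions` and the sample model below are hypothetical and made up for illustration; the numbers are not those of DF0000029.

```python
def alphabet_from_header(lines):
    """Return the symbols listed on the 'HMM' header line of a HMMER3 model."""
    for line in lines:
        tokens = line.split()
        # The first line of the file starts with 'HMMER3/...', which is a
        # different token, so an exact comparison with 'HMM' is safe.
        if tokens and tokens[0] == "HMM":
            return tokens[1:]
    raise ValueError("no 'HMM' header line found")


def match_emissions(match_line, alphabet):
    """Parse a match line: node index first, then one score per symbol."""
    tokens = match_line.split()
    scores = tokens[1:1 + len(alphabet)]
    # HMMER3 writes '*' for a probability of zero (infinite score).
    return {
        symbol: float("inf") if score == "*" else float(score)
        for symbol, score in zip(alphabet, scores)
    }


# Hypothetical, heavily shortened DNA model (illustrative values only).
sample = [
    "HMMER3/f [3.1b1 | May 2013]",
    "NAME  example",
    "ALPH  DNA",
    "HMM          A        C        G        T",
    "      1   2.15433  0.53581  2.66129  1.53467      1 c - - -",
]

alphabet = alphabet_from_header(sample)
emissions = match_emissions(sample[-1], alphabet)
print(alphabet)
print(emissions)

# A fixed slice of 20 columns reaches the annotation fields instead:
try:
    [float(t) for t in sample[-1].split()[1:21]]
except ValueError as err:
    print(err)  # could not convert string to float: 'c'
```

Deriving the column count from the header keeps one code path for protein and DNA models, instead of adding a second hard-coded constant.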

Is there any support for DNA sequences in tral?

elkeschaper commented 7 years ago

Dear Andrew,

DNA sequence support was added in dev, but not yet to the main pip package. So you could install TRAL from git: git@github.com:elkeschaper/tral.git

(see: http://stackoverflow.com/questions/20101834/pip-install-from-github-repo-branch)

Please let me know if any problems remain!

Thanks,

Elke
