hmm.HMM.create() fails for DNA models

Hi,

Reading models built from DNA sequences appears to fail. Below is an example using a motif from the Dfam database (DF0000029, http://dfam.org/entry/DF0000029).

cp = hmm.HMM.create(input_format = "hmmer", file = "DF0000029.hmm")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/apm/anaconda3/lib/python3.4/site-packages/tral/hmm/hmm.py", line 261, in create
    hmmer_probabilities = next(HMM.read(file))
  File "/Users/apm/anaconda3/lib/python3.4/site-packages/tral/hmm/hmm_io.py", line 187, in read
    for i in string_emissions]
  File "/Users/apm/anaconda3/lib/python3.4/site-packages/tral/hmm/hmm_io.py", line 187, in <listcomp>
    for i in string_emissions]
ValueError: could not convert string to float: 'c'

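It seems that too many columns are read from the MATCH lines in the model: the number of emission columns is currently hard-coded as 20 (i.e. the amino acid alphabet) in hmm_io.py at line 104, whereas a DNA model has only 4 (A, C, G, T). The parser therefore runs past the emissions into the annotation fields and tries to convert the consensus letter 'c' to a float.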
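For illustration, here is a minimal sketch of how the column count could be taken from the model file itself rather than hard-coded. It is not TRAL's actual code: the function names `read_alphabet` and `parse_match_emissions` are hypothetical, and the two sample lines are made-up values laid out like a HMMER3 DNA model (an `HMM` header line listing the alphabet, then a match line with node index, emission scores, and annotation fields).

```python
def read_alphabet(header_line):
    """Return the letters listed on a HMMER3 'HMM' header line (hypothetical helper)."""
    fields = header_line.split()
    if not fields or fields[0] != "HMM":
        raise ValueError("not an HMM header line: %r" % header_line)
    return fields[1:]


def parse_match_emissions(match_line, alphabet):
    """Map each alphabet letter to its emission score on one match line.

    Only as many columns as there are letters are read, so the trailing
    annotation fields (map index, consensus letter, ...) are never
    passed to float().
    """
    fields = match_line.split()
    # fields[0] is the node index; the emission scores follow it.
    scores = fields[1:1 + len(alphabet)]
    if len(scores) != len(alphabet):
        raise ValueError("too few emission columns: %r" % match_line)
    # HMMER3 writes '*' for a zero probability (infinite negative log).
    return {letter: float("inf") if score == "*" else float(score)
            for letter, score in zip(alphabet, scores)}


# Illustrative lines only; the numbers are invented, not taken from DF0000029.
dna_header = "HMM          A        C        G        T"
dna_match = "      1   1.38629  0.28768  2.30259  1.60944      1 c x - -"

alphabet = read_alphabet(dna_header)
emissions = parse_match_emissions(dna_match, alphabet)
```

With a fixed slice of 20 columns, the same sample line would hand the consensus letter `c` to `float()`, which gives the kind of ValueError shown in the traceback above.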

Is there any support for DNA sequences in tral?

Dear Andrew,

DNA sequence support was added in the dev branch, but not to the main pip package. So you could install TRAL from git: git@github.com:elkeschaper/tral.git

(see: http://stackoverflow.com/questions/20101834/pip-install-from-github-repo-branch)

Please let me know if any problems remain!

Thanks,

Elke

On Dec 12, 2016, at 5:57 PM, andrewparkermorgan wrote the original report above.
