KorfLab / SNAP

Gene prediction software

Strange HMM resulting from SNAP training #12

soungalo closed this issue 2 years ago

soungalo commented 2 years ago

Hi there,

I'm trying to train SNAP for gene prediction in soybean. My input is based on reference genes identified as BUSCOs. I followed the instructions in the documentation and obtained a .hmm file (attached). However, it looks strange: it contains -nan values in certain places and is missing some parameters that I see in the pre-trained HMMs. If I use it for gene prediction I get a total mess, so I figure I must have done something wrong. Any idea what that could be?

I should also mention that during training I got quite a few error messages like this:

MODEL195 1 1 11 + errors(1): gene:misordered_Eterm

I'm not sure whether I should have done anything about them, or what. But when running:

$ fathom genome.ann genome.dna -gene-stats

I finally get:

20 sequences
0.346555 avg GC fraction (min=0.333551 max=0.353677)
5996 genes (plus=2777 minus=3219)
1156 (0.192795) single-exon
4840 (0.807205) multi-exon
240.876190 mean exon (min=1 max=6544)
550.052734 mean intron (min=11 max=16734)

Any thoughts? Thanks!

soy.hmm.txt

iankorf commented 2 years ago

Make all of your exons 'Exon' rather than 'Einit' and 'Eterm'. The software should figure things out from there.
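
A minimal sketch of the relabeling suggested above, in case it helps anyone doing this on a large annotation. It assumes the annotation is a ZFF file in which sequence headers start with `>` and the feature label is the first whitespace-delimited field on every other line; the function name `relabel_exons` is made up for this example and is not part of SNAP. Only `Einit` and `Eterm` are rewritten, and every other label is left untouched.

```python
import re

# Labels to rewrite, per the suggestion above; all other labels pass through.
TERMINAL_LABELS = {"Einit", "Eterm"}


def relabel_exons(lines):
    """Yield ZFF lines with Einit/Eterm in the first field replaced by Exon.

    Assumption: header lines start with '>' and the label is the first
    whitespace-delimited field of each feature line.
    """
    for line in lines:
        if line.startswith(">"):
            # Sequence header: never modified.
            yield line
            continue
        # Split the line into its first field and everything after it,
        # so the original separators and line ending are preserved.
        match = re.match(r"(\S+)(.*)", line, flags=re.DOTALL)
        if match and match.group(1) in TERMINAL_LABELS:
            yield "Exon" + match.group(2)
        else:
            yield line
```

The generator can be fed an open file handle and its output written to a new file, so the original annotation stays intact for comparison.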

soungalo commented 2 years ago

Thanks. This time I got no error messages, but the .hmm file still contains -nan values. Prediction with this HMM still produces very strange gene models.

soy.hmm.txt
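
For anyone wanting to check their own trained file for the same symptom, here is a small sketch that lists the lines containing not-a-number values. It assumes only that the .hmm file is plain text with whitespace-separated fields; `find_nan_lines` is a hypothetical helper, not a SNAP tool. It locates the values but says nothing about why they were produced.

```python
import re

# Matches a whole token such as "nan", "-nan" or "+nan", case-insensitively.
NAN_TOKEN = re.compile(r"^[+-]?nan$", re.IGNORECASE)


def find_nan_lines(lines):
    """Return (line_number, text) pairs for lines holding a NaN token.

    Line numbers are 1-based. Tokens are split on whitespace, so a word
    that merely contains "nan" is not reported.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(NAN_TOKEN.match(token) for token in line.split()):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open file handle for the trained model returns the affected lines, which shows which parameter blocks they fall in.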

soungalo commented 2 years ago

Fixed by this commit.