EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

NHMMER gives unhelpful error when passed .embl file #195

Closed npcarter closed 4 years ago

npcarter commented 4 years ago

In looking at Travis' pull request for nhmmer, I grabbed a DNA database from Dfam as Dfam.embl and tried searching it, with the following results (the results are the same for the pull request code and the released HMMER 3.3):

    ../src/nhmmer MADE1.hmm ~/Desktop/Dfam.embl
    fm_general.c: Error reading meta data for FM index.

Passing a file whose format we definitely shouldn't accept gives the same error:

    ../src/nhmmer MADE1.hmm ~/Human_Warrior_Sword-Shield-2.png
    fm_general.c: Error reading meta data for FM index.

I'm mostly flagging this because the error message doesn't strike me as one that would help a novice user figure out what was going wrong. If I hadn't known I was taking a chance on being able to read the .embl file, I wouldn't have had much idea what to do to resolve the error.

traviswheeler commented 4 years ago

Agreed, that's an unhelpful error. I'll address it soon in a separate PR.

Notes:

(1) This isn't related to the --qformat PR.

(2) I suspect that the Dfam.embl file is not correctly formatted. That causes the target-format pipeline to fail when it tries to open the file, so it falls through to the FM-index format reader, which throws this unhelpful error (a .png takes the same path). Forcing the target format shows the underlying parse failure:

    % nhmmer --tformat embl MADE1.hmm Dfam.embl
    ...
    Parse failed (sequence file Dfam.embl): Line 1: failed to find ID line
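The fall-through described in note (2) can be sketched roughly as follows. This is an illustration only, not HMMER's code: the probe functions, the `try_open_target` name, the status codes, and the set of formats are all hypothetical (the real pipeline, per the note above, ends at the FM-index reader). What it tries to show is that when every reader rejects the file, the caller can report which formats were tried instead of surfacing the last reader's internal message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch; not Easel's. */
enum { ST_OK = 0, ST_EFORMAT = 1 };

/* Hypothetical probe: returns 1 if the first line looks like the format. */
typedef int (*probe_fn)(const char *first_line);

static int looks_like_fasta(const char *s) { return s[0] == '>'; }
static int looks_like_embl(const char *s)  { return strncmp(s, "ID   ", 5) == 0; }

struct reader { const char *name; probe_fn probe; };

static const struct reader READERS[] = {
    { "fasta", looks_like_fasta },
    { "embl",  looks_like_embl  },
};

/* Try each reader in turn. On success, *fmt names the matching format.
 * On failure, errbuf receives a message naming the file and every
 * format that was tried, and *fmt is set to NULL. */
int try_open_target(const char *filename, const char *first_line,
                    const char **fmt, char *errbuf, size_t errlen)
{
    size_t i, used;
    size_t n = sizeof(READERS) / sizeof(READERS[0]);

    assert(filename != NULL && first_line != NULL && fmt != NULL);
    assert(errbuf != NULL && errlen > 0);

    for (i = 0; i < n; i++) {
        if (READERS[i].probe(first_line)) {
            *fmt = READERS[i].name;
            errbuf[0] = '\0';
            return ST_OK;
        }
    }

    /* Nothing matched: build one message for the user. */
    snprintf(errbuf, errlen, "%s: not a recognized sequence file (tried:", filename);
    for (i = 0; i < n; i++) {
        used = strlen(errbuf);
        if (used + 1 >= errlen) break;          /* buffer full, stop appending */
        snprintf(errbuf + used, errlen - used, " %s", READERS[i].name);
    }
    used = strlen(errbuf);
    if (used + 1 < errlen) snprintf(errbuf + used, errlen - used, ")");

    *fmt = NULL;
    return ST_EFORMAT;
}
```

With a design along these lines, a .png and a malformed .embl file would both produce a message that names the file and the formats attempted, which gives a novice user something to act on.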

npcarter commented 4 years ago

Yep, separate bug. Question for you: should nhmmer be able to read .embl files?

traviswheeler commented 4 years ago

I believe it should be able to work with .embl files, though I doubt it's a common use case. I've tried nhmmer on another embl file I have on hand, and it worked without a problem. I see that esl-reformat fails to handle Dfam.embl, because the first line is a CC entry, not an ID entry. Fixing that (putting the ID line first) resolves esl-reformat's problem, but leaves nhmmer with an error ("Line 1: failed to find ID line").

I can't find a complete embl format description, so I'm not sure if esl-reformat is right to fail on Dfam.embl. In any event, I still need to address (a) nhmmer failing when esl-reformat works, and (b) the unhelpful error you've described here.
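For reference, the kind of strict check that would reject a file opening with a CC line can be sketched as below. This assumes the usual EMBL flat-file layout, in which each line starts with a two-letter line code followed by spaces. `first_line_is_id` is a hypothetical helper, not an Easel function, and the sketch takes no position on whether leading CC lines ought to be tolerated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical strict check: returns 1 if the line carries the "ID"
 * line code, 0 otherwise. The code must be followed by a space or by
 * the end of the line, so that "IDX..." is not mistaken for an ID line. */
int first_line_is_id(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "ID", 2) != 0) return 0;
    return line[2] == ' ' || line[2] == '\n' || line[2] == '\0';
}
```

A parser built on a check like this would fail at line 1 on a file whose first line is a CC line, which matches the esl-reformat behavior described above. It does not explain why nhmmer still fails once the ID line is moved to the top.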

traviswheeler commented 4 years ago

A followup question on this - do you want me to add the fix to this in PR #194, or a separate PR? The fix will require some small plumbing work (e.g. replacing esl_fatal calls in the fm_general code with proper error handling), and seems like it diverges from the intent of the PR, but I'll add it there if you want it. Either way, the update will come next week.
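The plumbing change mentioned here, replacing a fatal exit deep inside a reader with a status code the caller can act on, might look roughly like this. It is a sketch under stated assumptions: the `FM_MAGIC` signature, the header check, the status codes, and every function name are invented for illustration and do not reflect the real FM-index file format or the fm_general code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum { FM_OK = 0, FM_EFORMAT = 1 };

#define FM_MAGIC "FMIX"   /* hypothetical 4-byte signature, for illustration */

/* Low-level reader: instead of printing a message and exiting, it
 * writes a reason into errbuf and returns a status to the caller. */
int read_fm_meta(const unsigned char *buf, size_t len,
                 char *errbuf, size_t errlen)
{
    assert(buf != NULL && errbuf != NULL && errlen > 0);
    if (len < 4 || memcmp(buf, FM_MAGIC, 4) != 0) {
        snprintf(errbuf, errlen, "not an FM-index file (signature missing)");
        return FM_EFORMAT;
    }
    errbuf[0] = '\0';
    return FM_OK;
}

/* Caller: knows the file name, so it can produce a message aimed at
 * the user rather than at the developer of the reader. */
int open_target(const char *filename, const unsigned char *buf, size_t len,
                char *msg, size_t msglen)
{
    char errbuf[128];
    int  status;

    assert(filename != NULL && msg != NULL && msglen > 0);
    status = read_fm_meta(buf, len, errbuf, sizeof errbuf);
    if (status != FM_OK) {
        snprintf(msg, msglen, "Error: could not read target %s: %s",
                 filename, errbuf);
        return status;
    }
    msg[0] = '\0';
    return FM_OK;
}
```

The reason for splitting it this way is that only the caller has enough context (file name, formats already tried) to write a message a novice user can act on; the reader only knows that the bytes did not match.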

npcarter commented 4 years ago

I think that these are separate enough issues that it's worth making them two pull requests.

traviswheeler commented 4 years ago

Will do

traviswheeler commented 4 years ago

I don't think this should have been closed yet, as the problem hasn't been resolved. Perhaps you meant to close #171, which was addressed by PR #194?

I have a fix for this, and plan to submit a PR tomorrow. (I've just been waiting for the other PRs to make their way through the system, to avoid getting them confused with each other.)

cryptogenomicon commented 4 years ago

Oopsie, ok.