EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

Fix "Invalid alphabet type" error in nhmmer - iss#271 #274

Closed traviswheeler closed 2 years ago

traviswheeler commented 2 years ago

This addresses https://github.com/EddyRivasLab/hmmer/issues/271.

In iss#271, running the command

% nhmmer --cpu 0 --dna query.fasta target.fasta

produces the following error:

Parse failed (sequence file assembly.fasta):
Line 2: unexpected char A; expected FASTA to start with >

The error arises when two things are true of the first sequence in the target file: (1) it does not contain all four nucleotide characters (e.g. it contains only As, Cs, and Ts, and no Gs), and (2) it is longer than the buffer length used when reading a sequence in nhmmer (4000).

What's happening is: (1) nhmmer calls esl_sqfile_GuessAlphabet() for the target sequence, even if the --dna (or --rna) flag is set (around line 704 of nhmmer.c).

So overall, the commit (i) obeys the --dna/--rna flag, so that it only calls esl_sqfile_GuessAlphabet() on the target when the alphabet has not already been specified on the command line, and (ii) checks the status returned by esl_sqfile_GuessAlphabet() and reacts appropriately.
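
To make the control flow of (i) and (ii) concrete, here is a minimal, self-contained sketch. It is not the code from the commit and it does not use Easel: the enums, `toy_guess()`, and `choose_target_alphabet()` are hypothetical stand-ins invented for illustration, and the toy guesser simply assumes that a guess fails unless all four of A, C, G, and T are seen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the Easel definitions. */
enum alph_type { ALPH_UNKNOWN = 0, ALPH_DNA, ALPH_RNA };
enum status    { ST_OK = 0, ST_NO_ALPHABET };

/* Signature of a hypothetical alphabet guesser. */
typedef enum status (*guess_fn)(const char *seq, enum alph_type *ret_type);

/* Toy guesser (an assumption, not Easel's algorithm): reports DNA only
 * if all four of A, C, G, T occur in the text it is given. */
enum status toy_guess(const char *seq, enum alph_type *ret_type)
{
  int seen_a = 0, seen_c = 0, seen_g = 0, seen_t = 0;
  const char *p;

  for (p = seq; *p != '\0'; p++) {
    switch (*p) {
      case 'A': seen_a = 1; break;
      case 'C': seen_c = 1; break;
      case 'G': seen_g = 1; break;
      case 'T': seen_t = 1; break;
      default:  break;
    }
  }
  if (seen_a && seen_c && seen_g && seen_t) {
    *ret_type = ALPH_DNA;
    return ST_OK;
  }
  *ret_type = ALPH_UNKNOWN;
  return ST_NO_ALPHABET;
}

/* (i)  If the user named the alphabet, use it and never call the guesser.
 * (ii) Otherwise call the guesser and hand its status back to the caller,
 *      so a failed guess is reported instead of being silently ignored. */
enum status choose_target_alphabet(int dna_flag, int rna_flag, const char *seq,
                                   guess_fn guess, enum alph_type *ret_type)
{
  enum status status;

  assert(ret_type != NULL);
  if (dna_flag) { *ret_type = ALPH_DNA; return ST_OK; }
  if (rna_flag) { *ret_type = ALPH_RNA; return ST_OK; }

  assert(guess != NULL);
  status = guess(seq, ret_type);
  if (status != ST_OK) *ret_type = ALPH_UNKNOWN;
  return status;
}
```

With the flag set, a target containing no Gs is accepted as DNA without any guessing; without the flag, the same target yields a non-OK status that the caller can turn into a clear error message.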

ptrebert commented 2 years ago

@traviswheeler can you estimate when this will be merged? Sorry for being annoying about this... :-)

traviswheeler commented 2 years ago

@ptrebert: it's out of my hands now, and will depend on @npcarter or @cryptogenomicon (or someone else in the group?) finding time to validate the PR. In the meantime, you can clone the nhmmer-invalid-alph branch from my repo, and use it until the PR has been merged.

ptrebert commented 2 years ago

ok, thanks for the info