EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

nhmmer - Error: Invalid alphabet type in target for nhmmer. Expect DNA or RNA. #294

Closed DJ-Champion closed 1 year ago

DJ-Champion commented 1 year ago

Hello,

I get the following error message when running the nhmmer command below on the attached files (these are truncated versions of the real, larger files, and they show the same issue).

nhmmer --dna --tformat fasta -T 40 --cpu 40 --noali orf34428349.fna targetSequence.fna

"Error: Invalid alphabet type in target for nhmmer. Expect DNA or RNA."

(file extension changed for upload)

Looking at the target file, it starts with a long stretch of 't's, 'a's, and 'g's, with no 'c's (likely a stretch of telomeric sequence). This seems to be what causes the error, because if I paste a single 'c' at the start of the sequence I do not get the error. In the full file there is no 'c' until around 12,000 nucleotides in. The full targetSequence is from NCBI, accession GCF_018350175.1.
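To check how far into a target the first 'c' (or any other base) appears, something like the following could be used. It is only a minimal sketch: the function name `first_base_positions` and the demo record are made up for illustration, and it only looks at the first record of a plain FASTA file.

```python
import io


def first_base_positions(handle, bases="acgt"):
    """Return {base: 1-based position of its first occurrence} for the
    first FASTA record in `handle` (None if the base never appears)."""
    positions = {b: None for b in bases}
    offset = 0          # number of residues seen so far in the record
    in_record = False
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                break   # stop at the start of the second record
            in_record = True
            continue
        if not in_record:
            continue
        lower = line.lower()
        for b in bases:
            if positions[b] is None:
                i = lower.find(b)
                if i != -1:
                    positions[b] = offset + i + 1
        offset += len(lower)
        if all(p is not None for p in positions.values()):
            break       # every base has been located
    return positions


# Hypothetical demo record: telomere-like repeats, first 'c' at the very end.
demo = io.StringIO(">chr_demo\nttagggttaggg\nttagggc\n")
print(first_base_positions(demo))
```

For a real file, pass an open file handle instead of the `io.StringIO` demo, e.g. `with open("targetSequence.fna") as fh: print(first_base_positions(fh))`.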

It seems I can force the type (DNA or RNA) for the query with options, but not for the target. My guess is that there is a checker that scans the first n nucleotides of the target for all four nucleotides and, on not finding all of them in that window, throws the error even though the target is in fact a valid DNA sequence.
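The kind of check guessed at here can be sketched as follows. This is not HMMER's or Easel's actual code: the function `guess_is_nucleic`, the window size of 4000, and the synthetic sequence are all assumptions, chosen only to show how a window-based check could reject a valid DNA sequence whose first 'c' appears late.

```python
def guess_is_nucleic(seq, window=4000):
    """Hypothetical check: accept the sequence as DNA/RNA only if a, c, g
    and t (or u) all occur within the first `window` residues."""
    prefix = seq[:window].lower().replace("u", "t")
    return all(base in prefix for base in "acgt")


# Synthetic telomere-like target: 12,000 residues of 'ttaggg' repeats,
# so the first 'c' sits at position 12,001, outside the assumed window.
telomeric = "ttaggg" * 2000 + "c" + "acgt" * 10

print(guess_is_nucleic(telomeric))        # False: no 'c' in the window
print(guess_is_nucleic("c" + telomeric))  # True: a leading 'c' satisfies the check
```

Under these assumptions, prepending a single 'c' flips the result, which matches the behavior described above.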

Any way around this problem?

orf34428349.txt targetSequence.txt

DJ-Champion commented 1 year ago

I also built a database from the targetSequence using makehmmerdb --dna and used that as the target instead, and still received the same error.

cryptogenomicon commented 1 year ago

This is related to a bug that we fixed in our develop branch for issue #271 a while back, but we haven't yet made a new release that includes the fix. I've verified that your bug is present in the current 3.3.2 release, and is fixed in our develop branch already.

The best workaround for now might be to add that C nucleotide manually, alas. Your guess about what's happening with the type checker is basically correct, and the type checker is not supposed to run when you add the --dna flag to force the alphabet. Another workaround is to use our git develop branch (along with the Easel develop branch), though that's not as convenient as building from one of our release tarballs.

DJ-Champion commented 1 year ago

Thank you for taking time to check and respond. I will run the development branch.