Closed BioGavin closed 3 weeks ago
Hi @BioGavin
This most likely happens because HMMER cannot guess the alphabet of your sequence file when the sequence is too short. Since digital=True requires an alphabet, the parser fails in digital mode but not in text mode.
If you know your sequences are always protein sequences, you can provide an alphabet yourself:
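import pyhmmer.easel as esl

in_fasta_path = "test.fa"
# Pass the alphabet explicitly so it does not have to be guessed from the file.
alphabet = esl.Alphabet.amino()
sequences = esl.SequenceFile(in_fasta_path, digital=True, alphabet=alphabet)
for sequence in sequences:
    print(f"Name: {sequence.name.decode('utf-8')}")
    # In digital mode this holds the encoded residues, not the letters.
    print(sequence.sequence)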
Thank you for your response. This solution worked perfectly, and the code now runs successfully.
Hi, authors. I’m encountering an issue when trying to read a file using esl.SequenceFile with the digital=True parameter. Here is the code I’m using for testing:
The test.fa file contains the following sequence in FASTA format:
When I set digital=True, I get the following error:
If I don't set digital, it runs successfully, and the output is:
Here is the version information of pyhmmer I used:
I understand that the digital=True parameter is intended to convert amino acid letters to numeric values in the range 0-19. I have carefully checked my input sequence to ensure there are no invalid amino acid letters; all characters in the sequence conform to the standard protein alphabet. Despite this, I am still encountering the error "ValueError: Could not determine alphabet of file". This is quite puzzling, and I would appreciate any guidance or insight you could provide on this issue. Thank you for your help!
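To make the idea of digital mode concrete, here is a rough plain-Python sketch of mapping the 20 canonical amino acid letters to the codes 0-19. It is only an illustration and not how pyhmmer implements it: the names AMINO_CANONICAL, CODE_OF, digitize and textize are made up for this example, and the real alphabet also covers gap and ambiguity symbols that are left out here.

```python
# Illustrative only: a plain-Python sketch of digital encoding,
# not pyhmmer's actual implementation.

# The 20 canonical residues, in the order that gives them codes 0-19.
AMINO_CANONICAL = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical lookup table for this sketch: residue letter -> integer code.
CODE_OF = {residue: code for code, residue in enumerate(AMINO_CANONICAL)}


def digitize(text):
    """Map each canonical residue in `text` to its integer code."""
    try:
        return [CODE_OF[residue] for residue in text.upper()]
    except KeyError as err:
        # Anything outside the 20 canonical letters is rejected in this sketch.
        raise ValueError(f"not a canonical amino acid: {err.args[0]!r}") from None


def textize(codes):
    """Map integer codes back to residue letters."""
    return "".join(AMINO_CANONICAL[code] for code in codes)


print(digitize("MKV"))           # [10, 8, 17]
print(textize([10, 8, 17]))      # MKV
```

The encoding itself only needs valid letters. The error in this issue comes from the separate step of working out which alphabet to use when none is given.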