esl_sqfile_GuessAlphabet is supposed to return eslNOALPHABET when it cannot guess the alphabet of a sequence file. On files containing only empty sequences, this would be the expected return code. However, when writing unit tests for pyhmmer, I noticed that it unexpectedly returns eslEOD instead.
It turns out that sqascii_GuessAlphabet calls sqascii_ReadWindow, which can return eslEOD when it reaches the end of a sequence, but that case was not handled properly:
status = sqascii_ReadWindow(sqfp, 0, 4000, sq);
if ((status == eslEOF)) { status = eslENODATA; goto ERROR; }
else if (status != eslOK) goto ERROR;
This PR adds a unit test to make sure eslNOALPHABET is returned on files with empty sequences, and replaces the code above with:
status = sqascii_ReadWindow(sqfp, 0, 4000, sq);
if ((status == eslEOF)) { status = eslENODATA; goto ERROR; }
else if ((status != eslOK) && (status != eslEOD)) goto ERROR;
to make sure that eslEOD is not considered an error here.
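To illustrate the difference between the two checks in isolation, here is a minimal standalone sketch. It does not use Easel: the ST_* constants are hypothetical stand-ins for the Easel status codes (their numeric values are made up), and the final return of ST_NOALPHABET is an assumption that models the guessing step running on an empty window.

```c
#include <stdio.h>

/* Hypothetical stand-ins for Easel status codes. The names and numeric
 * values are illustrative only and are not taken from easel.h. */
enum { ST_OK = 0, ST_EOF, ST_EOD, ST_ENODATA, ST_NOALPHABET, ST_EFORMAT };

/* Old check: every status other than OK, including EOD, is passed
 * straight through to the caller as an error. */
int handle_old(int read_status)
{
    if (read_status == ST_EOF) return ST_ENODATA;
    else if (read_status != ST_OK) return read_status;
    /* Assumption: with an empty window, the guessing step that follows
     * cannot decide and reports "no alphabet". */
    return ST_NOALPHABET;
}

/* New check: EOD is tolerated, so control reaches the guessing step. */
int handle_new(int read_status)
{
    if (read_status == ST_EOF) return ST_ENODATA;
    else if (read_status != ST_OK && read_status != ST_EOD) return read_status;
    /* Same assumption as above about the guessing step. */
    return ST_NOALPHABET;
}
```

With these stand-ins, handle_old(ST_EOD) leaks ST_EOD to the caller, while handle_new(ST_EOD) falls through to the guessing step and yields ST_NOALPHABET. Other statuses, such as ST_EOF or a format error, are handled identically by both versions.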