EddyRivasLab / easel

Sequence analysis library used by Eddy/Rivas lab code
Other
46 stars 26 forks source link

Bug fix; adds boundary check for ascii->buf in skip_whitespace() #66

Closed nawrockie closed 1 year ago

nawrockie commented 1 year ago

The skip_whitespace() function in esl_sqio_ascii.c failed to check if ascii->bpos was a valid position prior to accessing it. In rare cases it is not (if ascii->bpos == ascii->nc), which causes an invalid read as reported by valgrind.

The fix in this PR is to add a check if ascii->bpos == ascii->nc and if so to call loadbuf(). skip_whitespace() already has this check within a while (isspace(c)) loop but did not have the check at the beginning of the function prior to defining c for the first time.

This bug was found in some searches of Rfamseq partitions in the Rfam search pipeline. In rare cases, an output .tblout file would be corrupted, including a sequence name that was invalid (did not exist in the input sequence file) and random strange characters. Downstream code in the rfam pipeline then exited in error when the invalid sequence name was processed. cmsearch didn't actually exit due to the bug though, and small tweaks to the command, such as changing the location of the sequence file, could prevent the corrupted output. In my experiences with similar bugs, unpredictable behavior like this sometimes occurs with invalid reads to memory. Even when no corrupt output is produced, valgrind still reports an invalid read.

The skip_whitespace() function is only called by sqascii_ReadBlock() if input argument long_target is TRUE, which is only called by nhmmer in hmmer and cmsearch in infernal. I haven't yet seen an example of the bug causing a valgrind error in nhmmer, and didn't try to construct an example.

I did create the smallest cmsearch example that I could that reproduces the valgrind error (files attached: bug.fa.txt bug.cm.txt), with this command:

valgrind infernal-1.1.4/src/cmsearch --toponly --cpu 1 bug.cm.txt bug.fa.txt

If infernal-1.1.4 has been compiled with

sh ./configure --enable-debugging; make;

To reproduce in the 'develop' branches:

  1. follow the instructions to clone infernal, hmmer and easel but checkout the develop branches, instead of the infernal-1.1.4 tags: https://github.com/EddyRivasLab/infernal/wiki#initial-checkout

  2. comment out the 4 lines in hmmer/Makefile.in that include 'profmark', because it currently doesn't build on the develop branch.

  3. build in infernal/ with sh ./configure --enable-debugging; make

  4. run the command: valgrind infernal/src/cmsearch --toponly --cpu 1 bug.cm.txt bug.fa.txt

I did not add anything to unit tests or exercises to exercise this bug because it's only visible in valgrind, and currently requires cmsearch.

nawrockie commented 1 year ago

@traviswheeler : tagging you to review this also, you weren't in the dropdown list of reviewers. Thanks.

traviswheeler commented 1 year ago

I have no concerns about this - thanks for bringing the change to my attention.

cryptogenomicon commented 1 year ago

Looks good. (Sorry to be slow.) Thanks, Eric!