EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org
Other
305 stars 69 forks source link

Possible use-of-uninitialized-value found in nhmmer using MSAN #320

Closed Augustin-Zidek closed 7 months ago

Augustin-Zidek commented 8 months ago

When we run a standard nhmmer query under MSAN, we get the following error:

WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x561b13529525 in esl_abc_FExpectScore [easel/esl_alphabet.c:1488]
    #1 0x561b13529af0 in esl_abc_FExpectScVec [easel/esl_alphabet.c:1586]
    #2 0x561b134e7c77 in p7_oprofile_GetFwdEmissionArray [hmmer/src/impl_sse/p7_oprofile.c:1375]
    #3 0x561b1348f849 in p7_Pipeline_LongTarget [hmmer/src/p7_pipeline.c:1565]
    #4 0x561b1345886d in pipeline_thread [hmmer/src/nhmmer.c:1609]
    #5 0x7f2d7fa1a7d8 in start_thread

SUMMARY: MemorySanitizer: use-of-uninitialized-value [easel/esl_alphabet.c:1488] in esl_abc_FExpectScore

A fix that works for us is adding the memset line below in src/p7_pipeline.c:1528:

 ESL_ALLOC(pli_tmp->fwd_emissions_arr, sizeof(float) *  om->abc->Kp * (om->M+1));
+memset(pli_tmp->fwd_emissions_arr, 0, sizeof(float) *  om->abc->Kp * (om->M+1));

 msv_windowlist.windows = NULL;
 vit_windowlist.windows = NULL;
 p7_hmmwindow_init(&msv_windowlist);

It is possible this is a false positive on our side, but I thought it was worth reporting in case it is not.

Performance doesn't seem to be impacted negatively by this change, but I did only lightweight testing so far.

cryptogenomicon commented 8 months ago

I'm not seeing this one. I've tried:

    mkdir build-linux-msan
    cd build-linux-msan
    ../configure CC=clang CFLAGS="-g -Wall -fsanitize=memory -fsanitize-memory-track-origins -fPIE -pie"
    make
    cd src
    ./nhmmer ../../tutorial/MADE1.hmm ../../tutorial/dna_target.fa 

with various combinations of clang 14.0.0 and 14.0.6, our 3.4 release code and our current develop branch, on a Rocky Linux 8.4 cluster node and a Ubuntu 22.04 VM on my laptop.

(I do see a different problem that the clang memory sanitizer detects though, specific to nhmmer --max or nhmmer --nobias; our make check fails on the same issue four times. I'll commit a fix for that issue soon.)

Can you give me more information so I can try to reproduce here? For example, do you see the issue when you use the nhmmer MADE1.hmm dnatarget.fa example in the tutorial files?

The suggested fix does look fine, so I may just put it in prophylactically, even if I end up not being able to reproduce this here.

Augustin-Zidek commented 7 months ago

It is a bit tricky to provide exact steps to reproduce since we build HMMER ourselves using an internal version of Clang.

However, I hope these will help:

I also ran with -fsanitize-memory-track-origins which gave me additional details confirming the location of the fix:

Uninitialized value was created by a heap allocation
    #0 0x555ee99e6b32 in malloc msan/msan_interceptors.cpp:1021
    #1 0x555ee9abdd52 in p7_Pipeline_LongTarget hmmer/src/p7_pipeline.c:1528
    #2 0x555ee9a7128d in pipeline_thread hmmer/src/nhmmer.c:1609

Let me know if there is any additional information that would help you debug this.

My suspicion is we are seeing this because of the more recent Clang version. Could you try building using Clang 17?

cryptogenomicon commented 7 months ago

Thanks. Clang 17 did the trick, and I was able to see it. I used your suggested fix and just made the commit to our develop branch.

Augustin-Zidek commented 7 months ago

Thanks a lot!