Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

segfault in edlibAlign() #74

Closed: geeknik closed this issue 7 years ago

geeknik commented 7 years ago

Compiled with afl-clang-fast++ and ASan.

./edlib-aligner -p test003 test003

==25430==ERROR: AddressSanitizer: SEGV on unknown address 0x60220000cea0 (pc 0x0000004c70e2 bp 0x7fffd239af10 sp 0x7fffd239ab80 T0)
    #0 0x4c70e1 in transformSequences(char const*, int, char const*, int, unsigned char**, unsigned char**) /root/edlib/edlib/src/edlib.cpp:1380:9
    #1 0x4c70e1 in edlibAlign /root/edlib/edlib/src/edlib.cpp:115
    #2 0x4c011e in main /root/edlib/apps/aligner/aligner.cpp:162:35
    #3 0x7fcff06cbb44 in __libc_start_main /build/glibc-qK83Be/glibc-2.19/csu/libc-start.c:287
    #4 0x4be9bc in _start (/root/edlib/build/bin/edlib-aligner+0x4be9bc)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /root/edlib/edlib/src/edlib.cpp:1380 transformSequences(char const*, int, char const*, int, unsigned char**, unsigned char**)

test003.zip

Martinsos commented 7 years ago

Thanks for reporting this @geeknik! That is a lot of issues; could you please merge them all into one? That will be easier to track, and it does seem that all of the errors come from the same function, transformSequences().

Also, could you please give me more information on exactly how you compiled the code? I have never worked with afl-clang-fast++ and ASan, so having the exact commands that you used will help a lot in reproducing the problem.

geeknik commented 7 years ago

You can download AFL here. Then I compiled edlib like so:

CC=afl-clang-fast CXX=afl-clang-fast++ AFL_USE_ASAN=1 cmake ..

followed by

CC=afl-clang-fast CXX=afl-clang-fast++ AFL_USE_ASAN=1 make

And then I just used your test-data as a starting point:

AFL_PRELOAD=/root/afl-2.39b/libdislocator/libdislocator.so afl-fuzz -m none -i ~/edlib/test_data/E_coli_DH1/ -o out ./edlib-aligner -p @@ @@

Martinsos commented 7 years ago

Ok, awesome! I will check this out in the next few days - if you already have some ideas on what is causing the issue, you are welcome to write them down or even create a PR.

Actually the main piece of code is written in cpp, so I believe CXX is also needed.

Martinsos commented 7 years ago

I am guessing the problem is that query and target contain character codes that are not in the range [0, 127]. I didn't do much to protect against such a situation except for providing comments:

     * @param [in] query  First sequence. Character codes should be in range [0, 127].
     * @param [in] target  Second sequence. Character codes should be in range [0, 127].

I will investigate further to see whether this is the cause and, if so, how best to fix it.
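To illustrate the guess above, here is a minimal sketch of how a character code outside [0, 127] can turn into a bad table index. It assumes a lookup table indexed directly by the character's value, which is an assumption for illustration and not a copy of the code in edlib.cpp. Both helper names (naiveIndex, safeIndex) are hypothetical.

```cpp
// Hypothetical helpers, not part of edlib.

// Converts a char to a table index the unsafe way. Where plain char is
// signed (e.g. x86 Linux), bytes >= 0x80 become negative, so something like
// table[naiveIndex(c)] would read before the start of the table.
int naiveIndex(char c) {
    return static_cast<int>(c);
}

// Converts a char to a table index by first reinterpreting it as
// unsigned char, so the result is always in [0, 255].
int safeIndex(char c) {
    return static_cast<int>(static_cast<unsigned char>(c));
}
```

For plain ASCII input both helpers agree. For a byte such as 0xE9, safeIndex gives 233, while naiveIndex gives a negative value on platforms where char is signed. A fuzzer like afl-fuzz produces such bytes very quickly.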

Martinsos commented 7 years ago

@geeknik I pushed a new commit c1f04e8e11b232c0fc3baa462e0a579fd3bdad4d which fixes the cause of the problems in transformSequences - any chars can now be input, not just those in the range 0-127. I also fixed all of the compiler warnings.

I did not use E_coli_DH1 since it is too large for afl-fuzz. I used much simpler test cases from the aligner/ dir instead, to speed up the testing.

Please keep in mind that edlib-aligner is not the central piece of Edlib; the edlib library is. I created edlib-aligner in order to test edlib and run it easily, but I did not put much effort into ensuring that it is bullet-proof regarding the input - I am happy to try and improve on that, but I may not go as far as ensuring that every possible kind of input works.
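For readers who want a feel for what "any chars can be input" can look like in code, here is a rough sketch of one way to map arbitrary bytes to small alphabet indices. It is not taken from the commit above. The names Transformed and transformSketch are hypothetical, and the approach (building the alphabet on the fly from the bytes that actually occur) is only an assumption about one reasonable design.

```cpp
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not an edlib type.
struct Transformed {
    std::string alphabet;               // distinct bytes, in order of first appearance
    std::vector<unsigned char> query;   // query as indices into alphabet
    std::vector<unsigned char> target;  // target as indices into alphabet
};

// Maps every byte of both sequences to its index in an alphabet that is
// built while scanning. Works for any byte value, not only [0, 127].
Transformed transformSketch(const std::string& query, const std::string& target) {
    Transformed out;

    // letterIdx[b] is the alphabet index of byte b, or -1 if not seen yet.
    int letterIdx[256];
    for (int i = 0; i < 256; ++i) letterIdx[i] = -1;

    auto encode = [&](const std::string& seq, std::vector<unsigned char>& dst) {
        dst.reserve(seq.size());
        for (char ch : seq) {
            // Index through unsigned char so the subscript is never negative.
            unsigned char c = static_cast<unsigned char>(ch);
            if (letterIdx[c] < 0) {
                letterIdx[c] = static_cast<int>(out.alphabet.size());
                out.alphabet.push_back(ch);
            }
            dst.push_back(static_cast<unsigned char>(letterIdx[c]));
        }
    };

    encode(query, out.query);
    encode(target, out.target);
    return out;
}
```

Since there are at most 256 distinct byte values, every alphabet index fits in an unsigned char, and the table lookup can never go out of bounds regardless of what the input files contain.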

Martinsos commented 7 years ago

Solved with c1f04e8