Dynamically-linked binary error on K-Scheduler and corresponding fix.

stwjt commented 1 year ago

Hi, I am using K-Scheduler to fuzz xmllint from libxml2 2.6.0. I built xmllint.elf successfully, but it fails when I execute it. I can build xmllint 2.6.0 with afl-clang-fast and fuzz it with AFL without problems, and I have used K-Scheduler to fuzz other programs (bsdtar, nasm, etc.) successfully, so I suspect something is wrong with K-Scheduler. Can you help me figure out the problem? Attachment: libxml2-2.6.0.zip

Dongdongshe commented 1 year ago

I successfully built xmllint with K-Scheduler before. You need to add the "--disable-shared" flag when running "./configure" to build a statically linked binary, because some open-source programs build a dynamically linked binary by default, which causes unexpected errors on K-Scheduler.

Let me know if you still encounter further issues. I am happy to help.
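Before handing the rebuilt xmllint to K-Scheduler, it can help to confirm that it is a real ELF program that no longer loads libxml2.so at run time. The sketch below is one way to check this; check_fuzz_target is a hypothetical helper, not part of K-Scheduler. It assumes a libtool-based build (where a shared-library build typically leaves ./xmllint as a libtool wrapper shell script and puts the real binary under .libs/) and that od and ldd are available, as on most Linux systems. Since --disable-shared only controls how libxml2 itself is built, xmllint may still be dynamically linked against system libraries such as libc, so the check looks only for libxml2.so.

```shell
#!/bin/sh
# check_fuzz_target PATH  (hypothetical helper, not part of K-Scheduler)
# Returns 0 if PATH starts with the ELF magic bytes and ldd does not list
# libxml2.so among its run-time dependencies; otherwise prints a reason to
# stderr and returns 1.
check_fuzz_target() {
    target=$1
    if [ ! -f "$target" ]; then
        echo "$target: no such file" >&2
        return 1
    fi
    # ELF files begin with the bytes 7f 45 4c 46 (0x7f 'E' 'L' 'F');
    # a libtool wrapper script begins with "#!" instead.
    magic=$(od -An -tx1 -N4 "$target" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "$target: not an ELF file (libtool wrapper script?)" >&2
        return 1
    fi
    # If libxml2 was built as a shared library, ldd lists libxml2.so here.
    if ldd "$target" 2>/dev/null | grep -q 'libxml2\.so'; then
        echo "$target: still loads libxml2.so at run time" >&2
        return 1
    fi
    return 0
}

# Example, from the libxml2 source tree:
#   ./configure --disable-shared && make
#   check_fuzz_target ./xmllint && echo "xmllint looks ready for K-Scheduler"
```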

egoterm commented 1 year ago

Hello, I ran into the same problem when compiling xmllint with K-Scheduler. My compile settings are as follows:

export LLVM_COMPILER=clang
export CC=wllvm 
export CXX=wllvm++ 
export CFLAGS="-fsanitize-coverage=trace-pc-guard,no-prune -O2 -fsanitize=address" 
export CXXFLAGS="-fsanitize-coverage=trace-pc-guard,no-prune -O2 -fsanitize=address" 
export LDFLAGS=/home/fuzz/K-Scheduler/afl_integration/afl-2.52b_kscheduler/llvm_mode/afl-llvm-rt.o 
./configure --disable-shared

However, I encountered errors when compiling libxml2 2.6.0:

nanohttp.c:966:12: error: use of undeclared identifier 'len'
        SOCKLEN_T len;

nanoftp.c:1552:15: error: use of undeclared identifier 'dataAddrLen'; did you mean 'dataAddr'?
    SOCKLEN_T dataAddrLen;

These errors did not appear when I compiled xmllint with AFL's afl-clang-fast or with my own wllvm-based toolchain. They seem to indicate that SOCKLEN_T was not properly defined. I checked the relevant code, replaced SOCKLEN_T with the corresponding type (unsigned int), and the build finally succeeded. However, when I ran the compiled xmllint on the seeds that ship with AFL, I hit the same runtime problem as before.

I don't know the reason for this error, because I can successfully use K-Scheduler to fuzz other programs such as readelf and tiffinfo, and I can also fuzz xmllint 2.6.0 with AFL, FairFuzz, and other fuzzers. Is this problem caused by K-Scheduler's instrumentation or by some compilation option?

PS: To check whether compiler optimization affects the error, I changed the compilation options:

export LLVM_COMPILER=clang
export CC=wllvm 
export CXX=wllvm++ 
export CFLAGS="-fsanitize-coverage=trace-pc-guard,no-prune -fsanitize=address" 
export CXXFLAGS="-fsanitize-coverage=trace-pc-guard,no-prune -fsanitize=address" 
export LDFLAGS=/home/fuzz/K-Scheduler/afl_integration/afl-2.52b_kscheduler/llvm_mode/afl-llvm-rt.o 
./configure --disable-shared

I removed -O2 and recompiled xmllint, but running xmllint still produced the same error.

libxml2-2.6.0.zip contains the original libxml2 source, and libxml2-2.6.0_modified.zip contains the source after replacing SOCKLEN_T as described above. Looking forward to your reply!

Attachments: libxml2-2.6.0.zip, libxml2-2.6.0_modified.zip

Final binary: xmllint_elf.zip. The binary runs fine with -h.

Dongdongshe commented 1 year ago

I compiled libxml2-2.6.0 and encountered the same error you reported. The libxml2 I built successfully was the latest version, libxml2-2.9.14. libxml2-2.6.0 was released in 2003, around 20 years ago. Unless you have a particular reason to use it, such as reproducing known bugs in this old version, I suggest switching to the latest version, libxml2-2.9.x.

I am not sure why this weird compilation error occurs on libxml2-2.6.0, but here are some guesses that I hope will be helpful.

This issue is not related to K-Scheduler; it is more likely caused by a potential incompatibility between 20-year-old code and recent versions of wllvm, clang, ASan, and AFL. K-Scheduler does not rely on any customized instrumentation: it builds the binary with unmodified, vanilla wllvm, clang, ASan, and the AFL forkserver wrapper. When you build libxml2-2.6.0, you do not use a single line of K-Scheduler code apart from those vanilla tools. Debugging and fixing incompatibilities between old open-source programs and modern compiler toolchains can take considerable engineering effort, so the fastest way around this issue is to try a recent version of the program or a stable compiler toolchain.
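For the SOCKLEN_T compilation error itself, one thing worth checking is what ./configure wrote into config.h. The message "use of undeclared identifier 'len'" is what clang typically reports when SOCKLEN_T expands to nothing, so that `SOCKLEN_T len;` is parsed as the bare expression `len;`. The sketch below assumes (this is not confirmed in the thread) that libxml2-2.6.0's configure records its socket-length type as a `#define SOCKLEN_T ...` line in config.h; socklen_value is a hypothetical helper name.

```shell
#!/bin/sh
# socklen_value FILE  (hypothetical helper)
# Prints the replacement text of the "#define SOCKLEN_T ..." line in FILE,
# or an empty string if configure left the macro blank (or as "/**/").
socklen_value() {
    sed -n 's/^#define[[:space:]][[:space:]]*SOCKLEN_T[[:space:]]*//p' "$1" |
        sed 's|/\*\*/||; s/[[:space:]]*$//'
}

# Example, in the configured libxml2-2.6.0 tree:
#   if [ -z "$(socklen_value config.h)" ]; then
#       echo "configure could not determine SOCKLEN_T; see config.log"
#   fi
```

If the value turns out to be empty, an alternative to editing nanohttp.c and nanoftp.c is to change that one line in config.h to `#define SOCKLEN_T socklen_t` (on glibc, socklen_t is an unsigned int, matching the manual replacement above). Re-running ./configure regenerates config.h, and config.log records the commands configure ran, which may show why the probe failed under these CFLAGS.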

egoterm commented 1 year ago

OK, thanks. I will try xmllint 2.9+.

egoterm commented 1 year ago

Hi, I have one last question. Following build_example, I set the fuzzing command as follows:

./afl-fuzz_kscheduler -i /testcases/others/elf -o afl_out_cent -d -m none ./size @@

Can K-Scheduler's algorithm parameters be adjusted through options, as in MOpt? Does the current command take full advantage of K-Scheduler? If I have 100 cores, do I have to allocate 50 cores to the Python scripts and 50 cores to afl-fuzz? I ask because when I ran 60 tasks on 100 cores, some tasks reached very different edge coverage than others. Thank you, and I look forward to your reply.

Dongdongshe commented 1 year ago

Are there adjustable parameters in K-Scheduler like in MOpt? No, the centrality scores used by K-Scheduler already adapt to different programs and different horizon graphs. There are, however, a few parameters in the centrality-based weight computation; please see Section 4.E of our paper for details. The current implementation of K-Scheduler is still a prototype, so feel free to hack on it.

"Could the current command take full advantage of K-Scheduler?" What do you mean by "take full advantage of K-Scheduler"? Could you elaborate on this question?

For CPU assignment between fuzzing and the Python scripts, you do not need to dedicate a CPU to each Python script. The Katz centrality computation in NetworKit uses an efficient power method and is implemented in C++, so each Katz centrality computation takes a few seconds or less. I suggest allocating around 10, or at most 20, cores to these Python scripts for 60 fuzzing tasks (assuming each fuzzing task gets a single dedicated CPU core).
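The allocation suggested above (one dedicated core per afl-fuzz instance, with the Python scripts sharing a small pool of roughly 10 to 20 cores) can be sketched as a core plan. plan_cores is a hypothetical helper, not part of K-Scheduler. The launch loop in the comments assumes Linux taskset and AFL's AFL_NO_AFFINITY variable (so afl-fuzz does not re-bind itself to a different free core); the output directory naming is illustrative, and the K-Scheduler Python script name and arguments are left as a placeholder to be taken from the K-Scheduler README.

```shell
#!/bin/sh
# plan_cores NUM_FUZZERS NUM_SCRIPT_CORES  (hypothetical helper)
# Prints one line per fuzzing task: "<task> <fuzzer core> <script core list>".
# Fuzzer i gets core i to itself; all Python scripts share the pool of
# NUM_SCRIPT_CORES cores that follows the fuzzer cores.
plan_cores() {
    n=$1
    s=$2
    if [ "$n" -lt 1 ] || [ "$s" -lt 1 ]; then
        echo "usage: plan_cores NUM_FUZZERS NUM_SCRIPT_CORES (both >= 1)" >&2
        return 1
    fi
    if [ "$s" -eq 1 ]; then
        pool=$n
    else
        pool="$n-$((n + s - 1))"
    fi
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i $i $pool"
        i=$((i + 1))
    done
}

# Example launch for 60 tasks with 10 shared script cores (Linux):
#   plan_cores 60 10 | while read -r task fcore pool; do
#       AFL_NO_AFFINITY=1 taskset -c "$fcore" ./afl-fuzz_kscheduler \
#           -i /testcases/others/elf -o "afl_out_$task" -d -m none ./size @@ \
#           < /dev/null > "afl_out_$task.log" 2>&1 &
#       # taskset -c "$pool" python3 <K-Scheduler script> <args> &   (placeholder)
#   done
```

Giving the scripts one shared pool rather than a core each follows the point above that each Katz computation only takes seconds, so the scripts rarely need a core at the same time.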

egoterm commented 1 year ago

Regarding "take full advantage of K-Scheduler", I meant to ask whether my fuzzing command omitted some K-Scheduler options, causing it to reach suboptimal edge coverage on the target program. I re-checked the tutorial, and the command seems to be fine. Thank you for your answer.

Dongdongshe / K-Scheduler, issue #10