Closed by asanrocks 1 year ago
Hi, I am pleased that you are using the CoFuzz fuzzing tool. Here are the answers to your concerns.
- The compiler version and flags used to compile the programs.
Please follow the instructions in the provided script to run readelf. We use wllvm to generate the bitcode of the whole program for the subsequent instrumentation. Note that wllvm can affect the edge-coverage results due to the different AFL instrumentation, and this should account for the difference between 10,000 branches and 6,000 branches for libxml2.
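As a rough sketch, the wllvm workflow described above usually looks like the following (the exact configure flags and paths used in the provided script are assumptions here; `readelf` is the GNU binutils target mentioned above):

```shell
# Hedged sketch: building a whole-program bitcode file with wllvm.
# The configure flags and output paths are placeholders, not the
# exact ones from the repository's script.
export LLVM_COMPILER=clang          # tell wllvm which compiler to wrap
CC=wllvm CXX=wllvm++ ./configure    # build the target (e.g. binutils)
make
extract-bc binutils/readelf         # emits readelf.bc for instrumentation
```

The resulting `.bc` file is what the subsequent instrumentation pass consumes.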
- The command line arguments when invoking the fuzzer.
We enable the parallel mode with two instances for AFL. The master fuzzer runs the original AFL, while the slave fuzzer only enables the havoc mutation stage. Previous work has demonstrated the power of the havoc stage, and AFL++ runs the havoc stage by default. You can run AFL with -d to enable only the havoc stage and validate the performance.
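For reference, the two-instance setup described above might look like this (seed corpus, sync directory, and target arguments are placeholders; `-M`/`-S` select the master/slave role, and `-d` skips the deterministic stages so that effectively only havoc runs):

```shell
# Hedged sketch of the two-instance parallel AFL run described above.
afl-fuzz -i seeds -o sync -M master -- ./readelf -a @@ &   # original AFL
afl-fuzz -i seeds -o sync -S slave1 -d -- ./readelf -a @@  # havoc-only (-d)
```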
- The measurement of the edge coverage.
Here we adopt two methods, which yield the same edge-coverage results: 1) using the fuzzing bitmap, where the edge coverage equals the number of bytes < 255; 2) using the AFL built-in tool afl-showmap to compute the edge coverage of each individual seed and combining the per-seed results into the total edge coverage.
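Both measurements can be illustrated with a small self-contained sketch. The 64 KiB map, the 0xff virgin byte, and afl-showmap's `edge_id:hit_count` output lines are the usual AFL conventions; the toy files below stand in for a real `fuzz_bitmap` and real per-seed showmap outputs:

```shell
# 1) Bitmap method: every byte that differs from the virgin value 0xff
#    marks a covered edge. Toy 6-byte bitmap with three touched bytes:
printf '\377\001\377\002\377\003' > fuzz_bitmap
od -An -v -tx1 fuzz_bitmap | grep -o '[0-9a-f][0-9a-f]' | grep -cv '^ff$'

# 2) afl-showmap method: take the per-seed "edge:count" listings and
#    union the edge ids across all seeds. Toy outputs for two seeds:
printf '1:5\n7:1\n' > seed1.map
printf '7:2\n9:1\n' > seed2.map
cat seed1.map seed2.map | cut -d: -f1 | sort -u | wc -l
```

Both commands print 3 here: the toy bitmap has three non-0xff bytes, and the two seeds together cover the edges 1, 7, and 9.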
We are also attempting to integrate CoFuzz into FuzzBench for a better evaluation, but there are some obstacles.
Please let me know if you have any further questions :)
Thank you for your response! I appreciate the information you provided, as it greatly helps in comprehending your setup and reproducing the results. Nevertheless, I still have a few minor concerns regarding the baseline experiment (not the CoFuzz experiment). Specifically, I am interested in understanding the comparison between existing symbolic execution techniques and the popular fuzzers.
Given the inherently complex nature of fuzzing research, where a fuzzer's performance can be influenced by numerous variables, I would greatly appreciate it if you could provide further details on the specific flags used when compiling the binary and running the AFL++ experiment. This includes the instrumentation mode, the optimization level, and any special environment variables employed.
Thank you once again for your assistance!
I am writing to discuss some challenges I have encountered while attempting to reproduce the results presented in your insightful paper. Specifically, I am facing difficulties in replicating the performance of existing fuzzers, as outlined in Table 3 of the original publication.
In your paper, a minor discrepancy of 0.9% is mentioned for the code coverage of AFL and AFL++ when evaluating the program libxml2. However, during my own experimentation, I have observed a significant advantage of AFL++ over AFL, with a difference of up to 5% in terms of code coverage. Additionally, I noticed a noteworthy disparity in the raw coverage data. The coverage obtained from Clang's source-based coverage analysis exceeds 10,000 branches, whereas the data provided in your paper indicates a coverage of approximately 6,000 branches.
Given that I followed the setup and methodology described in FuzzBench (see also https://www.fuzzbench.com/reports/2023-05-06-sample/index.html), I suspect that I might have misconfigured the compiler flags or encountered issues with edge measurement. I was wondering if you could kindly provide me with some clarification regarding the disparities in the evaluation data between your paper and the FuzzBench framework, for example:
I genuinely appreciate your expertise and would be grateful for any assistance you can offer. Thank you very much for your time and attention to this matter.