Inquiry Regarding Edge Coverage Discrepancy

zhanghaoran1135 commented 7 months ago

In my reproduction experiment, edge coverage on readelf reached only 10,606 within 24 hours and has essentially stopped rising. However, the paper reports an edge coverage of 53,859, which differs significantly from my result.

I started the program, pulled from the GitHub project, with these two commands:

./fuzz -i readelf_new -o seeds -l 7507 ./readelf -a @@
python nn.py ./readelf -a

According to the log file (log_fuzz):

_2024-03-30 09:18:03: gradient fuzzing state: linecnt 10 and edge num 5116 (when I started the program)
_2024-03-31 09:18:48: gradient fuzzing state: linecnt 100 and edge num 10622 (after 24 hours)
_2024-04-01 01:49:13: gradient fuzzing state: linecnt 60 and edge num 10947 (hardly any growth)

My hardware is 2x RTX 4090 and an Intel(R) Xeon(R) Gold 6248R.

Could you give me some suggestions about this phenomenon? Thanks!

TIAmoCA commented 7 months ago

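Hello, I have also been running objdump in a virtual machine for a long time, and the coverage will not go up either. I don't know what is going on. Could we discuss it?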

zhanghaoran1135 commented 7 months ago

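> Hello, I have also been running objdump in a virtual machine for a long time, and the coverage will not go up either. I don't know what is going on. Could we discuss it?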

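I have tested in both Docker and a host environment. With neuzz and MTFuzz on readelf, edge coverage plateaus at around 10,000 in both cases, which is far from the 50,000+ reported in the paper. Could the authors be using a different metric or a different way of counting? I would appreciate any guidance.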

TIAmoCA commented 7 months ago

Yes, mine is even lower: on objdump it has stayed at around 5,500 the whole time...

Tricker-z commented 7 months ago

Hi

Thank you for your query regarding our method for measuring code coverage. I understand you've noticed some discrepancies between the metric described in our paper and what's implemented in this GitHub repository. I appreciate your diligence in seeking clarification on this matter.

To resolve any confusion, I would recommend reviewing Section 3.4.1 in our paper, where we discuss the rationale behind our choice of metrics, including any variations and the impact they have on our results.

To be specific, AFL utilizes trace_bits to store the state of edge coverage, with each edge being represented by 8 bits to denote different coverage counts. The AFL framework monitors changes at this bit level, preserving seed files for each unique change observed, which is central to its fuzzing efficiency and effectiveness in uncovering unique paths through a codebase.
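To make the 8-bit representation concrete, here is a minimal Python sketch of how AFL turns raw per-edge hit counts into bucket bytes. The bucket boundaries follow AFL's documented hit-count classes (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+); the function names `classify_count` and `classify_map` are illustrative only and are not taken from this repository.

```python
def classify_count(hits: int) -> int:
    """Map a raw 8-bit hit count to a byte with exactly one bit set per bucket."""
    if not 0 <= hits <= 255:
        raise ValueError("hit counts in trace_bits are 8-bit values")
    if hits == 0:
        return 0      # edge not executed
    if hits == 1:
        return 1      # bit 0
    if hits == 2:
        return 2      # bit 1
    if hits == 3:
        return 4      # bit 2
    if hits <= 7:
        return 8      # bit 3: 4-7 hits
    if hits <= 15:
        return 16     # bit 4: 8-15 hits
    if hits <= 31:
        return 32     # bit 5: 16-31 hits
    if hits <= 127:
        return 64     # bit 6: 32-127 hits
    return 128        # bit 7: 128-255 hits


def classify_map(trace_bits: bytes) -> bytes:
    """Apply the bucket classification to every entry of a trace map."""
    return bytes(classify_count(b) for b in trace_bits)
```

After classification, each byte has at most one bit set for a single execution, so an input that moves an edge from one bucket to another flips a bit that had not been seen before for that edge.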

In our research, we've adopted a nuanced approach to measuring coverage by not only considering the edge coverage data (i.e., whether an edge has been hit) but also by performing an in-depth analysis of all bit-level changes within trace_bits. This methodology allows us to capture a more granular level of coverage, accounting for the varying frequencies of edge executions which, in turn, offers insights into the thoroughness of the test suite and its ability to stimulate different behaviors in the target software.

This distinction is critical, as it enables us to evaluate the effectiveness of test cases on a more detailed scale than merely observing whether an edge was covered. By quantifying all bit changes, we assess the diversity and depth of the test coverage, providing a more comprehensive measure of the fuzzing process's effectiveness.
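The difference between the two metrics can be sketched as follows. This is an illustration under assumptions, not the exact code used for the paper: it assumes each input's trace map has already been bucket-classified as AFL does, and the helper name `coverage_metrics` is hypothetical.

```python
from typing import Iterable, Tuple


def coverage_metrics(classified_maps: Iterable[bytes]) -> Tuple[int, int]:
    """Return (edge_count, bit_count) over a set of classified trace maps.

    edge_count: number of map entries that were hit by at least one input.
    bit_count:  number of distinct (entry, hit-count bucket) pairs observed,
                i.e. the total number of set bits in the merged map.
    """
    maps = list(classified_maps)
    if not maps:
        return 0, 0

    size = len(maps[0])
    merged = bytearray(size)
    for m in maps:
        if len(m) != size:
            raise ValueError("all trace maps must have the same size")
        for i, b in enumerate(m):
            merged[i] |= b  # accumulate every bucket bit seen so far

    edge_count = sum(1 for b in merged if b)
    bit_count = sum(bin(b).count("1") for b in merged)
    return edge_count, bit_count


# Two inputs over a toy 4-entry map.
run_a = bytes([1, 0, 8, 0])    # entry 0 hit once, entry 2 hit 4-7 times
run_b = bytes([2, 0, 8, 64])   # entry 0 hit twice, entry 3 hit 32-127 times
edges, bits = coverage_metrics([run_a, run_b])
print(edges, bits)  # 3 4
```

Because each entry can contribute up to 8 bits, a bit-level count can be as much as 8 times the edge count on the same corpus, so the two numbers are not directly comparable.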

Thank you once again for your interest in our work. Your engagement with the finer points of our methodology is invaluable to us and the broader research community.

Best regards

Tricker-z / PreFuzz

Inquiry Regarding Edge Coverage Discrepancy #2