HexHive / retrowrite

RetroWrite -- Retrofitting compiler passes through binary rewriting

change basic block label format from .L%x to .L%d #27

Open Marsman1996 opened 2 years ago

Marsman1996 commented 2 years ago

Using .L%x can cause some basic blocks to be left uninstrumented when the rewritten binary is instrumented with AFL.

diagprov commented 2 years ago

Hello, thanks for the PR.

Can you explain the problem with using .L[0-9a-f]+ as a label format versus a decimal integer format? Which instrumentation is missing when running AFL, and do you have a minimal test case that reproduces the issue? If so, could you please share it here?

If you're using afl-gcc, it is a wrapper around gcc itself, and gcc accepts arbitrary label names as long as they are valid symbol names in an ELF binary, so we should be able to encode these integers any way we like, provided they are unique. I would be surprised if picking labels containing a-f defeats AFL.

I'm not sure which AFL variant you are using, but I'd strongly recommend AFL++, available here: https://github.com/AFLplusplus/AFLplusplus. This version is supported, while the original AFL has been largely abandoned and may have issues with recent Linux distributions that could explain what you're seeing. But the best place to start is a small test case so we can reproduce your issue.

Thanks a lot!

Marsman1996 commented 2 years ago

Hi,

Sorry for the inconvenience: I put some key information in #28 instead of in this PR.

A minimal test case that reproduces the issue:

I tested nm from binutils; the assembly code can be downloaded here. As I stated in #28, the .L9ffea basic block is instrumented, while .La0047 and .La0058 are not. After the fix, the number of instrumented locations increases from 39511 to 47795.

I'm not sure which AFL variant you are using, but I'd strongly recommend AFL++ available here.

Yes, I am using AFL++. In fact, almost all AFL-family fuzzers inherit the afl-gcc/afl-clang instrumentation strategy from vanilla AFL; people tend to modify the LLVM IR mode instead. Take afl-as.c in AFL++ as an example:

        if ((isdigit(line[2]) ||
             (clang_mode && !strncmp(line + 1, "LBB", 3))) &&
            R(100) < (long)inst_ratio) {

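As we can see, it only instruments labels of the form .L[0-9]..., because isdigit(line[2]) requires the first character after ".L" to be a decimal digit (in clang mode, .LBB labels are accepted as well). With .L%x, any label whose hex part begins with a-f, such as .La0047, is therefore skipped, whereas .L%d labels always begin with a digit.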

Best wishes