This directory provides the source code for the paper "BEACON: Directed Grey-Box Fuzzing with Provable Path Pruning" [S&P 2022].
Tested environment: Ubuntu 16 and 18
export BEACON=<path_of_beacon_repository>
You can run $BEACON/scripts/preinstall.sh to install the dependent tools.
apt-get update --fix-missing
apt-get install -y make build-essential git wget cmake gawk libtinfo-dev libcap-dev zlib1g-dev
# llvm-4.0
apt-get install -y libtinfo5
apt-get install -y xz-utils
wget -q https://releases.llvm.org/4.0.0/clang+llvm-4.0.0-x86_64-linux-gnu-ubuntu-16.10.tar.xz
tar -xf clang+llvm-4.0.0-x86_64-linux-gnu-ubuntu-16.10.tar.xz
rm clang+llvm-4.0.0-x86_64-linux-gnu-ubuntu-16.10.tar.xz
cp -r clang+llvm-4.0.0-x86_64-linux-gnu-ubuntu-16.10 /usr/llvm
cp -r /usr/llvm/bin/* /usr/bin
cp -r /usr/llvm/lib/* /usr/lib
cp -r /usr/llvm/include/* /usr/include
cp -r /usr/llvm/share/* /usr/share
# wllvm
apt-get install -y python3 python3-dev python3-pip
pip3 install --upgrade pip
pip3 install wllvm
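After the install steps, a quick check that the expected tools are on PATH can save a failed build later. The following is a minimal sketch; check_tools is a hypothetical helper, not part of Beacon, and the tool names in the example are taken from the commands in this README.

```shell
# Hypothetical helper (not part of Beacon): report which of the given
# commands cannot be found on PATH. Returns non-zero if any is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_tools clang llvm-config wllvm cmake make
```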
You can run $BEACON/scripts/build.sh to build SVF, precondInfer, and the instrumentation tool (Ins).
git clone https://github.com/SVF-tools/SVF.git
pushd SVF
git reset --hard 3170e83b03eefc15e5a3707e5c52dc726ffcd60a
sed -i 's/LLVMRELEASE=\/home\/ysui\/llvm-4.0.0\/llvm-4.0.0.obj/LLVMRELEASE=\/usr\/llvm/' build.sh
./build.sh
popd
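# Build the static analyzer (precondInfer).
# NOTE: $FUZZER is not defined in this README; SVF_ROOT_DIR and SVF_LIB_DIR
# must point to the SVF checkout built above.
pushd precondInfer
mkdir build
pushd build
cmake \
    -DENABLE_KLEE_ASSERTS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_CONFIG_BINARY=/usr/bin/llvm-config \
    -DSVF_ROOT_DIR=$FUZZER/repo/SVF \
    -DSVF_LIB_DIR=$FUZZER/repo/SVF/Release-build/lib \
    ..
make -j
popd
popd
# Build the instrumentation tool (Ins).
pushd Ins
mkdir build
pushd build
CXXFLAGS="-fno-rtti" cmake \
    -DLLVM_DIR=/usr/lib/cmake/llvm/ \
    -DCMAKE_BUILD_TYPE=Release \
    ..
make -j
popd
popd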
You can run $BEACON/scripts/instrument.sh to instrument the test binary.
It is recommended to run Beacon under a new folder $BEACON/Outputs
to make sure the output files are gathered in a common folder.
mkdir $BEACON/Outputs; cd $BEACON/Outputs
Generate the bitcode file for the target project.
This repository already provides a demo bitcode file, $BEACON/Test/swftophp-2017-7578.bc. You can also generate your own bitcode file with other tools, e.g., clang or wllvm.
Tip: Always compile the bitcode with debug info (-g). Otherwise, the analyzer cannot find the target sites.
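One way to confirm that a bitcode file really carries debug info is to disassemble it and look for source-location metadata. The sketch below assumes that llvm-dis from the LLVM install is on PATH and that debug locations appear as !DILocation entries in the textual IR; has_debug_locations is a hypothetical helper, and demo.c is a placeholder file name.

```shell
# Hypothetical helper (not part of Beacon): succeed if the textual LLVM IR
# read from stdin contains source-location metadata, which clang emits
# when compiling with -g.
has_debug_locations() {
    grep -q '!DILocation('
}

# Typical use (demo.c is a placeholder source file):
#   clang -g -c -emit-llvm demo.c -o demo.bc
#   llvm-dis demo.bc -o - | has_debug_locations || echo "no debug info: rebuild with -g"
```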
$BEACON/precondInfer/build/bin/precondInfer $BEACON/Test/swftophp-2017-7578.bc --target-file=$BEACON/Test/cstest.txt --join-bound=5
Inputs:
$BEACON/Test/swftophp-2017-7578.bc is the bitcode file for the target project.
$BEACON/Test/cstest.txt is the target file. Here it contains the single line parser.c:66, which means that the target for directed fuzzing is at line 66 of parser.c. The target file must contain a single line of the form “fileName:lineNum”.
Outputs:
bbreaches.txt: the set of basic blocks that can reach the target instruction.
range_res.txt: the range analysis result.
transed.bc: the slightly transformed bc for further processing.
Caveats: Beacon uses the debug information in the LLVM IR to find the location in the IR that corresponds to the source code location given in the target file. Therefore, the given bitcode should contain debug information. Also, since one source code line can map to multiple LLVM instructions, the target instruction located by Beacon is simply one of those instructions. Finally, the current implementation does not allow the target instruction to be a PHI instruction.
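Before moving on to instrumentation, it can be worth confirming that the analysis actually wrote its three output files. This is a minimal sketch, assuming the analysis was run from the directory passed as the argument (the current directory by default); check_analysis_outputs is a hypothetical helper, not part of Beacon.

```shell
# Hypothetical helper (not part of Beacon): confirm that the three analysis
# outputs exist in the given directory (default: current directory).
check_analysis_outputs() {
    local dir="${1:-.}" f missing=0
    for f in bbreaches.txt range_res.txt transed.bc; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_analysis_outputs "$BEACON/Outputs" || echo "re-run precondInfer"
```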
The target location process can be described using the following pseudo code:
Given (filename, linenum) in the target file
for each instruction I in the given bc:
    let (debug_file, debug_line) be the file name and line number of I recovered from debug information
    if (filename is a substring of debug_file) && (linenum == debug_line):
        treat I as the target instruction and start the static analysis
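The matching rule in the pseudo code can be sketched in shell to show one practical consequence: because the file name only has to be a substring of the name recorded in the debug information, a short name can match more files than intended. matches_target is a hypothetical illustration of the rule, not Beacon's implementation, and the paths in the example are made up.

```shell
# Hypothetical illustration of the matching rule in the pseudo code:
# succeed when filename is a substring of debug_file and the line numbers
# are equal.
matches_target() {
    local filename="$1" linenum="$2" debug_file="$3" debug_line="$4"
    case "$debug_file" in
        *"$filename"*) [ "$linenum" -eq "$debug_line" ] ;;
        *) return 1 ;;
    esac
}

# Both of these succeed, since "parser.c" is also a substring of "xmlparser.c":
#   matches_target parser.c 66 /src/util/parser.c 66
#   matches_target parser.c 66 /src/util/xmlparser.c 66
```

If the debug information records a path that includes directories, a more specific name in the target file (for example, one with a directory prefix) would narrow the match under this rule.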
Users should supply a “good” source code location in the target file. Beacon will not proceed if the supplied target file is invalid.
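The format of the target file can be checked before starting the analysis. The sketch below is a hypothetical pre-check, not part of Beacon: it accepts a file with exactly one non-blank line of the form fileName:lineNum. Tolerating blank lines and trailing whitespace is an assumption of this sketch and may be more lenient than Beacon's own parser.

```shell
# Hypothetical pre-check (not part of Beacon): accept a target file that has
# exactly one non-blank line of the form fileName:lineNum.
validate_target_file() {
    local file="$1" count
    [ -f "$file" ] || { echo "missing target file: $file" >&2; return 1; }
    # Count non-blank lines; exactly one is expected.
    count=$(grep -c '[^[:space:]]' "$file" || true)
    [ "$count" -eq 1 ] || { echo "expected 1 target line, found $count" >&2; return 1; }
    # The line must be a file name, a colon, and a positive line number.
    grep -Eq '^[^:[:space:]]+:[1-9][0-9]*[[:space:]]*$' "$file" || {
        echo "target must look like fileName:lineNum" >&2
        return 1
    }
}

# Example:
#   validate_target_file "$BEACON/Test/cstest.txt" && echo "target file looks fine"
```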
$BEACON/Ins/build/Ins -output=$BEACON/Outputs/CVE-2017-7578.bc -blocks=$BEACON/Outputs/bbreaches.txt -afl -log=log.txt -load=$BEACON/Outputs/range_res.txt ./transed.bc
The instrumentation tool takes the above three files and outputs an instrumented bc.
In this example:
CVE-2017-7578.bc is the instrumented bitcode file for the target project.
bbreaches.txt contains the reachable blocks inferred by the previous analysis. Its form can vary depending on whether it is based on bitcode or source code.
range_res.txt contains the preconditions inferred by the previous analysis.
transed.bc is the transformed bc from the previous analysis.
Now that the bc has its infeasible paths pruned, compile it into an executable binary.
clang $BEACON/Outputs/CVE-2017-7578.bc -o $BEACON/Outputs/CVE-2017-7578 -lm -lz $BEACON/Fuzzer/afl-llvm-rt.o
Finally, fuzz all the things!
You can run $BEACON/scripts/run.sh to fuzz the test binary.
$BEACON/Fuzzer/afl-fuzz -i $BEACON/Test/fuzz_in -o $BEACON/Outputs/fuzz_out -m none -t 9999 -- $BEACON/Outputs/CVE-2017-7578 @@
Alternatively, you can use the Docker image (Beacon binary without source code).
Static analysis affects both the reachability analysis and the precondition inference used to prune infeasible paths, especially when handling indirect calls. The released prototype uses a flow-sensitive Andersen pointer analysis. Reachability results vary with the pointer analysis used, which in turn influences the performance of Beacon. Moreover, we noticed that a better static reachability analysis, e.g., an upgraded version of SVF with a higher LLVM version, can improve the results with minor analysis overhead. You can also try our script for reachability analysis based on the dot files exported by any version of SVF, which could have better precision and was used in the evaluation for the paper. We also look forward to any optimized static analysis techniques that improve Beacon! Drop me an email (heqhuang at cityu dot edu dot hk) if you have any thoughts or ideas ~
In practice, some compilation and engineering issues require extra effort, such as writing specifications for system and library functions. Without them, the control flow graph extracted from the LLVM IR does not contain the full code needed for reachability analysis. This leads to many paths being falsely pruned, which is not a problem caused by our implementation. We have encountered the issue of AFL reporting no instrumentation. In this case, one straightforward solution is to omit the "-blocks" parameter during the instrumentation stage. You can also add specifications for library or system functions that do not appear in the extracted control flow graph to ensure paths are not falsely pruned. We also look forward to any optimized compilation techniques that improve Beacon! Drop me an email (heqhuang at cityu dot edu dot hk) if you have any thoughts or ideas ~
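The workaround of dropping the block list can be sketched as a small wrapper around the instrumentation command shown earlier. run_ins and the INS override variable are hypothetical and not part of Beacon. The sketch assumes that the remaining flags (-output, -afl, -log, -load) are passed unchanged when the block list is omitted.

```shell
# Hypothetical wrapper (not part of Beacon): call Ins with the flags from the
# instrumentation step, passing the block list only when a fourth argument
# is given. INS can override the path of the Ins binary.
run_ins() {
    local out_bc="$1" ranges="$2" in_bc="$3" blocks="${4:-}"
    local ins="${INS:-$BEACON/Ins/build/Ins}"
    set -- "-output=$out_bc"
    [ -n "$blocks" ] && set -- "$@" "-blocks=$blocks"
    set -- "$@" -afl -log=log.txt "-load=$ranges" "$in_bc"
    "$ins" "$@"
}

# With the block list (as in the instrumentation step):
#   run_ins CVE-2017-7578.bc range_res.txt ./transed.bc bbreaches.txt
# Without it (workaround when AFL reports no instrumentation):
#   run_ins CVE-2017-7578.bc range_res.txt ./transed.bc
```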
As described in the paper, our prototype generates a target binary that can be used directly with other AFL-based fuzzers. The prototype on Docker Hub is a special version built for our evaluation environment and does not work with other fuzzers. For general purposes, use our released code to generate the binary for other AFL-based fuzzers. You can also modify the instrumentation code to support your own features. In this case, please use your own afl-llvm-rt.o as well.
We found some compatibility issues when generating a whole bc for analysis when supporting libFuzzer-based fuzzers with an additional afldriver.cpp. If you are willing to help, please let me know through email (hhuangaz at cse dot ust dot hk).
You can find more details in our S&P 2022 paper.
@INPROCEEDINGS{9833751,
author={Huang, Heqing and Guo, Yiyuan and Shi, Qingkai and Yao, Peisen and Wu, Rongxin and Zhang, Charles},
booktitle={2022 IEEE Symposium on Security and Privacy (SP)},
title={BEACON: Directed Grey-Box Fuzzing with Provable Path Pruning},
year={2022},
volume={},
number={},
pages={36-50},
doi={10.1109/SP46214.2022.9833751}}
Beacon is released under the Apache License.