Closed cponcelets closed 3 years ago
Note that only the first two commits directly concern the PR. I also added the example and a release flag to build LLVM faster (in case you are interested).
Thanks for the PR, @cponcelets! I will take a closer look at the code soon; sorry for the delay, I am quite busy these days. Meanwhile, can you please provide a bit more context on why you would like to convert the file name format? Is using the simplified name across AFL, KLEE and the coordinator making fuzzing more difficult? While it is less informative, it makes the implementation much easier and leaves less room for (inconsistency) bugs.
A bit more context on why I used the simplified format: KLEE needs to convert the synthesized input back to something AFL can recognize, but it does not have the mutation context for this seed, so I chose the simplified format in the first place.
BTW, I don't have a strong opinion either way, I'm just curious what you think the pros and cons are : )
A) Meanwhile, can you please provide a bit more context on why you would like to convert the file name format?
Sure, let me be clearer. As you said, there are three modules within SAVIOR: the fuzzer (AFL), the coordinator and the concolic engine (KLEE). There are also different data flows between these modules:
As shown in the example, the current version is using:
id:
which is the standard format).
In order for AFL to understand KLEE outputs:
B) Is using the simplified name across AFL, KLEE and the coordinator making fuzzing more difficult? While it is less informative, it makes the implementation much easier and leaves less room for (inconsistency) bugs.
True. The problem here is that I cannot change the KLEE converter outputs. It should also work if KLEE outputs files in the id_<num> format.
Besides, I think the standard format is easier for users to understand, but that is a personal preference.
C) ... as KLEE needs to convert the synthesized input back to something AFL can recognize, it does not have the mutation context on this seed, thus I chose the simplified format in the first place.
Yes, but this is not a problem: id:000000 alone works. You can also follow the QSYM approach, which only adds a src field to keep track of the seed a testcase was generated from.
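As a minimal sketch of that idea, the helper below builds a name that carries only the testcase id and the id of the seed it came from. The function name `make_testcase_name` is hypothetical, and the layout is modelled on the `src:` field that AFL writes in its own standard-format names; the exact layout QSYM uses is not verified here.

```python
def make_testcase_name(case_id: int, src_id: int) -> str:
    """Build a standard-style name that records only the parent seed.

    Assumption: six-digit, zero-padded ids, as in AFL's own queue names.
    """
    return "id:%06d,src:%06d" % (case_id, src_id)


if __name__ == "__main__":
    # Testcase 3, generated from seed 12.
    print(make_testcase_name(3, 12))  # id:000003,src:000012
```

No mutation context is needed to produce such a name, only the id of the seed that was handed to the concolic engine.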
The problem is the first_seen values in AFL. You cannot easily retrieve filenames from ids with the way AFL is currently implemented. This is why I kept the simple file format inside the coordinator, which requires the format conversions.
To give you an overview, after the PR the modules use:
Correct me if I am wrong, but the coordinator uses the filename as a testcase id in its seed lists. In order to map the scores/first_seen values to a file, it needs to unify the names coming from A. and B. It is thus necessary to convert standard names into simple ones and avoid confusion between, for example, id:000001:orig and id_000001, which both point to the same file.
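The conversion described above can be sketched as follows. This is an illustration only, not the code from the PR: the function name `to_simple_name` is hypothetical, and it assumes that every testcase name starts with `id:` or `id_` followed by a six-digit number.

```python
import os
import re

# Accept either prefix: "id:" (standard format) or "id_" (simple format),
# followed by a six-digit testcase number. Anything after it is ignored.
_ID_RE = re.compile(r"^id[:_](\d{6})")


def to_simple_name(path: str) -> str:
    """Map a standard or simple testcase name to its simple form."""
    name = os.path.basename(path)
    match = _ID_RE.match(name)
    if match is None:
        raise ValueError("not an AFL-style testcase name: %r" % name)
    return "id_" + match.group(1)


if __name__ == "__main__":
    # Both spellings collapse to the same internal key.
    print(to_simple_name("id:000001:orig"))  # id_000001
    print(to_simple_name("id_000001"))       # id_000001
```

With such a key, scores and first_seen values can be stored once per testcase regardless of which module produced the name.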
Thanks for the detailed clarification. Indeed, KLEE outputting the standard format will prevent AFL from benefiting from the generated seeds. I was scratching my head trying to recall what happened, because when we experimented before, we saw AFL import KLEE's generated inputs. @junxzm1990, please hold me accountable here.
Now we have two options: make the rest of the modules understand the standard format, which gives us more insightful names, maybe for further analysis, or ask KLEE to output the simplified name to keep the change minimal.
I am happy to accept the PR, btw, but would like to flag it to @DanielGuoVT for awareness, as he is working on open-sourcing KLEE and, AFAIK, there is another version of SAVIOR in Baidu's internal repo.
Can we remove the binary files, including .o and .o.bc?
Of course. make clean should also remove the hidden files (.savior_sanitizer_combination and .afl_coverage_combination).
Goal: Fix filename formats between SAVIOR and AFL.
Issue:
AFL can use two kinds of filename formats:
id:[0-9]{6}) and followed by AFL information.
However, SAVIOR:
SIMPLE_FILES,
As a consequence, AFL does not read SAVIOR testcases because of a format mismatch.
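To make the mismatch concrete, here is a rough model of the check. It assumes that AFL, when importing testcases, only accepts names that begin with the prefix it was built with (`id_` when compiled with SIMPLE_FILES, `id:` otherwise) followed by a numeric id. The function `importable_by_afl` and its `simple_files` parameter are hypothetical names for this sketch, not AFL code.

```python
import re


def importable_by_afl(filename: str, simple_files: bool) -> bool:
    """Rough model: does the name start with the expected prefix and an id?

    Assumption: the expected prefix is "id_" for a SIMPLE_FILES build
    and "id:" for a standard build.
    """
    prefix = "id_" if simple_files else "id:"
    return re.match(re.escape(prefix) + r"\d{1,6}", filename) is not None


if __name__ == "__main__":
    # A standard build skips simple names, and the reverse also holds.
    print(importable_by_afl("id_000001", simple_files=False))        # False
    print(importable_by_afl("id:000001,src:000000", simple_files=False))  # True
    print(importable_by_afl("id:000001", simple_files=True))         # False
```

Under this model, a testcase written in one format is silently ignored by a fuzzer that expects the other.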
A first solution has been committed (commit:e2c18d9bf), removing the SIMPLE_FILES flag. However, there is still a mix of simple and standard formats in SAVIOR.
For example:
coverage.csv, edge_sanitizer.csv. These files use the simple format to specify testcases.
Problems:
Solution: Use only the simple filename format inside SAVIOR.
The idea is to:
This way, a testcase has a unique internal name inside SAVIOR (the simple-format one).
Example:
Before:
Comments:
od (dumping the file in octal format). The second 0000000 shows that none of the testcases have been checked.
After:
Comment:
4, all the testcases have been checked.
Just to be sure, check the AFL testcases: