evanmak / savior-source

source code for savior fuzzer
Apache License 2.0

AFL filename formats #11

Closed: cponcelets closed this 3 years ago.

cponcelets commented 3 years ago

Goal: Fix the filename-format mismatch between SAVIOR and AFL.

Issue:

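AFL can use two kinds of filename formats:

  • standard (the default): id:NNNNNN followed by mutation details, e.g. id:000001,src:000000,op:havoc,rep:16,+cov;
  • simple (when AFL is built with the SIMPLE_FILES flag): id_NNNNNN, e.g. id_000000.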

The two formats are mutually exclusive: choosing the simple one with the SIMPLE_FILES flag prevents AFL from recognizing files that follow the standard format (id:[0-9]{6}).

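However, SAVIOR:

  • builds AFL with SIMPLE_FILES, so the AFL instances name, and only import, testcases of the form id_NNNNNN;
  • has KLEE write its testcases in the standard form, id:NNNNNN.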

As a consequence, AFL does not import the testcases generated by KLEE, because of this format mismatch.
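To make the mismatch concrete, here is a rough Python model of the name check AFL applies when it syncs testcases from another instance's queue. As we understand AFL 2.x, its sync step parses each name with sscanf against a prefix that is "id:" by default and "id_" under SIMPLE_FILES, and skips names that do not parse or whose id is below the counter it stored for that peer. The helpers parse_case_id and accepts_for_sync are hypothetical; this is a sketch of the behavior, not AFL code.

```python
import re
from typing import Optional


def parse_case_id(name: str, simple_files: bool) -> Optional[int]:
    """Return the id AFL would read from `name`, or None if the file is skipped.

    Models sscanf(name, PREFIX "%06u", ...): the prefix must match exactly, then
    up to six digits are read; anything after them (",src:..." etc.) is ignored.
    """
    prefix = "id_" if simple_files else "id:"
    if name.startswith("."):
        return None  # hidden entries are ignored
    m = re.match(re.escape(prefix) + r"(\d{1,6})", name)
    return int(m.group(1)) if m else None


def accepts_for_sync(name: str, simple_files: bool, min_accept: int) -> bool:
    """True if AFL would import this file from a peer's queue (hypothetical helper)."""
    case_id = parse_case_id(name, simple_files)
    return case_id is not None and case_id >= min_accept


# KLEE writes id:NNNNNN names, but AFL was built with SIMPLE_FILES:
klee_queue = ["id:000001", "id:000002", "id:000003"]
print([accepts_for_sync(n, simple_files=True, min_accept=0) for n in klee_queue])
# -> [False, False, False]
print([accepts_for_sync(n, simple_files=False, min_accept=0) for n in klee_queue])
# -> [True, True, True]
```

Dropping SIMPLE_FILES flips the prefix, which is why removing the flag lets AFL pick up KLEE's outputs.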


A first fix has been committed (commit:e2c18d9bf) that removes the SIMPLE_FILES flag. However, SAVIOR still mixes the simple and standard formats.

For example:

  1. SAVIOR extends AFL to write statistics to the files coverage.csv and edge_sanitizer.csv. These files identify testcases by their simple-format names.
  2. Whenever an edge oracle reads the AFL queues, it imports the filenames in their standard format.

Problems:

  • SAVIOR crashes when it accesses a file through its simple-format name (#7).
  • The same testcase has two different names within SAVIOR's algorithms.

Solution: use only the simple filename format inside SAVIOR.

The idea is to:

  1. Convert the AFL filename to the simple format whenever SAVIOR imports AFL testcases.
  2. Map the simple name back to the standard filename whenever SAVIOR uses or accesses a file.

This way, each testcase has a unique internal name inside SAVIOR (its simple-format name).
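The two conversion steps could look roughly like the following Python sketch. The names to_simple and resolve_standard are hypothetical, not SAVIOR's actual functions; the sketch assumes the simple name is just id_ plus the numeric id, and that the real file is found by scanning the queue directory's entries for a name carrying the same id.

```python
import re
from typing import Iterable, Optional

# Standard names start with "id:NNNNNN" (details follow); simple names are exactly "id_NNNNNN".
_STANDARD = re.compile(r"id:(\d{6,})")
_SIMPLE = re.compile(r"id_(\d{6,})$")


def to_simple(afl_name: str) -> str:
    """Step 1: convert an imported AFL name (either format) to the internal simple name."""
    m = _STANDARD.match(afl_name) or _SIMPLE.match(afl_name)
    if m is None:
        raise ValueError(f"not an AFL testcase name: {afl_name!r}")
    return "id_" + m.group(1)


def resolve_standard(queue_entries: Iterable[str], simple_name: str) -> Optional[str]:
    """Step 2: find the on-disk queue entry that a simple name refers to.

    queue_entries would typically be os.listdir(<queue dir>); hidden files are skipped.
    """
    for entry in sorted(queue_entries):
        if entry.startswith("."):
            continue
        m = _STANDARD.match(entry) or _SIMPLE.match(entry)
        if m and "id_" + m.group(1) == simple_name:
            return entry
    return None


queue = ["id:000000,orig:seed1.txt", "id:000001,src:000000,op:havoc,rep:16,+cov"]
print(to_simple(queue[1]))                   # id_000001
print(resolve_standard(queue, "id_000001"))  # id:000001,src:000000,op:havoc,rep:16,+cov
```

Resolving to the real filename only at the point of file access keeps the simple name as the single key everywhere else.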


Example:

Before:

# ls output_folder/master/queue/
id_000000  id_000001
# ls output_folder/slave_000001/queue/
id_000000  id_000001

# ls output_folder/klee_instance_conc_000001/queue/
id:000001  id:000002  id:000003

# od output_folder/master/.synced/klee_instance_conc_000001
0000000 000000 000000
0000004

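Comments: the AFL queues use id_NNNNNN names while KLEE's queue uses id:NNNNNN, and the .synced counter for klee_instance_conc_000001 is still 0, i.e. the master has not imported any KLEE testcase.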
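The od dumps in this example show the small per-peer file AFL keeps under .synced/. As far as we know, AFL stores a single 32-bit unsigned integer there in host byte order: the lowest testcase id it will accept from that peer on the next sync. od prints it as two octal 16-bit words, so on a little-endian machine "000000 000000" is 0 and "000004 000000" is 4. A minimal Python reader, assuming a little-endian host (decode_sync_counter and read_sync_counter are hypothetical helpers):

```python
import struct


def decode_sync_counter(data: bytes) -> int:
    """Decode the 4-byte counter AFL keeps in .synced/<peer> (little-endian assumed)."""
    if len(data) != 4:
        raise ValueError(f"expected 4 bytes, got {len(data)}")
    (value,) = struct.unpack("<I", data)
    return value


def read_sync_counter(path: str) -> int:
    """Read and decode the counter file, e.g. output_folder/master/.synced/<peer>."""
    with open(path, "rb") as f:
        return decode_sync_counter(f.read())


# The bytes behind od's "000000 000000" and "000004 000000" lines:
print(decode_sync_counter(bytes([0, 0, 0, 0])))  # 0: nothing imported yet
print(decode_sync_counter(bytes([4, 0, 0, 0])))  # 4: ids up to 3 already processed
```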

After:

# ls output_folder/master/queue/
id:000000,orig:seed1.txt                   id:000002,sync:klee_instance_conc_000001,src:000002,+cov
id:000001,src:000000,op:havoc,rep:16,+cov  id:000003,src:000002,op:int32,pos:8,val:-2147483648,+cov

# ls output_folder/klee_instance_conc_000001/queue/
id:000001  id:000002  id:000003

# od output_folder/master/.synced/klee_instance_conc_000001
0000000 000004 000000
0000004

Comment:

Just to be sure, let's run the example binary on the AFL testcases:

# ./savior-example < output_folder/master/queue/id:000000,orig:seed1.txt
# ./savior-example < output_folder/master/queue/id:000001,src:000000,op:havoc,rep:32,+cov
# ./savior-example < output_folder/master/queue/id:000002,sync:klee_instance_conc_000001,src:000002,+cov
Magic number passed

cponcelets commented 3 years ago

Note that only the first two commits directly concern this PR. I also added the example and a release flag to build LLVM faster (in case you are interested).

evanmak commented 3 years ago

Thanks for the PR @cponcelets! I will take a closer look at the code soon; sorry for the wait, I am quite busy these days. Meanwhile, can you please provide a bit more context on why you would like to convert the file name format? Is using the simplified name across AFL, KLEE and coordinator making fuzzing more difficult? While it is less informative, it makes the implementation much easier and less room for (inconsistency) bugs.

A bit more context on why I used the simplified format: as KLEE needs to convert the synthesized input back to something AFL can recognize, it does not have the mutation context on this seed, thus I chose the simplified format in the first place.

BTW, I am not opinionated in either direction; I am just curious what you think the pros and cons are : )

cponcelets commented 3 years ago

A) Meanwhile, can you please provide a bit more context on why you would like to convert the file name format?

Sure, let me be clearer. As you said, SAVIOR has three modules: the fuzzer (AFL), the coordinator, and the concolic engine (KLEE). There are also several data flows between these modules:

  1. AFL -> Coordinator (reading AFL queues)
  2. AFL -> Coordinator (coverage/score statistics)
  3. Coordinator -> KLEE (running a concolic execution)
  4. KLEE -> AFL (outputting new testcases back)

As shown in the example, the current version uses:

In order for AFL to understand klee outputs:

B) Is using the simplified name across AFL, KLEE and coordinator making fuzzing more difficult? While it is less informative, it makes the implementation much easier and less room for (inconsistency) bugs.

True; the problem here is that I cannot change the KLEE converter's outputs. It should also work if KLEE outputs files in the id_<num> format. Besides, I think the standard format is more understandable for users, but that is a personal preference.

C) ... as KLEE needs to convert the synthesized input back to something AFL can recognize, it does not have the mutation context on this seed, thus I chose the simplified format in the first place.

Yes, but this is not a problem: a bare id:000000 name works fine. You could also follow the QSYM approach, which only adds a src field to keep track of the seed a testcase was generated from.
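For illustration, a KLEE-side name generator along those lines might look like the sketch below. make_klee_name is hypothetical and does not reproduce QSYM's exact output; the point is only that an id:NNNNNN name, optionally followed by a ,src:NNNNNN field, still starts with the id: prefix that a default (non-SIMPLE_FILES) AFL build parses during sync.

```python
from typing import Optional


def make_klee_name(case_id: int, src_id: Optional[int] = None) -> str:
    """Build a standard-format AFL name for a KLEE-generated testcase.

    Only the id is required; the optional src field records which seed the
    concolic run started from, since KLEE has no mutation context to report.
    """
    name = f"id:{case_id:06d}"
    if src_id is not None:
        name += f",src:{src_id:06d}"
    return name


print(make_klee_name(1))            # id:000001
print(make_klee_name(3, src_id=2))  # id:000003,src:000002
```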

The problem is the first_seen values in AFL: as AFL is currently implemented, you cannot easily retrieve filenames from ids. This is why I kept the simple file format inside the coordinator, which requires the format conversions.

To give you an overview, after the PR the modules use:

Correct me if I am wrong, but the coordinator uses the filename as the testcase id in its seed lists. To map the scores/first_seen values to a file, it needs to unify the names coming from A. and B. It is therefore necessary to convert standard names into the simple format, and to avoid confusing, for example, id:000001:orig and id_000001, which both point to the same file.
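A rough Python model of that unification step: queue entries (standard names) and per-seed scores (simple names, as in coverage.csv) are merged under one simple-name key, so both spellings of a testcase land in the same record. The record layout and the names simple_key and merge_seed_info are hypothetical, not the coordinator's real data structures.

```python
import re
from typing import Dict, Iterable

# Accept either "id:NNNNNN..." (standard) or "id_NNNNNN" (simple).
_ID = re.compile(r"id[:_](\d{6,})")


def simple_key(name: str) -> str:
    """Normalize either AFL format to the simple form used as the seed id."""
    m = _ID.match(name)
    if m is None:
        raise ValueError(f"not an AFL testcase name: {name!r}")
    return "id_" + m.group(1)


def merge_seed_info(queue_names: Iterable[str], scores: Dict[str, float]) -> Dict[str, dict]:
    """Join queue filenames (standard format) with per-seed scores (simple format)."""
    seeds: Dict[str, dict] = {}
    for name in queue_names:
        seeds.setdefault(simple_key(name), {})["path"] = name
    for name, score in scores.items():
        seeds.setdefault(simple_key(name), {})["score"] = score
    return seeds


queue = ["id:000000,orig:seed1.txt", "id:000001,src:000000,op:havoc,rep:16,+cov"]
scores = {"id_000001": 3.0}
print(merge_seed_info(queue, scores)["id_000001"])
# {'path': 'id:000001,src:000000,op:havoc,rep:16,+cov', 'score': 3.0}
```

Without the normalization, the same testcase would appear as two separate seeds, one with a path and no score and one with a score and no path.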

evanmak commented 3 years ago

Thanks for the detailed clarification. Indeed, KLEE outputting the standard format will prevent AFL from benefiting from the generated seeds. I was scratching my head trying to recall what happened, because in our earlier experiments we saw AFL import KLEE's generated inputs; @junxzm1990, please hold me accountable here.

Now we have two options: make the rest of the modules understand the standard format, which gives us more insightful names, perhaps for further analysis; or ask KLEE to output simplified names, which requires minimal changes.

I am happy to accept the PR, by the way, but I would like to flag it to @DanielGuoVT for awareness, as he is working on open-sourcing KLEE, and AFAIK there is another version of SAVIOR in Baidu's internal repo.

cponcelets commented 3 years ago

can we remove the binary files incl. .o and .o.bc?

Of course, and make clean should also remove the hidden files .savior_sanitizer_combination and .afl_coverage_combination.