AFLplusplus / Grammar-Mutator

A grammar-based custom mutator for AFL++
Apache License 2.0

Progress Report #1

Closed vanhauser-thc closed 3 years ago

vanhauser-thc commented 4 years ago

Hi Shengtuo,

please put your weekly progress reports in this issue. Also, I have not seen any progress in the last 5 days. Do you need assistance?

h1994st commented 4 years ago

Hi Marc,

During the past week, I mainly focused on the serialization/deserialization part of the grammar mutator, because the grammar mutations require tree representations rather than the serialized string format.


I did have some questions about the fuzzing queue and the grammar mutation. In AFL++, the queue entry in the fuzzing queue only stores the file path of the test case. For each fuzzing round (i.e., fuzz_one / fuzz_one_original), AFL++ reads the test case from the file. If we want to integrate the grammar mutation, we need to keep the tree representation for each test case. Currently, I have two solutions in mind:

Solution 1 is intuitive and does not require any modifications to AFL++. The only concern is that we need to deserialize/parse the test case for every custom mutation and trimming.

Solution 2 may be more efficient, as it avoids lots of redundant deserialization/parsing operations. The drawback is that we need to modify AFL++ a little bit.

My current plan is to first implement "Solution 1" and then grammar mutations. Do you think "Solution 2" would be beneficial? If so, we can refactor the codebase after we have a prototype (Solution 1) for the grammar mutator.

Thank you!

vanhauser-thc commented 4 years ago

If you have less time than expected, then please update your plan so we can see whether features have to be removed and when each milestone can realistically be reached. There is more to life than GSoC, but we must be able to see whether you are making progress, whether the plan is working, and where we can help.

Solution 1 sounds very costly and inefficient. There is a custom mutator hook that is run when a new queue entry is added. You could use that to perform the deserialization/parsing; then you would have to do it only once for each queue entry - and write it away, e.g. into the queue filename plus ".tree" or some other added extension. Just load that ... .tree file when the custom mutation is started.

andreafioraldi commented 4 years ago

Hi @h1994st , drop solution 1. Solution 2 is OK, but you can also store the data externally. Look at https://github.com/AFLplusplus/AFLSmart-Custom-Mutator/blob/master/custom_mutator.cpp#L7. Here I store the parsed tree in a map in which the key is the name of the test case. This has the same effect as adding an extra field to the queue entry but does not require modifications to AFL++. Modifications in afl-fuzz are not a problem, but it is hard to have an extra field that works with multiple custom mutators.

vanhauser-thc commented 4 years ago

Modifications in afl-fuzz are not a problem, but it is hard to have an extra field that works with multiple custom mutators.

especially if you then have two custom mutators using that same field ...

so better to save it in an extra file, like Andrea is doing in his AFLSmart solution, but perform the deserialization in a hook for queue_new_entry
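
The sidecar-file idea above can be sketched as follows. This is a hedged illustration, not the project's actual code: `tree_path`, `save_tree`, and `load_tree` are hypothetical helper names, and in AFL++ they would be called from the queue_new_entry hook and at the start of the custom mutation stage, respectively.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<queue_path>.tree" (caller frees). */
char *tree_path(const char *queue_path) {
  size_t n = strlen(queue_path);
  char *p = malloc(n + sizeof ".tree");
  if (!p) return NULL;
  memcpy(p, queue_path, n);
  memcpy(p + n, ".tree", sizeof ".tree");
  return p;
}

/* Called once per new queue entry: persist the serialized tree
   next to the queue file, so parsing happens only once. */
int save_tree(const char *queue_path, const unsigned char *buf, size_t len) {
  char *p = tree_path(queue_path);
  if (!p) return -1;
  FILE *f = fopen(p, "wb");
  free(p);
  if (!f) return -1;
  size_t written = fwrite(buf, 1, len, f);
  fclose(f);
  return written == len ? 0 : -1;
}

/* Called when the custom mutation stage starts: load the tree back.
   Returns the number of bytes read, or -1 on error. */
long load_tree(const char *queue_path, unsigned char *out, size_t max_len) {
  char *p = tree_path(queue_path);
  if (!p) return -1;
  FILE *f = fopen(p, "rb");
  free(p);
  if (!f) return -1;
  long r = (long)fread(out, 1, max_len, f);
  fclose(f);
  return r;
}
```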

h1994st commented 4 years ago

@vanhauser-thc @andreafioraldi OK, I see. I will try Andrea's approach. Thank you!

vanhauser-thc commented 4 years ago

@h1994st please commit on a daily basis so we can see the progress being made. The last commit was three days ago, and it is unclear to me whether you are working full time every waking minute, not at all, or something in between ... :)

h1994st commented 4 years ago

@vanhauser-thc Got it. I will keep it in mind.

h1994st commented 4 years ago

Hi @vanhauser-thc and @andreafioraldi ,

Here is the list of completed tasks during this week

TODO for next week:

andreafioraldi commented 4 years ago

Keep going, good work!

vanhauser-thc commented 4 years ago

Thank you! Looks like you made good progress :) Two things: 1) I think we should do a phone call again. please propose a few times that fit you. 2) when would be a good milestone for us to test the features you are working on?

h1994st commented 4 years ago

Hi @vanhauser-thc and @andreafioraldi,

Thanks!

Here are my available time slots:

Please let me know if there are any conflicts.

For the milestone, after finishing all mutation and trimming strategies (i.e., after July 17), you should be able to test the grammar mutator, at least for fuzzing the JSON parser. I will notify you guys once I finish it.

vanhauser-thc commented 4 years ago

I would prefer July 10 at 10am<->4pm, otherwise 11 or 12 July. @andreafioraldi ?

andreafioraldi commented 4 years ago

12 july for me

vanhauser-thc commented 4 years ago

Then July 12, 10am Shengtuo's time and 4pm our time.

h1994st commented 4 years ago

Hi @vanhauser-thc and @andreafioraldi ,

Will we use the same Google Meet link as last time?

Thanks!

vanhauser-thc commented 4 years ago

https://meet.google.com/vux-vnmp-mrn

h1994st commented 4 years ago

Got it!

h1994st commented 4 years ago

Hi @vanhauser-thc and @andreafioraldi ,

Here is the list of completed tasks during this week

TODO for next week:

h1994st commented 4 years ago

Hi @vanhauser-thc and @andreafioraldi ,

As we mentioned before, around July 17, you can try out the grammar mutator prototype. At least, it now works for the JSON parser example described in README.md. For now, the grammar mutator has

I need to refactor my code to implement another trimming strategy, "subtree minimization". The current grammar mutator does not parse the initial test cases but generates its own test cases at the beginning; I will take care of the parsing part later.

Please let me know if you have any feedback on the performance, implementation details, etc. :) As usual, I will give you a complete progress update at the end of this week.

BTW:

While running the grammar mutator, I found that the mutated tree may represent an empty string, which makes the fuzzer throw an error. Therefore, I submitted a pull request to allow the fuzzer to run the target on an empty test case. Not sure if this is a good solution.

h1994st commented 4 years ago

Hi @vanhauser-thc and @andreafioraldi ,

For this week, mainly focus on:

TODO for next week:

h1994st commented 3 years ago

Hi @vanhauser-thc and @andreafioraldi ,

For this week, mainly focus on:

TODO for next week:

h1994st commented 3 years ago

Hi @vanhauser-thc and @andreafioraldi ,

For this week, mainly focus on:

TODO for next week:

BTW, how do you guys usually benchmark afl-fuzz? The current fuzzing speed is a little slow. I want to identify the bottleneck and then optimize the performance.

vanhauser-thc commented 3 years ago

BTW, how do you guys usually benchmark afl-fuzz? The current fuzzing speed is a little slow. I want to identify the bottleneck and then optimize the performance.

depends on what you want to benchmark it on - speed, coverage, ...

for speed comparison I copy the selected versions and do something like

TIME=125
afl-system-config
for fuzzer in afl-fuzz-1 afl-fuzz-2; do
  for run in $(seq 1 20); do
    afl-fuzz -V$TIME -i in -o out-$fuzzer-$run -d -s 123 -b 1 -- ./target
  done
done

(Note that -b requires current dev; it binds to CPU #1, the second core/thread.) The first runs usually have higher exec counts than the following ones, as the CPU is still cool, so disregard them, see roughly where the others stabilize, and then compare the different fuzzers. Compare the execs_done or execs_per_sec values in $out/fuzzer_stats.

For coverage you can do the same, but switch -V for -E (with a value that lets the fuzzer run for a useful time, e.g. 10 minutes) and do only one run. -E sets the number of execs (though only approximately) instead of the runtime. For an easy comparison you can compare the paths_found values in $out/fuzzer_stats; if you want it more complete, then compare the edges discovered (requires afl-clang-lto (llvm 11+) or afl-clang-fast (llvm 9+) with default instrumentation):

for file in out-$fuzzer/queue/id*; do
  afl-showmap -o - -q -- ./target < "$file"
done | sed 's/:.*//' | sort -u > EDGES.$fuzzer

andreafioraldi commented 3 years ago

For performance use gprof or perf

h1994st commented 3 years ago

@vanhauser-thc @andreafioraldi Thanks for your suggestions. I will try the approaches you mentioned here.

h1994st commented 3 years ago

Hi @vanhauser-thc and @andreafioraldi ,

For this week, mainly focus on:

TODO for next week:

vanhauser-thc commented 3 years ago

you are working totally on your own ... which is fine however it gives me the feeling I am not doing a good job :) do you need any assistance? or have any questions? anything where I can help?

h1994st commented 3 years ago

you are working totally on your own ... which is fine however it gives me the feeling I am not doing a good job :) do you need any assistance? or have any questions? anything where I can help?

Hi @vanhauser-thc,

Thanks a lot for your kindness! You do help me a lot. Actually, for the time being, not too many technical questions. I will definitely contact you once I encounter any problems.

But it would be great if you could give input on whether there are any potential improvements for the current grammar mutator, from a user's perspective.

BTW, after August 24, we need to submit the project for the final evaluation. Besides the tasks in my TODO list, what else do you think I still need to do to pass the final evaluation? (e.g., tests, documentation, etc.)

vanhauser-thc commented 3 years ago

BTW, after August 24, we need to submit the project for the final evaluation. Besides the tasks in my TODO list, what else do you think I still need to do to pass the final evaluation? (e.g., tests, documentation, etc.)

@andreafioraldi what is your opinion?

IMHO it is

andreafioraldi commented 3 years ago

For me it is very important that this mutator greatly outperforms the AFL generic mutator on targets like ruby or jsc. +1 also for the "for dummies" readme and for the docs. You have time, and I am confident that everything will turn out great. Don't hesitate to ask about the last difficult part that you have to code, the parsing stage. You can also easily reach out to the Nautilus authors directly with any questions; just mail Cornelius and say that you are a student for AFL++ for a super-fast response.

h1994st commented 3 years ago

I see. Thanks for your responses! I will try to figure out the parsing stage and the other supporting documents over the next two weeks.

h1994st commented 3 years ago

BTW, good luck with your WOOT presentation.😏

h1994st commented 3 years ago

I will give the weekly report later today, as I had something urgent yesterday. Sorry for the delay.

@vanhauser-thc and @andreafioraldi, I have three questions:

  1. This one is a little naive. How do you usually plot a graph of edge coverage against time? I noticed there is some related information in fuzzer_stats. Do I need to monitor this file frequently while running the fuzzer? Or are there better approaches?
  2. Could I add one more custom mutator hook, afl_custom_fuzz_steps, to control how many times the custom mutator runs? In nautilus, each mutation may run at least 100 times for a given queue entry, but our current custom mutator cannot control the number of steps (i.e., stage_max in fuzz_one_original). If the custom mutator could control this number, I could add a deterministic mutation strategy and also control the running steps of the grammar mutator.
  3. I may hold off on the parsing part because I noticed that nautilus does not parse the input test cases but generates 1000 inputs when starting the fuzzer. I want to spend more time optimizing the performance (e.g., coverage). What do you think?

vanhauser-thc commented 3 years ago

1. This one is a little naive. How do you usually plot a graph of edge coverage against time? I noticed there is some related information in `fuzzer_stats`. Do I need to monitor this file frequently while running the fuzzer? Or are there better approaches?

I think never :) I look at what coverage is reached after a specific time though. Not that I think it is not worth it; rather, I would want afl-plot to be able to plot several runs together so they are comparable. I never got around to coding that though. (It is an item in TODO.md.)

This is useful though (needs the dev branch): afl-showmap -C -i out -o /dev/null -- target, where -i is the -o of afl-fuzz. This will analyze all edges discovered in the queue and print the number. Requires llvm 9+ and either afl-clang-fast or afl-clang-lto with default instrumentation.

2. Could I add one more custom mutator hook, `afl_custom_fuzz_steps`, to control how many times the custom mutator runs? In `nautilus`, each mutation may run at least 100 times for a given queue entry, but our current custom mutator cannot control the number of steps (i.e., `stage_max` in `fuzz_one_original`). If the custom mutator could control this number, I could add a deterministic mutation strategy and also control the running steps of the grammar mutator.

If AFL_CUSTOM_MUTATOR_ONLY is used, then this would only affect the number of inputs generated per queue entry per cycle, right? Does not sound that useful IMHO.

for runs without _ONLY I can see the benefit though.

For me it is fine if you add it; just remember to add it to the documentation.

3. I may hold off on the parsing part because I noticed that `nautilus` does not parse the input test cases but generates 1000 inputs when starting the fuzzer. I want to spend more time optimizing the performance (e.g., coverage). What do you think?

for me the order you do it is not important. IMHO parsing the inputs is important though so this should be implemented within the project.

h1994st commented 3 years ago

Hi @vanhauser-thc and @andreafioraldi ,

Thanks for your response!

I guess increasing the number of inputs generated per fuzz_one_original can help increase the coverage, but I need to confirm my hypothesis by varying the number of iterations. If it does help, I will submit a pull request to AFL++. Will keep you updated on this.

I agree that parsing the inputs is important so that users can reuse their previous input corpus. I will try my best to implement this.


The progress of the last week was a little slow. I mainly focused on trying to optimize the coverage of the grammar mutator, but got a bit lost on how to improve it. Specifically, I refactored 1_c_gen.py to maximize the size of the generated tree. Overall, the performance of the grammar mutator is still not good enough.

There are several tasks for next week, including the delayed tasks

andreafioraldi commented 3 years ago

I guess increasing the number of inputs generated per fuzz_one_original can help increase the coverage, but I need to confirm my hypothesis by varying the number of iterations. If it does help, I will submit a pull request to AFL++. Will keep you updated on this.

The custom mutator uses the same number of iterations as havoc, so yes, maybe we should add a custom mutator API that manipulates the number of iterations. afl_custom_fuzz_steps sounds OK.

Implement the parsing part

You will get coverage by mutating good seeds IMO, not by generating from scratch, so you should prioritize this before optimizing other things. Think also about how to parse inputs that are synced from other fuzzer instances running in parallel.

andreafioraldi commented 3 years ago

For the parsing part, maybe you should just use a parser generator like ANTLR or bison.

andreafioraldi commented 3 years ago

You can just define a compatibility layer (so that it will be easy for us to add other parser generators in the future) that is called by your generated parser, e.g. the ANTLR C++ one (yes, Marc said C++ is bad, but ANTLR does not emit C), to build your tree.
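
Such a compatibility layer could be as small as a C-visible node-building interface that the generated C++ ANTLR listener calls while walking the parse tree. A hedged sketch, with all names hypothetical and a fixed child capacity just for brevity:

```c
#include <stdlib.h>

#define MAX_CHILDREN 8 /* fixed capacity, only for this sketch */

/* The mutator-side tree node the ANTLR listener would build. */
typedef struct node {
  int rule_id; /* grammar rule that produced this node */
  struct node *children[MAX_CHILDREN];
  int n_children;
} node_t;

/* On the C++ side, extern "C" declarations would bind to these
   two entry points, keeping the parser generator swappable. */
node_t *node_create(int rule_id) {
  node_t *n = calloc(1, sizeof *n);
  if (n) n->rule_id = rule_id;
  return n;
}

int node_add_child(node_t *parent, node_t *child) {
  if (!parent || !child || parent->n_children >= MAX_CHILDREN) return -1;
  parent->children[parent->n_children++] = child;
  return 0;
}
```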

andreafioraldi commented 3 years ago

This can be helpful, I think: https://tomassetti.me/getting-started-antlr-cpp/

andreafioraldi commented 3 years ago

Note that you can easily parse an ANTLR4 grammar file using ANTLR itself (https://github.com/antlr/grammars-v4/tree/master/antlr/antlr4), so it should be easy to add ANTLR4 as a grammar format alongside JSON. For JSON, it is easy to generate an ANTLR grammar from a JSON grammar in Python.

h1994st commented 3 years ago

@andreafioraldi Thanks for your information!

ANTLR can definitely help with the parsing part. You are right: we need a compatibility layer to convert ParseTree in ANTLR to our tree representation.

I will take a look at the links you recommended.

h1994st commented 3 years ago

Hi @vanhauser-thc and @andreafioraldi ,

For this week, mainly focus on:

TODO for next week:

vanhauser-thc commented 3 years ago

Next week ... there is no next week :) You have to hand in everything final on Monday next week. I am away over the weekend. I would push everything as far as you can by Thursday night, so I can review and test on Friday morning before I leave at lunchtime. Then you will have feedback to work on over the weekend.

Concentrate on code, documentation can still be done after Thursday evening.

I can implement the afl_custom_fuzz_steps hook in afl-fuzz for you. It is not part of the grammar mutator, so I think it is fine if we take care of that. What do you need as inputs? I would pass the data, len, and filename - or the queue structure pointer?

h1994st commented 3 years ago

Hi @vanhauser-thc ,

Yeah, I know there is no more week for GSoC. By "next" week I mean this week (Aug 24-28), as the report was written yesterday in my local time ;)

Got it. I will try to wrap up everything and get back to you on Thursday night.

I envision afl_custom_fuzz_steps would have the same function signature as afl_custom_init_trim.

int32_t afl_custom_fuzz_steps(my_mutator_t *data, uint8_t *buf,
                              size_t buf_size);

This new hook would be used by afl-fuzz to control stage_max for a custom mutator (afl-fuzz-one.c#L1669). It should be invoked before afl-fuzz-one.c#L1687. I think this would be enough.

vanhauser-thc commented 3 years ago

It is in there. function signature:

u32 (*afl_custom_fuzz_count)(void *data, const u8 *buf, size_t buf_size);
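
For illustration, here is a hedged sketch of what a grammar mutator might return from this hook. The floor of 100 iterations per queue entry (as in Nautilus), the cap of 1000, and the size-based scaling are assumptions made for this example, not the project's actual policy; AFL++'s u8/u32 types are written as stdint equivalents.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale the number of custom-mutation iterations per queue entry with
   input size, bounded on both sides. Matches the hook signature
   u32 (*afl_custom_fuzz_count)(void *data, const u8 *buf, size_t buf_size). */
uint32_t afl_custom_fuzz_count(void *data, const uint8_t *buf,
                               size_t buf_size) {
  (void)data;
  (void)buf;
  uint32_t n = (uint32_t)(buf_size / 4); /* illustrative scaling factor */
  if (n < 100) n = 100;                  /* at least 100 mutations per entry */
  if (n > 1000) n = 1000;                /* keep huge inputs from stalling a cycle */
  return n;
}
```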

h1994st commented 3 years ago

Thank you so much! I will integrate it into the grammar mutator soon.

andreafioraldi commented 3 years ago

Maybe the coverage issue is related to the generator.

I looked at some JSONs created with ./grammar_generator and noticed that the generator prefers to go deep into the grammar. You can see that most of the generated files have a lot of whitespace and escape chars like \t, \f, etc.

If you look at json_grammar.json, you can see that the escape chars are at a deeper level in the grammar compared to the other tokens that can be placed in a string:

    "<string>": [["\"", "<characters>", "\""]],
    "<characters>": [["<character-1>"]],
    "<character>": [["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
                    ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"],
                    ["g"], ["h"], ["i"], ["j"], ["k"], ["l"], ["m"], ["n"],
                    ["o"], ["p"], ["q"], ["r"], ["s"], ["t"], ["u"], ["v"],
                    ["w"], ["x"], ["y"], ["z"], ["A"], ["B"], ["C"], ["D"],
                    ["E"], ["F"], ["G"], ["H"], ["I"], ["J"], ["K"], ["L"],
                    ["M"], ["N"], ["O"], ["P"], ["Q"], ["R"], ["S"], ["T"],
                    ["U"], ["V"], ["W"], ["X"], ["Y"], ["Z"], ["!"], ["#"],
                    ["$"], ["%"], ["&"], ["\""], ["("], [")"], ["*"], ["+"],
                    [","], ["-"], ["."], ["/"], [":"], [";"], ["<"], ["="],
                    [">"], ["?"], ["@"], ["["], ["]"], ["^"], ["_"], ["`"],
                    ["{"], ["|"], ["}"], ["~"], [" "], ["<esc>"], ["'"]],
    "<esc>": [["\\","<escc>"]],
    "<escc>": [["\\"],["b"],["f"], ["n"], ["r"],["t"],["\""]],

Why does the generator prefer to generate escc tokens? Is it a bug in f1?

It seems like when a rule contains both a nonterminal and a terminal, the nonterminal is followed and the terminal is almost never generated.

andreafioraldi commented 3 years ago

Idk if this can be the cause, but look at the generated C. In the function gen_node_character, there is this snippet before the switch case (I added some comments):

if (rule_index < 0 || rule_index >= 95) {
  int num_rules = 0;
  if (max_len < 3) {
    num_rules = 94;
  } else {
    num_rules = 95; // here with max_len >= 3, seems likely
  }

  val = 0;
  if (num_rules == 94) { // likely num_rules == 95, goto else branch
    val = map_rand(94);
  } else {
    // random(95-94) -> random(1) -> always 0
    val = map_rand(num_rules - 94) + 94; // always 94, generate always esc node
  }
} else {
  val = rule_index;
}

  ...

  switch(val) {

   ....

  case 94:
    ...
    subnode = gen_node_esc(subnode_max_len, &subnode_consumed, -1);
    ...
    break;

  }

  return node;
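
If the bias described here is real, one hedged fix is to draw uniformly over every rule that fits in max_len, instead of branching on num_rules == 94 and collapsing the else branch onto rule 94. A sketch of that idea (pick_rule is a hypothetical name, map_rand is a stand-in for the generated helper, and the max_len < 3 threshold simply mirrors the original snippet):

```c
#include <stdlib.h>

/* Stand-in for the generated helper: uniform value in [0, n). */
int map_rand(int n) { return rand() % n; }

/* Pick a rule for <character>: honor an explicit rule_index, otherwise
   choose uniformly among all rules allowed by max_len. With the original
   snippet, any max_len >= 3 always yielded rule 94 (the <esc> rule). */
int pick_rule(int rule_index, int max_len) {
  int num_rules = (max_len < 3) ? 94 : 95; /* threshold as in the snippet above */
  if (rule_index >= 0 && rule_index < num_rules) return rule_index;
  return map_rand(num_rules); /* every eligible rule is equally likely */
}
```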

andreafioraldi commented 3 years ago

I don't have the full picture of the code; maybe it is intended, but the generated seeds look strange.

andreafioraldi commented 3 years ago

In 100 seeds, if I grep for \t I get 50 matches, but just 2 if I grep for W. It seems reasonable that nonterminals have more priority (is it? maybe not), but not at this level: basically, terminals that are not in the last levels of the grammar are almost never generated.