Closed: vanhauser-thc closed this issue 4 years ago
Hi Marc,
During the past week, I mainly focused on the serialization/deserialization part of the grammar mutator, because the grammar mutations require tree representations rather than the serialized string format.
I did have some questions about the fuzzing queue and the grammar mutation. In AFL++, the queue entry in the fuzzing queue only stores the file path of the test case. For each fuzzing round (i.e., `fuzz_one` / `fuzz_one_original`), AFL++ reads the test case from the file. If we want to integrate the grammar mutation, we need to keep the tree representation for each test case. Currently, I have two solutions in mind:

1. Deserialize/parse the test case at the beginning of every custom mutation (`afl_custom_fuzz`) and trimming (`afl_custom_trim`)
2. Add an `extra_data` field for the `queue_entry` so that we can save the parsing tree in memory, and modify `afl_custom_fuzz` so that we can pass `extra_data` to the grammar mutator and retrieve the mutated parsing tree (`extra_data_out`)

Solution 1 is intuitive and does not require any modifications to AFL++. The only concern is that we need to deserialize/parse the test case for every custom mutation and trimming.
Solution 2 may be more efficient, as it avoids lots of redundant deserialization/parsing operations. The drawback is that we need to modify AFL++ a little bit.
My current plan is to first implement "Solution 1" and then grammar mutations. Do you think "Solution 2" would be beneficial? If so, we can refactor the codebase after we have a prototype (Solution 1) for the grammar mutator.
Thank you!
If you have less time than expected, then please update your plan to see if you have to remove features, and when each milestone will realistically be reached. There is more in life than GSoC, but we must be able to see if you are making progress, if the plan is working, and where we can help.
Solution 1 sounds very costly and inefficient. There is a custom mutator hook that is run when a new queue entry is added. You could use that to perform the deserialize/parse step; then you would have to do it only once for each queue entry - and write the result away, e.g., into the queue filename + ".tree" or some other added extension. Just load that .tree file when the custom mutation is started.
Hi @h1994st, drop solution 1. Solution 2 is OK; btw, you can also do it by storing the data in an external way. Look at https://github.com/AFLplusplus/AFLSmart-Custom-Mutator/blob/master/custom_mutator.cpp#L7. Here I store the parsed tree in a map in which the key is the name of the test case. This has the same effect as adding an extra field to the queue entry but does not require modifications to AFL++. Modifications in afl-fuzz are not a problem, but btw it is hard to have an extra field that works with multiple custom mutators.
Modifications in afl-fuzz are not a problem, btw is hard to have an extra field that works with multiple custom mutators.
especially if you then have two custom mutators using that same field ...
so better save it in an extra file like Andrea is doing in his AFLSmart solution, but perform the deserialization in a hook for `queue_new_entry`
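The hook-based approach above could look roughly like the sketch below. Only the hook name `afl_custom_queue_new_entry` matches the real AFL++ custom-mutator API; the `my_mutator_t` layout and the parse/serialize helpers are hypothetical placeholders:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mutator state; only the hook name matches the AFL++ API. */
typedef struct my_mutator { void *unused; } my_mutator_t;

/* Build "<queue filename>.tree", the side file suggested above. */
void tree_file_path(const char *queue_fname, char *out, size_t out_size) {
  snprintf(out, out_size, "%s.tree", queue_fname);
}

/* Called by afl-fuzz once per new queue entry: parse the test case a
   single time and persist the serialized tree next to it. The actual
   parse/serialize calls are sketched as comments because they belong
   to the grammar mutator, not to AFL++. */
void afl_custom_queue_new_entry(my_mutator_t *data,
                                const uint8_t *filename_new_queue,
                                const uint8_t *filename_orig_queue) {
  (void)data; (void)filename_orig_queue;
  char path[4096];
  tree_file_path((const char *)filename_new_queue, path, sizeof(path));
  /* tree_t *t = parse_test_case(filename_new_queue);   // deserialize once
     dump_tree_to_file(t, path);                        // write the .tree file */
}
```

`afl_custom_fuzz` would then load the `.tree` file instead of re-parsing the input on every mutation.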
@vanhauser-thc @andreafioraldi OK, I see. I will try Andrea's approach. Thank you!
@h1994st please commit on a daily basis so we see the progress being made. Now the last commit was three days ago and it is unclear for me if you are working full time every minute of being awake or not at all or something in between ... :)
@vanhauser-thc Got it. I will keep it in mind.
Hi @vanhauser-thc and @andreafioraldi ,
Here is the list of completed tasks during this week:

- `json_custom_mutator.cpp`: store generated tree structures while storing interesting test cases in the queue
- `googletest` as the testing framework & add corresponding test cases for implemented functionalities

TODO for next week:

- `json_c_fuzz.c`: annotate recursive rules (prerequisite for "random recursive mutation" and "recursive minimization")
- `Valgrind` memcheck as a testing task

Keep going, good work!
Thank you! Looks like you made good progress :) Two things: 1) I think we should do a phone call again. please propose a few times that fit you. 2) when would be a good milestone for us to test the features you are working on?
Hi @vanhauser-thc and @andreafioraldi,
Thanks!
Here are my available time slots:
Please let me know if there are any conflicts.
For the milestone: after finishing all mutation and trimming strategies (i.e., after July 17), you should be able to test the grammar mutator, at least for fuzzing the JSON parser. I will notify you guys once I finish it.
I would prefer July 10 at 10am<->4pm, otherwise 11 or 12 July. @andreafioraldi ?
12 july for me
Then July 12 10am Shengtuo's and 4pm our time.
Hi @vanhauser-thc and @andreafioraldi ,
Will we use the same Google Meet link as last time?
Thanks!
Got it!
Hi @vanhauser-thc and @andreafioraldi ,
Here is the list of completed tasks during this week:

- `tree.c`: calculate recursion edges in a tree
- `f1_c_gen.py`: generate function array and node types
- `Valgrind` memcheck as a testing task

TODO for next week:
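The `tree.c` recursion-edge computation listed above can be sketched as follows. A "recursion edge" connects two nodes produced by the same grammar rule (e.g., `<json>` inside `<json>`), which is what enables random recursive mutation and recursive minimization. The types are illustrative, not `tree.c`'s real ones, and only direct parent-child recursion is counted:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative tree node; the grammar mutator's real structure differs. */
typedef struct rnode {
  uint32_t rule_id;            /* grammar rule that produced this node */
  struct rnode *children[4];   /* fixed fan-out just for the sketch */
  size_t child_count;
} rnode_t;

/* Count direct parent->child edges where both nodes come from the same
   rule: each such edge is a point where the tree re-enters a rule and
   can be expanded (recursive mutation) or collapsed (minimization). */
size_t count_recursion_edges(const rnode_t *n) {
  size_t edges = 0;
  for (size_t i = 0; i < n->child_count; i++) {
    if (n->children[i]->rule_id == n->rule_id) edges++;
    edges += count_recursion_edges(n->children[i]);
  }
  return edges;
}
```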
Hi @vanhauser-thc and @andreafioraldi ,
As we mentioned before, around July 17 you can have a try with the grammar mutator prototype. At least, it now works for the JSON parser example described in README.md. For now, the grammar mutator has:

I need to refactor my code to implement another trimming strategy, "subtree minimization". The current grammar mutator does not parse the initial test cases but generates its own test cases at the beginning. I will take care of the parsing part later.
Please let me know if you have any feedback on the performance, implementation details, etc. :) As usual, I will give you a complete progress update at the end of this week.
BTW:
While running the grammar mutator, I found that the mutated tree may represent an empty string, which makes the fuzzer throw an error. Therefore, I submitted a pull request to allow the fuzzer to run the target on an empty test case. Not sure if this is a good solution.
Hi @vanhauser-thc and @andreafioraldi ,
For this week, mainly focus on:
TODO for next week:
Hi @vanhauser-thc and @andreafioraldi ,
For this week, mainly focus on:
- `trees` directory in the output directory
- `id`, `subnode_count`, `val_len`, and `val`
- `afl-fuzz` with the grammar mutator on json-parser

TODO for next week:

- `xxx_c_fuzz.c` as a building step in the compilation
Hi @vanhauser-thc and @andreafioraldi ,
For this week, mainly focus on:
- `xxx_c_fuzz.c` as a building step in both Makefile and CMake

TODO for next week:

- `f1_c_gen.py`: replace `pool_of_strings` with `pool_of_trees`
- Replace `std::string` with C strings and related functions
- Replace `googletest` with another C-based testing framework

BTW, how do you guys often benchmark `afl-fuzz`? The current fuzzing speed is a little bit slow. I want to identify the bottleneck and then optimize the performance.
BTW, how do you guys often benchmark `afl-fuzz`? The current fuzzing speed is a little bit slow. I want to identify the bottleneck and then optimize the performance.
depends on what you want to benchmark it on - speed, coverage, ...
for speed comparison I copy the selected versions and do something like
TIME=125
afl-system-config
for fuzzer in afl-fuzz-1 afl-fuzz-2; do
for run in `seq 1 20`; do
afl-fuzz -V$TIME -i in -o out-$fuzzer-$run -d -s 123 -b 1 -- ./target
done
done
(Note that -b requires current dev; it binds to CPU #1, the second core/thread.) The first runs usually have higher exec counts than the following ones, as the CPU is still cool, so disregard them, see where the others approximately stabilize, and then compare the different fuzzers. Then compare the execs_done or execs_per_sec values in $out/fuzzer_stats.
For coverage you can do the same, but switch -V for -E (with a value that lets the fuzzer run for a useful time, e.g., 10 minutes) and do only one run. -E is for the number of execs (though it is just an approximate amount) instead of runtime. For easy comparison you can compare the paths_found values in $out/fuzzer_stats; if you want it more complete, then compare edges discovered (requires afl-clang-lto (llvm 11+) or afl-clang-fast (llvm 9+) with default instrumentation):
for file in out-$fuzzer/queue/id*; do
cat $file | afl-showmap -o - -q -- ./target
done | sed 's/:.*//' | sort -u > EDGES.$fuzzer
For performance use gprof or perf
@vanhauser-thc @andreafioraldi Thanks for your suggestions. I will try the approaches you mentioned here.
Hi @vanhauser-thc and @andreafioraldi ,
For this week, mainly focus on:
- `f1_c_gen.py`: replace `pool_of_strings` with `pool_of_trees`
- Replace `std::string` with C strings and related functions
- Replace `std::map` and `std::vector` in the chunk store with a C map (rxi/map) and a linked list

TODO for next week:

Based on `perf` results from `test_custom_mutator`, the chunk store, used in the splicing mutation, is a bottleneck. This is due to the hierarchical structure of the tree: for complex test cases, all mutation, trimming, and chunk store operations are recursive functions with very deep call stacks. Actually, `nautilus` uses a flattened array to store the tree instead of the recursive tree structure. Thus, I will refactor the tree structure next week.
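The nautilus-style flattening mentioned above can be sketched as follows; all names and fields are illustrative, not the grammar mutator's actual layout. Nodes live in one contiguous array and reference their parents by index, so queries become flat loops instead of deep recursive walks:

```c
#include <stdint.h>
#include <stdlib.h>

/* One node of the flattened tree (illustrative layout). */
typedef struct {
  uint32_t rule_id;  /* grammar rule of this node */
  uint32_t parent;   /* index of the parent; the root points at itself */
} flat_node_t;

typedef struct {
  flat_node_t *nodes;
  size_t len, cap;
} flat_tree_t;

/* Append a node and return its index. */
uint32_t flat_tree_add(flat_tree_t *t, uint32_t rule_id, uint32_t parent) {
  if (t->len == t->cap) {
    t->cap = t->cap ? t->cap * 2 : 16;
    t->nodes = realloc(t->nodes, t->cap * sizeof(flat_node_t));
  }
  t->nodes[t->len] = (flat_node_t){ rule_id, parent };
  return (uint32_t)t->len++;
}

/* Example of a flat (non-recursive) query: how many nodes use a rule. */
size_t flat_tree_count_rule(const flat_tree_t *t, uint32_t rule_id) {
  size_t n = 0;
  for (size_t i = 0; i < t->len; i++)
    if (t->nodes[i].rule_id == rule_id) n++;
  return n;
}
```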
you are working totally on your own ... which is fine however it gives me the feeling I am not doing a good job :) do you need any assistance? or have any questions? anything where I can help?
Hi @vanhauser-thc,
Thanks a lot for your kindness! You do help me a lot. Actually, for the time being, not too many technical questions. I will definitely contact you once I encounter any problems.
But it will be great, if you can give inputs about whether there are any potential improvements for the current grammar mutator, from a user's perspective.
BTW, after August 24, we need to submit the project for the final evaluation. Except for tasks in my TODO list, what else do you think I still need to do to pass the final evaluation? (e.g., tests, documents, etc.)
BTW, after August 24, we need to submit the project for the final evaluation. Except for tasks in my TODO list, what else do you think I still need to do to pass the final evaluation? (e.g., tests, documents, etc.)
@andreafioraldi what is your opinion?
IMHO it is
For me it is very important that this mutator greatly outperforms the afl generic mutator on targets like ruby or jsc. +1 also for the "for dummies" readme and for the doc. You have time, and I am confident that everything will turn out great. Don't hesitate to ask for the last difficult part that you have to code, the parsing stage. You can also easily reach the Nautilus authors for any question directly to them, just mail Cornelius and say that you are a student for AFL++ for a super-fast response.
I see. Thanks for your responses! I will try to figure out soon the parsing stage and other supporting documents for the next two weeks.
BTW, good luck with your WOOT presentation. 😏
Will give the weekly report later today, as I had something urgent yesterday. Sorry for the delay.
@vanhauser-thc and @andreafioraldi, I have three questions:

1. This one is a little bit naive. How do you often plot a graph describing the edge coverage against the time? I noticed there is some related information in `fuzzer_stats`. Do I need to monitor this file frequently while running the fuzzer? Or other better approaches?
2. Could I add one more custom mutator hook, `afl_custom_fuzz_steps`, to control the running times of the custom mutator? In `nautilus`, each mutation may run at least 100 times for a given queue entry, but our current custom mutator cannot control the running steps (i.e., `stage_max` in `fuzz_one_original`). If the running times can be controlled by the custom mutator, I can add a deterministic mutation strategy and also control the running steps of the grammar mutator.
3. I may hold off the parsing part because I noticed that `nautilus` does not parse the input test cases but generates 1000 inputs while starting the fuzzer. I want to spend more time optimizing the performance (e.g., coverage). What do you think?

1. This one is a little bit naive. How do you often plot a graph describing the edge coverage against the time? I noticed there is some related information in `fuzzer_stats`. Do I need to monitor this file frequently while running the fuzzer? Or other better approaches?
I think never :) I look at what coverage is reached after a specific time though. Not that I think it is not worth it; rather, I would want afl-plot to be able to plot several runs together so it is comparable. I never got around to coding that though. (It is an item in TODO.md.)
This is useful though (needs the dev branch): afl-showmap -C -i out -o /dev/null -- target, where -i is the -o of afl-fuzz. This will analyze all edges discovered in the queue and print the number. Requires llvm 9+ and either afl-clang-fast or afl-clang-lto in default instrumentation.
2. Could I add one more custom mutator hook, `afl_custom_fuzz_steps`, to control the running times of the custom mutator? In `nautilus`, each mutation may run at least 100 times for a given queue entry, but our current custom mutator cannot control the running steps (i.e., `stage_max` in `fuzz_one_original`). If the running times can be controlled by the custom mutator, I can add a deterministic mutation strategy and also control the running steps of the grammar mutator.
If AFL_CUSTOM_MUTATOR_ONLY is used, then this would only affect the number of inputs generated per queue entry per cycle, right? Does not sound that useful IMHO.
for runs without _ONLY I can see the benefit though.
for me it is fine if you add it; just remember to add it to the documentation.
3. I may hold off the parsing part because I noticed that `nautilus` does not parse the input test cases but generates 1000 inputs while starting the fuzzer. I want to spend more time optimizing the performance (e.g., coverage). What do you think?
for me the order you do it is not important. IMHO parsing the inputs is important though so this should be implemented within the project.
Hi @vanhauser-thc and @andreafioraldi ,
Thanks for your response!
I guess increasing the number of inputs generated per `fuzz_one_original` can help increase the coverage, but I need to confirm my hypothesis by manipulating the number of running times. If it does help, I will submit a pull request to AFL++. Will keep you updated about this.
I agree that parsing the inputs is important so that users can reuse the previous input corpus. I will try my best to implement this.
The progress of the last week is a little bit slow. I mainly focused on trying to optimize the coverage of the grammar mutator, but got a bit lost on how to improve the coverage. Specifically, I refactored `f1_c_gen.py` to maximize the size of the generated tree. Overall, the performance of the grammar mutator is still not good enough.
There are several tasks for next week, including the delayed tasks
I guess increasing the number of inputs generated per fuzz_one_original can help increase the coverage, but I need to confirm my hypothesis by manipulating the number of running times. If it does help, I will submit a pull request to AFL++. Will keep you updated about this.
The custom mutator is using the same number of iterations as havoc, so yes, maybe we should add a custom mutator API that manipulates the number of iterations. afl_custom_fuzz_steps sounds OK.
Implement the parsing part
You will get coverage from mutating good seeds IMO, not from generating from scratch, so you should prioritize this before optimizing things. Think also about how to parse inputs that are synced from other fuzzer instances running in parallel.
For the parsing part, maybe you should just use a parser generator like ANTLR or bison.
You can just define a compatibility layer (so that it will be easy for us to add other parser generators in the future) that is called by your generated parser, e.g. in ANTLR C++ (yes, Marc said C++ bad, but ANTLR does not emit C), to build your tree.
This can be helpful, I think: https://tomassetti.me/getting-started-antlr-cpp/
Note that you can easily parse an ANTLR4 grammar file using ANTLR itself (https://github.com/antlr/grammars-v4/tree/master/antlr/antlr4), so it should be easy to add ANTLR4 as a grammar format alongside JSON. For JSON, it is easy to generate an ANTLR grammar from a JSON grammar in Python.
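One possible shape for that compatibility layer is a small C API that an ANTLR-generated C++ visitor calls while walking its `ParseTree` to build the mutator's tree. All names and fields below are illustrative, not the project's actual interface:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative tree node built by the parser-side visitor. */
typedef struct gm_node {
  uint32_t rule_id;              /* grammar rule / ANTLR rule index */
  uint8_t *val; size_t val_len;  /* terminal text, if any */
  struct gm_node **children;
  size_t child_count;
} gm_node_t;

/* Allocate a node for the given rule. */
gm_node_t *gm_node_new(uint32_t rule_id) {
  gm_node_t *n = calloc(1, sizeof(gm_node_t));
  n->rule_id = rule_id;
  return n;
}

/* Copy terminal text into the node. */
void gm_node_set_val(gm_node_t *n, const uint8_t *buf, size_t len) {
  n->val = malloc(len);
  memcpy(n->val, buf, len);
  n->val_len = len;
}

/* Attach a finished child to its parent. */
void gm_node_append_child(gm_node_t *parent, gm_node_t *child) {
  parent->children = realloc(parent->children,
                             (parent->child_count + 1) * sizeof(gm_node_t *));
  parent->children[parent->child_count++] = child;
}
```

A C++ ANTLR listener could call these on enter-rule, terminal, and exit-rule events; swapping ANTLR for bison later would only change the caller, not this layer.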
@andreafioraldi Thanks for your information!
ANTLR can definitely help with the parsing part. You are right: we need a compatibility layer to convert ANTLR's `ParseTree` to our tree representation.
I will take a look at the links you recommended.
Hi @vanhauser-thc and @andreafioraldi ,
For this week, mainly focus on:
TODO for next week:

- New hook (`afl_custom_fuzz_steps`) for the custom mutator

Next week ... there is no next week :) You have to hand in everything final on Monday next week. I am away over the weekend. I would push everything as far as you can on Thursday night, so I can review and test on Friday morning before I leave at lunchtime. Then you have feedback to work on over the weekend.
Concentrate on code, documentation can still be done after Thursday evening.
I can implement the `afl_custom_fuzz_steps` hook in afl-fuzz for you. It is not part of the grammar mutator, so I think it is fine if we take care of that. What do you need as inputs? I would pass the data, len, and filename? Or the queue structure pointer?
Hi @vanhauser-thc ,
Yeah, I know there are no more weeks for GSoC. By "next" week, I mean this week (Aug 24-28), as the report was written yesterday in my local time ;)
Got it. I will try to wrap up everything and get back to you on Thursday night.
I envision `afl_custom_fuzz_steps` would have the same function signature as `afl_custom_init_trim`:
int32_t afl_custom_fuzz_steps(my_mutator_t *data, uint8_t *buf,
                              size_t buf_size);
This new hook will be used by `afl-fuzz` to control the `stage_max` for a custom mutator (afl-fuzz-one.c#L1669). It should be invoked before afl-fuzz-one.c#L1687. I think this would be enough.
It is in there. Function signature:
u32 (*afl_custom_fuzz_count)(void *data, const u8 *buf, size_t buf_size);
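A minimal implementation of this hook on the mutator side might look like the sketch below; the size-based heuristic and its constants are purely illustrative, not part of the API:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Return the number of times afl_custom_fuzz should run for this queue
   entry (i.e. stage_max). Heuristic: scale with input size, with a
   nautilus-style floor of 100 iterations and an arbitrary cap. */
u32 afl_custom_fuzz_count(void *data, const u8 *buf, size_t buf_size) {
  (void)data; (void)buf;          /* unused in this sketch */
  u32 count = (u32)(buf_size / 16);
  if (count < 100) count = 100;   /* at least 100 mutations per entry */
  if (count > 1024) count = 1024; /* avoid spending a whole cycle here */
  return count;
}
```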
Thank you so much! I will integrate it into the grammar mutator soon.
Maybe the coverage issue is related to the generator.
I observed some JSONs created with ./grammar_generator and I noted that the generator prefers to go deep in the grammar. You can see that most of the generated files have a lot of whitespaces and escape chars like \t \f etc.
If you look at json_grammar.json you can see that escape chars are at a deeper level in the grammar compared to other tokens that can be placed in a string:
"<string>": [["\"", "<characters>", "\""]],
"<characters>": [["<character-1>"]],
"<character>": [["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"],
["g"], ["h"], ["i"], ["j"], ["k"], ["l"], ["m"], ["n"],
["o"], ["p"], ["q"], ["r"], ["s"], ["t"], ["u"], ["v"],
["w"], ["x"], ["y"], ["z"], ["A"], ["B"], ["C"], ["D"],
["E"], ["F"], ["G"], ["H"], ["I"], ["J"], ["K"], ["L"],
["M"], ["N"], ["O"], ["P"], ["Q"], ["R"], ["S"], ["T"],
["U"], ["V"], ["W"], ["X"], ["Y"], ["Z"], ["!"], ["#"],
["$"], ["%"], ["&"], ["\""], ["("], [")"], ["*"], ["+"],
[","], ["-"], ["."], ["/"], [":"], [";"], ["<"], ["="],
[">"], ["?"], ["@"], ["["], ["]"], ["^"], ["_"], ["`"],
["{"], ["|"], ["}"], ["~"], [" "], ["<esc>"], ["'"]],
"<esc>": [["\\","<escc>"]],
"<escc>": [["\\"],["b"],["f"], ["n"], ["r"],["t"],["\""]],
Why does the generator prefer to generate escc tokens? Is it a bug in F1?
Seems like if in a rule there is
Idk if this can be the cause, but look at the generated C: in the function gen_node_character, there is this snippet before the switch case (I added some comments):
if (rule_index < 0 || rule_index >= 95) {
  int num_rules = 0;
  if (max_len < 3) {
    num_rules = 94;
  } else {
    num_rules = 95;  // here with max_len >= 3, seems likely
  }
  val = 0;
  if (num_rules == 94) {  // likely num_rules == 95, goto else branch
    val = map_rand(94);
  } else {
    // random(95-94) -> random(1) -> always 0
    val = map_rand(num_rules - 94) + 94;  // always 94, always generates the esc node
  }
} else {
  val = rule_index;
}
...
switch (val) {
  ...
  case 94:
    ...
    subnode = gen_node_esc(subnode_max_len, &subnode_consumed, -1);
    ...
    break;
}
return node;
I don't have the full picture of the code, and maybe it is intended, but the generated seeds look strange.
In 100 seeds, if I grep for \t I get 50 lines, but just 2 if I grep for W. It seems reasonable that nonterminals have more priority (is it? maybe not), but not at this level: basically, terminals that are not in the last levels of the grammar seem almost never to be generated.
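If the bias is indeed the `map_rand(num_rules - 94) + 94` branch, one candidate fix would be a single uniform draw over all rules viable at the current length budget. This is only a sketch: `map_rand` is stubbed here, and whether F1 intends a uniform distribution at this point is an assumption:

```c
#include <stdlib.h>

/* Stub for the generated code's RNG helper (assumed semantics:
   uniform integer in [0, n)). */
static int map_rand(int n) { return rand() % n; }

/* Uniform selection over every rule viable at this length budget,
   instead of drawing only from [94, 95) when max_len >= 3. */
static int pick_character_rule(int rule_index, int max_len) {
  if (rule_index >= 0 && rule_index < 95) return rule_index;
  int num_rules = (max_len < 3) ? 94 : 95; /* rule 94 (<esc>) needs len >= 3 */
  return map_rand(num_rules);
}
```

With this, the `<esc>` expansion would be picked with probability 1/95 rather than always, which should spread the generated characters evenly across the terminals.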
Hi Shengtuo,
please put your weekly progress reports in this issue. Besides, I have not seen any progress in the last 5 days. Do you need assistance?