Feature: enable invscov into afl++ (Fuzzbench/OSS-Fuzz)

laurentsimon commented 3 years ago

Your code is not even released and here's your first issue... :-)

We'd love to see your code enabled into afl++ as a special mode.

Afl++ is already supported in Fuzzbench and it is used actively to test new fuzzing techniques. If you could enable invscov in mainstream afl++, we'd easily be able to test it with a combination of other afl++ options!

Afl++ is also used in OSS-Fuzz, which is used to continuously fuzz hundreds of open source projects.

andreafioraldi commented 3 years ago

Afl++ is already supported in Fuzzbench and it is used actively to test new fuzzing techniques

Wow, I was not aware of it, really :)

The practical challenges of integrating the feedback from LLVM values in-tree into AFL++ are not trivial. Basically, I used Daikon so as not to reinvent the wheel, but I really don't like it, and I feel that a single-threaded Java monster does not fit in the AFL++ codebase, which we distribute to the public and which is not an academic prototype. In addition, stopping the campaign to run a tool and recompiling the PUT many times is not as user-friendly as we want AFL++ to be.

So, the way to go is to code another invariant miner and embed it directly into the runtime to learn invariants during execution. I'm working on it. There are several challenges to overcome (e.g., since you don't know which variables are involved in invariants at compile time, the pass instruments all of them, so the produced binary is slow), but I will address them in the near future. Research is incremental :)
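To make "learn invariants during the execution" concrete, here is a minimal sketch assuming a hypothetical runtime hook that receives `(var_id, value)` for every instrumented variable. It only mines per-variable range invariants (`lo <= v <= hi`), which is far weaker than Daikon's invariant grammar; `OnlineMiner`, `observe`, and `violates` are illustrative names, not the AFL++ or InvsCov API. Because the pass instruments every variable, the table is keyed by variable ID and grows with all of them, which is where the slowdown mentioned above comes from.

```python
from dataclasses import dataclass, field


@dataclass
class RangeInvariant:
    """Learned invariant lo <= v <= hi for one instrumented variable."""
    lo: int
    hi: int


@dataclass
class OnlineMiner:
    """Hypothetical in-runtime miner (not the AFL++/InvsCov API).

    During learning, every instrumented value is passed to observe();
    afterwards, violates() tells whether a value breaks a learned invariant.
    """
    invariants: dict = field(default_factory=dict)

    def observe(self, var_id: int, value: int) -> None:
        inv = self.invariants.get(var_id)
        if inv is None:
            # First sighting: the tightest range is [value, value].
            self.invariants[var_id] = RangeInvariant(value, value)
        else:
            # Widen the range just enough to include the new value.
            inv.lo = min(inv.lo, value)
            inv.hi = max(inv.hi, value)

    def violates(self, var_id: int, value: int) -> bool:
        inv = self.invariants.get(var_id)
        if inv is None:
            # Variables never seen while learning carry no invariant.
            return False
        return not (inv.lo <= value <= inv.hi)


miner = OnlineMiner()
for v in (3, 7, 5):
    miner.observe(var_id=1, value=v)
print(miner.invariants[1])   # RangeInvariant(lo=3, hi=7)
print(miner.violates(1, 9))  # True
```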

laurentsimon commented 3 years ago

I see what you mean. Note that it's fine to have a non-optimized version of invscov in Fuzzbench to get early feedback on how it performs on the benchmarks. Perfect is the enemy of good, and industry is incremental too!

If it's easier to integrate a non-afl++ version, that's fine too. We don't necessarily need to learn the invariants at runtime. We (Fuzzbench) could start invscov with an existing corpus (we have plenty), and invscov can learn the invariants once at start time. We can manually create several invscov campaigns and run them one after the other, each time hardcoding the previous corpus as the initial seeds to learn the invariants from. We could do that, say, 3 times to see whether it plateaus or not. This would already give us a good idea of how well it will work.
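The manual procedure above (learn once at start, fuzz, reseed with the resulting corpus, repeat about 3 times, watch for a plateau) can be sketched as a small driver. `learn` and `fuzz` are hypothetical callables standing in for "mine invariants from a corpus" and "run one campaign"; they are not FuzzBench or InvsCov APIs, and the 1% plateau threshold (`min_gain`) is an arbitrary assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical callables (not FuzzBench or InvsCov APIs):
#   learn(corpus) -> invariants, mined once at the start of a campaign
#   fuzz(corpus, invariants) -> (resulting_corpus, final_edge_coverage)
Corpus = List[bytes]


def iterate_campaigns(seed_corpus: Corpus,
                      learn: Callable[[Corpus], object],
                      fuzz: Callable[[Corpus, object], Tuple[Corpus, int]],
                      rounds: int = 3,
                      min_gain: float = 0.01) -> List[int]:
    """Chain campaigns, seeding each with the previous campaign's corpus.

    Returns the coverage reached by each campaign and stops early once the
    relative gain over the previous campaign falls below min_gain.
    """
    corpus = list(seed_corpus)
    history: List[int] = []
    for _ in range(rounds):
        invariants = learn(corpus)  # learn once, at start time
        corpus, coverage = fuzz(corpus, invariants)
        history.append(coverage)
        if len(history) >= 2 and coverage - history[-2] < min_gain * history[-2]:
            break  # coverage has plateaued
    return history


# Toy stubs for illustration only: coverage goes 1000 -> 1100 -> 1101.
_cov = iter([1000, 1100, 1101])
print(iterate_campaigns([b"seed"],
                        learn=lambda corpus: None,
                        fuzz=lambda corpus, invs: (corpus, next(_cov))))
# [1000, 1100, 1101]
```

Each round only consumes the previous round's output corpus, so the same loop works whether the rounds are automated or, as suggested above, launched by hand.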

We certainly don't need the final optimized version to test it out. Sure, for OSS-Fuzz we'd like a more polished version, but don't let that get in the way for Fuzzbench. We've run several fuzzers from academic researchers (e.g., SymCC), and they included the results in their submission. We've had a good experience so far with PoC code.

andreafioraldi commented 3 years ago

If it's easier to integrate a non-afl++ version, that's fine too.

No, I don't think running Daikon inside FuzzBench will be easy. Btw, InvsCov is already AFL++; it's just that the LLVM pass is out-of-tree, and there is Daikon plus a bit of Python glue that I need to polish.

Btw, I will for sure have a usable invariants mode in AFL++ (developed in the "unusual_values" branch) before USENIX (and therefore before this prototype becomes public). FuzzBench is already evaluating it: I explained some details about the first one in the upcoming experiment https://github.com/google/fuzzbench/pull/1170, as it seems promising, even if it is a lot slower ATM.

laurentsimon commented 3 years ago

SG! Thanks for submitting the PR!

vanhauser-thc commented 3 years ago

We'd love to see your code enabled into afl++ as a special mode.

You are aware that @andreafioraldi is one of the 4 maintainers of afl++? ;)

laurentsimon commented 3 years ago

I was not until yesterday, but someone on my team pointed this out :-) hahah I took such care introducing afl++... to its maintainers! ^^

andreafioraldi commented 3 years ago

[Screenshot at 2021-06-09 10-10-05]

That's when all the fuzzers have run between 4 and 5 hours (the OSS-Fuzz time budget, IIRC). At least here, the new prototype seems to work ("disabled" is vanilla AFL++), but there are several challenges:

- I haven't really fixed the speed problem.
- The invariants are too naive compared to Daikon's.
- The policy that decides when to alternate CGF with learning and InvsCov is dumb (I tested "normal", which does learning on the first cycle, and "early", which does learning only on the initial corpus, but both are too naive).
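The two policies in the last point can be sketched as a per-queue-entry scheduling decision, assuming a hypothetical scheduler that must choose, for each input, between a learning step (executions feed the invariant miner) and an InvsCov step (fuzzing with invariant-violation feedback). The names "normal" and "early" mirror the comment; `Phase`, `choose_phase`, and the assumption that "early" learns only during the first pass over the initial seeds are illustrative, not the code in the `unusual_values` branch.

```python
from enum import Enum


class Phase(Enum):
    LEARN = "learn"      # run the input only to update learned invariants
    INVSCOV = "invscov"  # fuzz with invariant-violation feedback enabled


def choose_phase(policy: str, cycle: int, from_initial_corpus: bool) -> Phase:
    """Hypothetical scheduler decision for one queue entry.

    "normal": learn during the whole first queue cycle, then switch.
    "early":  learn only on inputs from the initial corpus (assumed here
              to mean: only during the first pass over those seeds).
    """
    if policy == "normal":
        return Phase.LEARN if cycle == 0 else Phase.INVSCOV
    if policy == "early":
        if cycle == 0 and from_initial_corpus:
            return Phase.LEARN
        return Phase.INVSCOV
    raise ValueError(f"unknown policy: {policy!r}")


for cycle, initial in [(0, True), (0, False), (1, True)]:
    print(cycle, initial,
          choose_phase("normal", cycle, initial).value,
          choose_phase("early", cycle, initial).value)
```

Both rules are fixed before the campaign starts: neither looks at how coverage or invariant violations evolve when deciding to switch.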

andreafioraldi commented 3 years ago

The results also seem unstable: maybe the max is high because the fuzzer can find the bug, but the median is low. IMO this is related to the policy used to alternate learning and invariant-feedback fuzzing.

andreafioraldi commented 3 years ago

After 14h, aflplusplus_unusual_disabled is now slowly catching up with enabled. While finding bugs early is very good, I feel that I can do far better once I address this additional problem that I forgot to mention:

When an invariant is violated, the current implementation cannot learn a relaxed version of it and get feedback when this new version is violated (e.g., x > 0 is violated by x == 0, so it should be transformed into x >= 0).
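A minimal sketch of that missing behavior, assuming a single lower-bound invariant of the form `x > c` or `x >= c`. On a violation, the input is reported as interesting and the invariant is replaced by the tightest weaker bound that admits the new value, so feedback fires again only when the relaxed version is violated. `LowerBound` and `check_and_relax` are hypothetical names; InvsCov's real invariants come from Daikon and cover far more forms.

```python
from dataclasses import dataclass


@dataclass
class LowerBound:
    """Invariant `x > bound` (strict=True) or `x >= bound` (strict=False)."""
    bound: int
    strict: bool

    def holds(self, x: int) -> bool:
        return x > self.bound if self.strict else x >= self.bound

    def __str__(self) -> str:
        return f"x {'>' if self.strict else '>='} {self.bound}"


def check_and_relax(inv: LowerBound, x: int) -> bool:
    """Return True (interesting input) if x violates inv, relaxing inv in place.

    `x > c` violated by x == c becomes `x >= c`; any smaller x moves the
    bound down so that the new invariant is `x >= x_observed`.
    """
    if inv.holds(x):
        return False
    if inv.strict and x == inv.bound:
        inv.strict = False              # x > c  ->  x >= c
    else:
        inv.bound, inv.strict = x, False  # lower the bound to the new value
    return True


inv = LowerBound(0, strict=True)
print(check_and_relax(inv, 0), inv)   # True x >= 0
print(check_and_relax(inv, 0), inv)   # False x >= 0
print(check_and_relax(inv, -3), inv)  # True x >= -3
```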

Also, in a local experiment with pcre2, the stability is dramatically shitty; I have to find a way to compile two binaries with the exact same coverage map :(

[Screenshot at 2021-06-09 17-26-41]

andreafioraldi commented 3 years ago

@laurentsimon, one thing I noticed is that the initial corpus from OSS-Fuzz is not really saturated; do you know why?

laurentsimon commented 3 years ago

I have to find a way to compile two binaries with the exact same coverage map :(

You mean you want deterministic edge IDs?
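For context: classic AFL instrumentation gives every basic block a random ID at compile time and indexes the map with `cur ^ prev`, storing `prev = cur >> 1` afterwards, so two separate builds of the same program get different maps. A minimal sketch of the deterministic alternative, assuming each block can be named stably (here, hypothetically, by module, function, and block index): derive the ID from a hash of that name. This is an illustration only, not how AFL++'s own passes assign IDs, and hash collisions remain possible.

```python
import hashlib

MAP_SIZE = 1 << 16  # classic AFL map size (64 KiB)


def block_id(module: str, function: str, block_index: int) -> int:
    """Stable block ID: the same (module, function, index) gives the same ID in every build."""
    key = f"{module}:{function}:{block_index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "little") % MAP_SIZE


def edges_for_trace(trace):
    """AFL-style map indices hit by a sequence of (module, function, index) blocks."""
    prev = 0
    hits = []
    for blk in trace:
        cur = block_id(*blk)
        hits.append(cur ^ prev)
        # Shifting keeps A->B distinct from B->A and A->A from mapping to 0.
        prev = cur >> 1
    return hits


trace = [("pcre2_compile.c", "compile", 0), ("pcre2_compile.c", "compile", 1)]
print(edges_for_trace(trace) == edges_for_trace(trace))  # True
```

The catch for the use case above: this only helps if the extra invariant instrumentation does not change the blocks being named; if that pass adds or splits blocks, the names would have to be taken from a stage before it runs.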

the initial corpus from OSS-Fuzz is not really saturated; do you know why?

What's the name of your experiment? @jonathanmetzman, where did we get the corpus from?

eurecom-s3/invscov, issue #1