
CodeIntelligenceTesting / jazzer

Coverage-guided, in-process fuzzing for the JVM

https://code-intelligence.com


Protobuf format in input files is not user friendly #847

Status: Open (opened by hadi88 1 year ago)

hadi88 commented 1 year ago

Protobufs are serialized and parsed using the Protobuf binary format, which makes it hard for users to look through input files. For example, it's almost impossible to read a crash-producing input to understand the crash.

With small changes to mutation/mutator/proto/BuilderMutatorFactory.java, the write and read methods could use the Protobuf TextFormat utility so that input files contain protos in human-readable text format.
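To make the proposal concrete, here is a minimal sketch, assuming protobuf-java is on the classpath, that prints the same message in both encodings and round-trips the text form through `TextFormat`. It uses the bundled well-known type `Timestamp` as a stand-in for a real fuzz test's message. The names `EncodingComparison`, `writeText` and `readText` are hypothetical; they are not the actual read/write methods in `BuilderMutatorFactory.java`, whose signatures are not shown in this thread.

```java
import com.google.protobuf.TextFormat;
import com.google.protobuf.Timestamp;
import java.nio.charset.StandardCharsets;

final class EncodingComparison {
  // Renders bytes as space-separated hex, roughly what a binary input file shows in a hex viewer.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x ", b));
    }
    return sb.toString().trim();
  }

  // Hypothetical text-format write: serializes the message as human-readable text.
  static byte[] writeText(Timestamp message) {
    return TextFormat.printer().printToString(message).getBytes(StandardCharsets.UTF_8);
  }

  // Hypothetical text-format read: parses the text back into a message via a builder.
  static Timestamp readText(byte[] bytes) throws TextFormat.ParseException {
    Timestamp.Builder builder = Timestamp.newBuilder();
    TextFormat.merge(new String(bytes, StandardCharsets.UTF_8), builder);
    return builder.build();
  }

  public static void main(String[] args) throws TextFormat.ParseException {
    Timestamp ts = Timestamp.newBuilder().setSeconds(1_700_000_000L).setNanos(42).build();
    System.out.println("binary: " + hex(ts.toByteArray()));
    System.out.print("text:\n" + new String(writeText(ts), StandardCharsets.UTF_8));
    System.out.println("round trip equal: " + readText(writeText(ts)).equals(ts));
  }
}
```

The hex line is what a reader of a binary input file has to decode by hand, while the text form names every field.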

fmeum commented 1 year ago

We currently decode the byte array provided by libFuzzer in every execution, so I am a bit worried that switching out the highly optimized binary format for the fully reflection-based text format will regress fuzz test performance. Have you tested this on a representative fuzz test?
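A rough way to get a first number for this concern is sketched below, assuming protobuf-java on the classpath. It times repeated binary and text decoding of a `Struct` with 50 string fields. `DecodeCostSketch` and the message shape are illustrative choices, not a representative fuzz test, and a `System.nanoTime` loop is much cruder than a JMH benchmark or comparing executions per second of an actual Jazzer run.

```java
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Struct;
import com.google.protobuf.TextFormat;
import com.google.protobuf.Value;

final class DecodeCostSketch {
  // Accumulates results so the JIT cannot discard the decoding work as dead code.
  static long sink;

  // Illustrative input: a Struct with n string-valued fields.
  static Struct sample(int n) {
    Struct.Builder builder = Struct.newBuilder();
    for (int i = 0; i < n; i++) {
      builder.putFields("field" + i, Value.newBuilder().setStringValue("value" + i).build());
    }
    return builder.build();
  }

  // Returns total nanoseconds spent decoding the binary encoding `iterations` times.
  static long timeBinaryDecode(byte[] bytes, int iterations) throws InvalidProtocolBufferException {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      sink += Struct.parseFrom(bytes).getFieldsCount();
    }
    return System.nanoTime() - start;
  }

  // Returns total nanoseconds spent decoding the text encoding `iterations` times.
  static long timeTextDecode(String text, int iterations) throws TextFormat.ParseException {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      Struct.Builder builder = Struct.newBuilder();
      TextFormat.merge(text, builder);
      sink += builder.build().getFieldsCount();
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws Exception {
    Struct message = sample(50);
    byte[] binary = message.toByteArray();
    String text = TextFormat.printer().printToString(message);
    int iterations = 20_000;
    // Warm-up pass so both paths are JIT-compiled before the measured pass.
    timeBinaryDecode(binary, iterations);
    timeTextDecode(text, iterations);
    double binaryMicros = timeBinaryDecode(binary, iterations) / 1e3 / iterations;
    double textMicros = timeTextDecode(text, iterations) / 1e3 / iterations;
    System.out.printf("binary: %d bytes, %.2f us per decode%n", binary.length, binaryMicros);
    System.out.printf("text:   %d chars, %.2f us per decode%n", text.length(), textMicros);
  }
}
```

The absolute numbers depend on the JVM, the hardware and the message shape, so they only indicate the relative decoding cost; the effect on a real fuzz test also depends on how much time the test itself spends per execution.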

The general approach we have been following to make sense of seeds is to rely on the JUnit integration, which allows developers to inspect the fuzz test parameters as proper Java objects simply by running the fuzz test in test mode and setting a breakpoint. We have found that inspecting input files directly creates friction for developers, even if the format is relatively straightforward. But I can see how that would be different in environments where Protobuf is used heavily and the fuzz test accepts only a single Proto parameter.

hadi88 commented 1 year ago

I also haven't tested the performance and I didn't know that the encoding occurs in every iteration. This certainly wouldn't be great.

Thinking out loud... the encoding during fuzzing could be different from the encoding for user-facing objects. But I understand that this may require a major change in the code structure.
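One way to read this suggestion without touching the fuzzing hot path is to keep binary inputs and convert a file to text only when a human wants to look at it. The sketch below assumes protobuf-java, that the file holds exactly one serialized message with no extra framing (which may not match how Jazzer actually lays out its input files, especially with several fuzz test parameters), and that the caller knows the message type. `CrashFileToText` is a hypothetical tool name, and `Timestamp` stands in for the fuzz test's proto type.

```java
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Message;
import com.google.protobuf.TextFormat;
import com.google.protobuf.Timestamp;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CrashFileToText {
  // Decodes binary bytes as the prototype's message type and renders them in text format.
  static String toText(byte[] bytes, Message prototype) throws InvalidProtocolBufferException {
    Message parsed = prototype.getParserForType().parseFrom(bytes);
    return TextFormat.printer().printToString(parsed);
  }

  // Hypothetical usage: java CrashFileToText path/to/crash-file
  // Timestamp stands in for the fuzz test's actual message type.
  public static void main(String[] args) throws IOException {
    byte[] bytes = Files.readAllBytes(Path.of(args[0]));
    System.out.print(toText(bytes, Timestamp.getDefaultInstance()));
  }
}
```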

fmeum commented 1 year ago

> I also haven't tested the performance and I didn't know that the encoding occurs in every iteration. This certainly wouldn't be great.

We are looking into reusing the in-memory objects if the input bytes haven't been loaded from disk. This does require patching libFuzzer though.
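The idea of reusing in-memory objects can be illustrated with a one-entry cache using only the JDK. This is a hypothetical sketch, not Jazzer's design: the mutator remembers the object it just serialized, and decoding is skipped when the incoming bytes are identical. Byte equality stands in for the provenance signal described above (whether the bytes were loaded from disk), which in practice would have to come from libFuzzer. `LastValueCache` and its methods are made-up names.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical one-entry cache: skips decoding when the fuzzer hands back the exact
// bytes that were just produced from an in-memory object.
final class LastValueCache<T> {
  private final Function<byte[], T> decoder;
  private byte[] lastBytes;
  private T lastValue;
  private int decodeCount;

  LastValueCache(Function<byte[], T> decoder) {
    this.decoder = decoder;
  }

  // Called by the mutator after it serializes `value` to `bytes`.
  void remember(byte[] bytes, T value) {
    lastBytes = bytes.clone();
    lastValue = value;
  }

  // Returns the cached object on a byte-for-byte match; otherwise decodes and caches.
  // Reuse is only safe if the fuzz test never mutates the object (generated Protobuf
  // messages are immutable).
  T get(byte[] bytes) {
    if (lastBytes != null && Arrays.equals(lastBytes, bytes)) {
      return lastValue;
    }
    decodeCount++;
    T value = decoder.apply(bytes);
    remember(bytes, value);
    return value;
  }

  int decodeCount() {
    return decodeCount;
  }
}
```

`Arrays.equals` still costs time linear in the input length, so a real implementation could instead rely on a cheap flag telling it where the bytes came from, which is the part that would need the libFuzzer patch.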

If you can run a simple performance test on a real-world fuzz test, that could provide us with very relevant data.

> Thinking out loud... the encoding during fuzzing could be different from the encoding for user-facing objects. But I understand that this may require a major change in the code structure.

I fully agree. Again, this would be possible by patching libFuzzer, and it is certainly something we could consider. It's just that so far we have found it more effective to improve the Java debugging experience, which somewhat sidesteps the question of what a human-readable input file should look like.

hadi88 commented 1 year ago

It's a little hard to judge the performance requirements of every fuzz target. libprotobuf-mutator gives users the option to mutate either binary or text protos, and the default is to mutate text protos:

https://github.com/google/libprotobuf-mutator/blob/master/src/libfuzzer/libfuzzer_macro.h#L26-L35

I'm not sure about the default option, but would it be possible for Jazzer to support both options?
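A per-fuzz-test switch between the two encodings might look like the sketch below, assuming protobuf-java. libprotobuf-mutator exposes the same choice through separate macros (`DEFINE_TEXT_PROTO_FUZZER` and `DEFINE_BINARY_PROTO_FUZZER`); here it is modeled as an enum. `ProtoInputFormat` and the system property `proto.input.format` are hypothetical and not existing Jazzer options, and the `BINARY` default only mirrors the current behavior described in this issue rather than a recommendation.

```java
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Message;
import com.google.protobuf.TextFormat;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical input-file encoding that a fuzz test could opt into.
enum ProtoInputFormat {
  BINARY {
    @Override
    byte[] write(Message message) {
      return message.toByteArray();
    }

    @Override
    Message read(byte[] bytes, Message prototype) throws InvalidProtocolBufferException {
      return prototype.getParserForType().parseFrom(bytes);
    }
  },
  TEXT {
    @Override
    byte[] write(Message message) {
      return TextFormat.printer().printToString(message).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    Message read(byte[] bytes, Message prototype) throws TextFormat.ParseException {
      Message.Builder builder = prototype.newBuilderForType();
      TextFormat.merge(new String(bytes, StandardCharsets.UTF_8), builder);
      return builder.build();
    }
  };

  // Serializes a message into input-file bytes.
  abstract byte[] write(Message message);

  // Parses input-file bytes into a message of the prototype's type.
  abstract Message read(byte[] bytes, Message prototype) throws IOException;

  // Hypothetical switch; "proto.input.format" is not a real Jazzer property.
  static ProtoInputFormat fromSystemProperty() {
    String name = System.getProperty("proto.input.format", "BINARY");
    return valueOf(name.toUpperCase(Locale.ROOT));
  }
}
```

Keeping both encodings behind one small interface would let a project trade decoding speed for readable corpus files without changing the fuzz test itself.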

fmeum commented 11 months ago

We will look into this and other ways to make the corpus entries easier to handle eventually. We are currently focusing on polishing the JUnit 5 based workflow though, so I can't say yet when we will get to this.

ghost commented 6 months ago

Hi @hadi88! Did you ever get your issue with Jazzer resolved? We just need to understand in detail what you are trying to achieve, and then we can suggest the best options. Ping me? david[dot]merian [at] code-intelligence[dot]com