AFLplusplus / LibAFL

Advanced Fuzzing Library - Slot your Fuzzer together in Rust! Scales across cores and machines. For Windows, Android, MacOS, Linux, no_std, ...

OnDiskCorpus files should be configurable to contain a human-readable representation of the input #2538

Open riesentoaster opened 2 weeks ago

riesentoaster commented 2 weeks ago

Most fuzzers will likely use some form of OnDiskCorpus (incl. InMemoryOnDiskCorpus, CachedOnDiskCorpus, etc.) for their solutions. To then figure out what the problem actually was, one needs to know the content of the testcase/input that triggered the feedbacks. Currently, on-disk corpora store a bunch of generic information (such as runtime) in the file associated with the testcase/input, but no representation of the input itself.

The only way to add this without resorting to writing dummy feedbacks that do nothing but add new metadata with the input content is to implement the filename-generating function on the input, extract the testcase from the corpus there, and somehow stringify it:

fn generate_name(&self, id: Option<CorpusId>) -> String;

However, file names have a length restriction, so this isn't usable for inputs that can get somewhat long. Plus, for structured inputs, it would be much easier to have the entire structure nicely formatted in the file.

domenukk commented 2 weeks ago

I don't fully understand: The OnDiskCorpus will contain the "content of the testcase/input that triggered the inputs" - that's what it's for, right?

That being said, currently the correct(tm) way to add metadata to a Testcase is via custom Feedbacks that do nothing like here: https://github.com/AFLplusplus/LibAFL/blob/e370e2f852b28aa0c4baedff426005429dbb6c08/libafl/src/feedbacks/stdio.rs#L107

riesentoaster commented 2 weeks ago

Yes, the corpus will contain everything, of course. But it isn't written to disk, so when I kill the fuzzer, I lose everything but the metadata (found in the .metadata file). And that doesn't by default contain the input that triggered a crash (or whatever you're looking for). So I can't reproduce the crash.

domenukk commented 2 weeks ago

Why is the OnDiskCorpus not written to disk? What crash are you talking about? A crash in the fuzzer or a crash in the target? Crashes in the target are of course included in the corpus (if you have a CrashFeedback)? Sorry, I'm confused...

riesentoaster commented 2 weeks ago

Ah, I see, seems like I missed something. If I understand correctly, the input content is serialised and written to disk in this method on Input, to the file associated with the crash that has neither an extension nor a leading dot:

/// Write this input to the file
fn to_file<P>(&self, path: P) -> Result<(), Error>
where
    P: AsRef<Path>,
{
    write_file_atomic(path, &postcard::to_allocvec(self)?)
}

When initialising the corpus, a format can be passed, and while this leaves the metadata nicely formatted, the input itself is still serialised and thus not human readable.

OnDiskCorpus::with_meta_format(
    PathBuf::from("./crashes"),
    OnDiskMetadataFormat::JsonPretty,
)
.unwrap(),

So I guess I'm asking for an option for human-readable serialisation of the input when written to disk.

riesentoaster commented 2 weeks ago

I guess I could also just implement this for my input, so a global option may not be strictly necessary, but it would still be nice, just for consistency.
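A minimal sketch of what "implementing this for my input" could look like, using only the standard library. The trait `ReadableToFile`, its methods, and the `HttpRequestInput` struct are hypothetical names made up for illustration and are not part of LibAFL. The sketch assumes the input derives `Debug` and writes the pretty `Debug` form through a temporary file plus rename, loosely mirroring the atomic write in `to_file` above.

```rust
use std::fmt::Debug;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper trait (not a LibAFL API): writes a human-readable
/// dump of an input instead of a binary postcard encoding.
pub trait ReadableToFile: Debug {
    /// Render the input as text; defaults to the pretty `Debug` form.
    fn render(&self) -> String {
        format!("{:#?}\n", self)
    }

    /// Write the rendered text to `path` via a temporary file and a rename,
    /// so that a reader never observes a half-written file.
    fn to_readable_file<P: AsRef<Path>>(&self, path: P) -> io::Result<()> {
        let path = path.as_ref();
        let mut tmp = path.as_os_str().to_owned();
        tmp.push(".tmp");
        fs::write(&tmp, self.render())?;
        fs::rename(&tmp, path)
    }
}

/// Example structured input (hypothetical).
#[derive(Debug)]
pub struct HttpRequestInput {
    pub method: String,
    pub path: String,
    pub body: Vec<u8>,
}

// The default methods are enough for any `Debug` type.
impl ReadableToFile for HttpRequestInput {}
```

With this in place, `input.to_readable_file("./crashes/some_name")` would produce a file that can be read directly in an editor. Note that the `Debug` form is one-way: reloading such a file would need a real text format with a parser, which this sketch does not attempt.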

riesentoaster commented 2 weeks ago

Related question: All input types in the repo (at least as far as I can see) generate their testcase names (fn generate_name(&self, id: Option<CorpusId>) -> String; on Input) the exact same way: hash their content (for collection types, namely Vecs, this is done manually for some reason) and take the first 16 bytes.

Should there not just be a blanket implementation that does this for any input that implements Hash (or where this is derived)?
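The blanket implementation asked about here could be sketched roughly as follows. `GenerateName` is a hypothetical stand-in for the naming part of the `Input` trait; the `id: Option<CorpusId>` parameter is dropped to keep the example self-contained, and the standard library's `DefaultHasher` is an assumption that replaces whatever hasher the existing inputs use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the naming part of the `Input` trait.
pub trait GenerateName {
    fn generate_name(&self) -> String;
}

/// Blanket implementation: every hashable type gets a content-derived name.
impl<T: Hash + ?Sized> GenerateName for T {
    fn generate_name(&self) -> String {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        // Render the 64-bit hash as 16 lowercase hex characters.
        format!("{:016x}", hasher.finish())
    }
}
```

Two caveats on this design: a blanket impl over `T: Hash` means no hashable type can provide its own, different `generate_name` for the same trait (Rust's coherence rules forbid the overlap), and the standard library does not guarantee that `DefaultHasher` produces the same values across Rust releases, so names from this sketch are only stable within one toolchain.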

domenukk commented 1 week ago

For a human-readable serialization there is the DumpToDiskStage that goes through new inputs and serializes them with a provided closure. Is this what you are looking for?

riesentoaster commented 6 days ago

Yes, this kind of does what I would want it to do, but

  1. It also serialises the corpus, not just solutions (and returns an error if passed something like /dev/null)
  2. I need to do the serialisation manually, as opposed to just telling it which format to use (like passing OnDiskMetadataFormat::JsonPretty)

Depending on how large your corpus gets and the change rate within it, the first point may range from annoying to a considerable downside. The second is not critical, just a bit of extra code; it would just be easier without it :)

Plus I would expect this kind of functionality in the corpus, especially OnDiskCorpus, not in a stage — that's probably also why I haven't found this.

domenukk commented 6 days ago

Feel free to fix the first point :) For the second point, we could have a number of serialiser functions in LibAFL, right?

Open for other suggestions of course.
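Both points discussed above can be illustrated with a small std-only sketch. It is not the `DumpToDiskStage` API: `dump_to_disk`, `debug_pretty`, and `hex_dump` are hypothetical names, and the `(name, input)` slice stands in for whatever the stage would actually iterate over. The idea is that ready-made serialiser functions can be passed where a closure is expected, and that a directory given as `None` is skipped rather than treated as an error.

```rust
use std::fmt::Debug;
use std::fs;
use std::io;
use std::path::Path;

/// Ready-made serialiser (hypothetical): pretty `Debug` text for structured inputs.
pub fn debug_pretty<I: Debug>(input: &I) -> Vec<u8> {
    format!("{:#?}\n", input).into_bytes()
}

/// Ready-made serialiser (hypothetical): lowercase hex dump for raw byte inputs.
pub fn hex_dump<I: AsRef<[u8]>>(input: &I) -> Vec<u8> {
    let mut out: String = input.as_ref().iter().map(|b| format!("{:02x}", b)).collect();
    out.push('\n');
    out.into_bytes()
}

/// Hypothetical dump helper: writes each `(name, input)` entry into `dir`
/// using `serialise`. Passing `None` skips the collection entirely and
/// returns 0, so dumping only solutions needs no dummy directory.
pub fn dump_to_disk<I, F>(
    entries: &[(String, I)],
    dir: Option<&Path>,
    serialise: F,
) -> io::Result<usize>
where
    F: Fn(&I) -> Vec<u8>,
{
    let dir = match dir {
        Some(dir) => dir,
        None => return Ok(0),
    };
    fs::create_dir_all(dir)?;
    for (name, input) in entries {
        fs::write(dir.join(name), serialise(input))?;
    }
    Ok(entries.len())
}
```

A caller would then write something like `dump_to_disk(&solutions, Some(Path::new("./crashes_readable")), hex_dump)` and `dump_to_disk(&corpus, None, hex_dump)`, choosing the format by picking a function instead of writing the serialisation by hand.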