bazelbuild / rules_fuzzing

Bazel Starlark extensions for defining fuzz tests in Bazel projects
Apache License 2.0

Loading corpus inputs from a (compressed) archive. #215

Open zhenyudg opened 1 year ago

zhenyudg commented 1 year ago

As a result of increased fuzzing in our organization, we now have thousands of fuzz corpus files (admittedly a good problem to have :)). Currently, we store each of these corpus files individually and pass them to cc_fuzz_test via corpus = glob(["my-corpus-directory/*"]).

Given the proliferation of corpus files, we are interested in storing fuzz corpora in compressed archives (say, as a single corpus.tar.gz for every cc_fuzz_test). Are there existing Bazel rules that can help us feed a compressed archive to cc_fuzz_test's corpus parameter? Alternatively, can we extend rules_fuzzing to support, say, corpus_archives = ["corpus.tar.gz"]?

stefanbucur commented 1 year ago

I'm not 100% sure, but I believe you should be able to define a custom rule that extracts a .tar.gz archive and produces a directory output (so all the extracted files would be written there). Then you can instantiate the rule and use it as a corpus attribute in the fuzz target.
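A minimal sketch of such a rule, under a few assumptions: the file name corpus_archive.bzl, the rule name extracted_corpus, and its archive attribute are all hypothetical, and the action relies on a tar binary with gzip support being available where the action runs. Whether cc_fuzz_test's corpus attribute accepts a directory output has not been verified here.

```starlark
# corpus_archive.bzl (hypothetical file name)

def _extracted_corpus_impl(ctx):
    # Declare a directory (tree artifact) that will hold the extracted files.
    out_dir = ctx.actions.declare_directory(ctx.label.name)

    ctx.actions.run_shell(
        inputs = [ctx.file.archive],
        outputs = [out_dir],
        # $1 is the archive path, $2 is the output directory path.
        # Assumption: `tar` with gzip support exists on the execution host.
        command = 'mkdir -p "$2" && tar -xzf "$1" -C "$2"',
        arguments = [ctx.file.archive.path, out_dir.path],
        mnemonic = "ExtractCorpus",
        progress_message = "Extracting corpus archive %s" % ctx.file.archive.short_path,
    )

    # Expose the directory as the rule's default output.
    return [DefaultInfo(files = depset([out_dir]))]

extracted_corpus = rule(
    implementation = _extracted_corpus_impl,
    attrs = {
        "archive": attr.label(
            allow_single_file = [".tar.gz", ".tgz"],
            mandatory = True,
            doc = "Compressed tarball containing the corpus files.",
        ),
    },
)
```

It could then be wired into a BUILD file roughly like this (target and source file names are placeholders):

```starlark
load("@rules_fuzzing//fuzzing:cc_defs.bzl", "cc_fuzz_test")
load(":corpus_archive.bzl", "extracted_corpus")

extracted_corpus(
    name = "my_corpus",
    archive = "corpus.tar.gz",
)

cc_fuzz_test(
    name = "my_fuzz_test",
    srcs = ["my_fuzz_test.cc"],
    # Assumption: corpus accepts a target whose output is a directory.
    corpus = [":my_corpus"],
)
```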

There is also the alternative of extracting the archive as a repository rule, documented here: https://stackoverflow.com/questions/46326749/how-do-i-unzip-a-file-in-bazel-properly-if-i-dont-know-the-contents-of-the-zip
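A rough sketch of the repository-rule route, assuming a hypothetical corpus_repository rule defined in a hypothetical corpus_repository.bzl. It extracts the archive at repository fetch time and exposes every extracted file through a single filegroup:

```starlark
# corpus_repository.bzl (hypothetical file name)

_BUILD_CONTENT = """\
filegroup(
    name = "corpus",
    # Everything from the archive except the files Bazel itself needs.
    srcs = glob(["**"], exclude = ["BUILD.bazel", "WORKSPACE"]),
    visibility = ["//visibility:public"],
)
"""

def _corpus_repository_impl(repository_ctx):
    # Unpack the archive into the root of the external repository.
    repository_ctx.extract(repository_ctx.attr.archive)

    # Generate a BUILD file so the extracted files are addressable by label.
    repository_ctx.file("BUILD.bazel", _BUILD_CONTENT)

corpus_repository = repository_rule(
    implementation = _corpus_repository_impl,
    attrs = {
        "archive": attr.label(
            allow_single_file = True,
            mandatory = True,
            doc = "Label of the corpus archive, e.g. a .tar.gz file.",
        ),
    },
)
```

After declaring, say, corpus_repository(name = "my_corpus", archive = "//fuzz:corpus.tar.gz") in the WORKSPACE file (the repository name and path are placeholders), a fuzz target could reference the files with corpus = ["@my_corpus//:corpus"]. Because the glob runs after extraction, the archive contents do not need to be known in advance.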

But I can also see merit in supporting corpus archives natively, e.g., through a corpus_archive attribute, so this is a reasonable feature request (PRs welcome, too! 😁).