bellard / quickjs

Public repository of the QuickJS Javascript Engine.
https://bellard.org/quickjs
Other
7.98k stars 829 forks source link

New seed corpus for fuzzing #299

Open renatahodovan opened 1 month ago

renatahodovan commented 1 month ago

The current seed corpus used for fuzzing (the combination of tests/ and examples/ directories) is suboptimal for fuzzing purposes. I intend to develop a superior one by refining and dividing the existing tests and adding new items. The updated corpus will consist of numerous small tests rather than the current 21 large ones. I'm wondering if the QuickJS maintainers are interested in such an enhancement. If they are, where would be the appropriate location for hosting this corpus? Can I upload it directly here, into the fuzz/ directory, or should I create a separate repository for it?

saghul commented 1 month ago

Can you share an example file?

renatahodovan commented 1 month ago

Sure! The best example is microbench.js, which is a beast from fuzzing perspective. It was designed to benchmark various JS constructs, which means that it contains ~50 distinct tests in a single one, all of which are executed many times. However, for effective fuzzing, an optimal corpus comprises numerous small test cases to maximize overall code coverage. Therefore, microbench.js should be divided into 50 smaller files and the enclosing loops (used for stress testing) should be eliminated (since they don't increase the coverage, but slows down the execution).

If you check the other tests, you'll notice similar instances where multiple tests are bundled together within a single file, which can be separated. Many of them contains assert checks, which are also problematic during random testing, since it's very likely that a random mutation will convert them to false and the rest of test after the assert will be in vain (even if it would contain a real issue, it won't be triggered).

Finally, the fuzzers we currently apply on QuickJS operate on a corpus of single, independent tests (seeds). Consequently, test cases reliant on secondary files are not ideally suited for fuzzing (e.g., workers or tests with __loadScript) as there's no guarantee that the secondary file remains in the corpus after several iterations. The only exception for testing the usage of secondary files is if we deliberately introduce non-existent references, as this at least tests the initiation of such code paths, but right now it causes hangs.

renatahodovan commented 1 month ago

@saghul I think I have an initial version of the new fuzz corpus. It contains ~140 files, filtered and transformed from the .js files in tests/ and examples/ directories. How should I share them with you? Shall I open a PR here or upload them to a separate GH repo?

chqrlie commented 1 month ago

@saghul I think I have an initial version of the new fuzz corpus. It contains ~140 files, filtered and transformed from the .js files in tests/ and examples/ directories. How should I share them with you? Shall I open a PR here or upload them to a separate GH repo?

Hi Renáta, 140 files is a no no. Can you collect this into a single file with sections and logic to combine them?

renatahodovan commented 1 month ago

@chqrlie The purpose of this refactor was to break down the few large test files into many smaller ones, as performance is critical from a fuzzing perspective. Ideally, we should be able to execute thousands of tests within a single second. While we're not quite there yet (currently around 500 tests/sec), this change is a step forward. Additionally, we've removed the fuzz blocking features from tests, such as asserts, usage of secondary files, command execution and file deletion/creation with random content.

However, I fully understand that the addition of so many new files to this repository can be burdensome. That's why I proposed hosting them in a separate repository. This way, oss-fuzz (or anyone else) can download and utilize the new corpus from there. Furthermore, I can include a note in the fuzzing README indicating where to find it.

saghul commented 1 month ago

@chqrlie How about a dedicated repo? We could trigger a daily build with a GH action like we do for valgrind.

renatahodovan commented 1 month ago

@saghul What is meant by a daily build? What tasks should it perform?

saghul commented 1 month ago

It would run the fuzzer against latest master.

renatahodovan commented 1 month ago

@saghul Of course, we can certainly do that. However, please remember that it's already running with OSS-Fuzz on Google's infrastructure. The latest master of QuickJS is downloaded once a day, and all the fuzz targets are executed. The daily reports can be viewed here. The coverage progress achieved by the fuzzers is available here (though it's quite inconsistent, likely due to hangs caused by the worker tests of the corpus). The discovered issues are reported here, and they can also be mirrored into the main repo.

saghul commented 1 month ago

I see. It would still be nice to be able to run it ourselves. At any rate, would it be problematic if those files lived in a separate repo? Perhaps a submodule would be a good solution @chqrlie ?

renatahodovan commented 1 month ago

I see. It would still be nice to be able to run it ourselves.

Absolutely agreed. I just wanted to ensure that you're aware of the existing infrastructure. However, the more fuzzers we run, the greater the chance of discovering issues, of course :)

renatahodovan commented 1 month ago

I've created an initial repository for the new corpus to show what I thought of. It increases the initial coverage from 8203 to 8921 units in case of fuzz_eval and eliminates the issue related to workers (since worker tests are removed).

If you like it, I can update the oss-fuzz configuration of QuickJS to use this improved corpus, which will hopefully improve its inconsistent coverage results and bug-finding capacity.

renatahodovan commented 1 month ago

Any thoughts on the corpus? Can we use it in oss-fuzz instead of the JS sources of the tests/ and examples/ directories ? If yes, where should we host it (I'm fine with both keeping it as is or with transferring it to somewhere else)?

renatahodovan commented 3 weeks ago

If there are no objections, I'll submit a patch to OSS-Fuzz to use the refined corpus and stop wasting resources on mutating non-fuzzable features. This doesn't mean that it cannot be transferred later to somewhere else or set up a CI locally, but checking on the daily stats I think we should not wait with the corpus update there.