Open renatahodovan opened 1 month ago
Can you share an example file?
Sure! The best example is microbench.js, which is a beast from a fuzzing perspective. It was designed to benchmark various JS constructs, which means that it contains ~50 distinct tests in a single file, all of which are executed many times. However, for effective fuzzing, an optimal corpus comprises numerous small test cases to maximize overall code coverage. Therefore, microbench.js should be divided into 50 smaller files, and the enclosing loops (used for stress testing) should be eliminated, since they don't increase coverage but slow down execution.
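To illustrate the transformation (with a hypothetical benchmark entry, not the actual microbench.js contents): the stress loop only repeats the same work, so a fuzzer gains no coverage from it, and the seed reduces to a single execution of the construct.

```javascript
// Hypothetical shape of a bundled benchmark entry (NOT actual microbench.js
// code): the loop repeats identical work, adding time but no new coverage.
function bench_array_push() {
    for (let i = 0; i < 100000; i++) {  // stress loop: slows fuzzing down
        const tmp = [];
        tmp.push(1, 2, 3);
    }
}
bench_array_push();

// Equivalent standalone seed after splitting: one construct, executed once.
const a = [];
a.push(1, 2, 3);
```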
If you check the other tests, you'll notice similar instances where multiple tests are bundled together within a single file and can be separated. Many of them contain assert checks, which are also problematic during random testing: it's very likely that a random mutation will flip an asserted condition to false, and the rest of the test after the assert will run in vain (even if it contained a real issue, it wouldn't be triggered).
Finally, the fuzzers we currently apply to QuickJS operate on a corpus of single, independent tests (seeds). Consequently, test cases that rely on secondary files (e.g., workers or tests using __loadScript) are not well suited for fuzzing, as there's no guarantee that the secondary file remains in the corpus after several iterations. The only exception worth keeping is deliberately introducing non-existent references, as this at least tests the initiation of such code paths; right now, however, it causes hangs.
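As a sketch of the "non-existent reference" exception (the file name and the loader stub are assumptions; in qjs the real call would be __loadScript): a seed referencing a file guaranteed not to exist still exercises the loading code path up to its failure branch, without depending on any second corpus file.

```javascript
// Self-contained sketch: a stand-in for the engine's script loader, which
// fails on a missing file the way __loadScript would in qjs.
function loadScript(path) {
    throw new Error("cannot open " + path);
}

let initiated = false;
try {
    initiated = true;                 // the load path was at least initiated
    loadScript("does-not-exist.js");  // deliberately non-existent reference
} catch (e) {
    // expected: the reference fails, but the entry branch was covered
}
```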
@saghul I think I have an initial version of the new fuzz corpus. It contains ~140 files, filtered and transformed from the .js files in tests/ and examples/ directories. How should I share them with you? Shall I open a PR here or upload them to a separate GH repo?
Hi Renáta, 140 files is a no-no. Can you collect this into a single file with sections and logic to combine them?
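One possible shape for such a "single file with sections" scheme (the `//===` marker syntax and the splitter below are assumptions, not an agreed format): each seed lives under a named section marker, and a small script re-expands the combined file into standalone seeds before fuzzing.

```javascript
// Hypothetical combined corpus file: two tiny seeds separated by markers.
const combined = `//=== seed_array
const a = [1, 2, 3];
a.reverse();
//=== seed_string
"abc".toUpperCase();
`;

// Split a sectioned file into { name: source } pairs.
function splitSeeds(text) {
    const seeds = {};
    let name = null, body = [];
    for (const line of text.split("\n")) {
        const m = line.match(/^\/\/=== (\S+)/);
        if (m) {
            if (name !== null)
                seeds[name] = body.join("\n");
            name = m[1];
            body = [];
        } else {
            body.push(line);
        }
    }
    if (name !== null)
        seeds[name] = body.join("\n");
    return seeds;
}

const seeds = splitSeeds(combined);
// seeds now maps each section name to its standalone source text,
// ready to be written out as individual corpus files.
```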
@chqrlie The purpose of this refactor was to break down the few large test files into many smaller ones, as performance is critical from a fuzzing perspective. Ideally, we should be able to execute thousands of tests within a single second. While we're not quite there yet (currently around 500 tests/sec), this change is a step forward. Additionally, we've removed the fuzz-blocking features from the tests, such as asserts, the use of secondary files, command execution, and file creation/deletion with random content.
However, I fully understand that the addition of so many new files to this repository can be burdensome. That's why I proposed hosting them in a separate repository. This way, oss-fuzz (or anyone else) can download and utilize the new corpus from there. Furthermore, I can include a note in the fuzzing README indicating where to find it.
@chqrlie How about a dedicated repo? We could trigger a daily build with a GH action like we do for valgrind.
@saghul What is meant by a daily build? What tasks should it perform?
It would run the fuzzer against latest master.
@saghul Of course, we can certainly do that. However, please remember that it's already running with OSS-Fuzz on Google's infrastructure. The latest master of QuickJS is downloaded once a day, and all the fuzz targets are executed. The daily reports can be viewed here. The coverage progress achieved by the fuzzers is available here (though it's quite inconsistent, likely due to hangs caused by the worker tests of the corpus). The discovered issues are reported here, and they can also be mirrored into the main repo.
I see. It would still be nice to be able to run it ourselves. At any rate, would it be problematic if those files lived in a separate repo? Perhaps a submodule would be a good solution, @chqrlie?
Absolutely agreed. I just wanted to ensure that you're aware of the existing infrastructure. However, the more fuzzers we run, the greater the chance of discovering issues, of course :)
I've created an initial repository for the new corpus to show what I have in mind. It increases the initial coverage from 8203 to 8921 units in the case of fuzz_eval and eliminates the worker-related issue (since worker tests are removed).
If you like it, I can update the oss-fuzz configuration of QuickJS to use this improved corpus, which will hopefully improve its inconsistent coverage results and bug-finding capacity.
Any thoughts on the corpus? Can we use it in oss-fuzz instead of the JS sources of the tests/ and examples/ directories? If yes, where should we host it? (I'm fine with keeping it as is or transferring it elsewhere.)
If there are no objections, I'll submit a patch to OSS-Fuzz to use the refined corpus and stop wasting resources on mutating non-fuzzable features. This doesn't mean the corpus can't be moved elsewhere later, or that we can't also set up a local CI, but looking at the daily stats, I don't think we should delay the corpus update there.
The current seed corpus used for fuzzing (the combination of tests/ and examples/ directories) is suboptimal for fuzzing purposes. I intend to develop a superior one by refining and dividing the existing tests and adding new items. The updated corpus will consist of numerous small tests rather than the current 21 large ones. I'm wondering if the QuickJS maintainers are interested in such an enhancement. If they are, where would be the appropriate location for hosting this corpus? Can I upload it directly here, into the fuzz/ directory, or should I create a separate repository for it?