dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Suggestions for fuzz testing a Dart program + Dart FFI? #54390

Closed fzyzcjy closed 8 months ago

fzyzcjy commented 8 months ago

Hi, thanks for the great language and ecosystem! Are there any suggestions for fuzz testing a Dart program together with Dart FFI?

While developing https://github.com/fzyzcjy/flutter_rust_bridge v2, it looks stable and I already run Valgrind as well as sanitizers, but more tests never hurt. So I am looking into fuzz testing a little. There seem to be many fuzzing libraries for Rust, C++, etc., but no very popular ones for Dart.

More concretely, I would like to fuzz a Dart program and a Rust program connected via Dart FFI, to detect potential problems such as misuse of the Dart native C API.

I saw https://dart.googlesource.com/sdk//+/19e844ed5b2757bb5abcc353e151be52eeeb43f8/runtime/tools/dartfuzz/README.md, but it detects divergence instead of crashes, so I am not sure whether it is suitable.

AFL++ (https://aflplus.plus/docs/binaryonly_fuzzing/) and similar tools offer binary-only fuzzing, which looks interesting. However, I am not sure whether this applies to Dart, which has a VM and runtime. For example, Valgrind is completely confused by any Dart program.

dcharkes commented 8 months ago

> I saw https://dart.googlesource.com/sdk//+/19e844ed5b2757bb5abcc353e151be52eeeb43f8/runtime/tools/dartfuzz/README.md, but it detects divergence instead of crashes, so I am not sure whether it is suitable.

It should be relatively easy to use a subset of the code to check only for crashes.
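A crash-only check could look roughly like the sketch below. It is not taken from dartfuzz: `Outcome`, `classify`, and `runOnce` are hypothetical names, and it assumes a Linux or macOS host, where dart:io reports a process killed by a signal with a negative exit code.

```dart
import 'dart:io';

/// Result of running one generated program (hypothetical helper type).
enum Outcome { ok, failed, crashed }

/// Classifies an exit code. On Linux and macOS, dart:io reports termination
/// by a signal (for example SIGSEGV) as a negative exit code.
Outcome classify(int exitCode) {
  if (exitCode == 0) return Outcome.ok;
  if (exitCode < 0) return Outcome.crashed;
  return Outcome.failed; // Non-zero exit without a signal, not a native crash.
}

/// Runs the program at [path] with the same `dart` executable that is running
/// this script, then classifies how it exited.
Future<Outcome> runOnce(String path) async {
  final result = await Process.run(Platform.resolvedExecutable, ['run', path]);
  return classify(result.exitCode);
}

Future<void> main(List<String> args) async {
  for (final path in args) {
    final outcome = await runOnce(path);
    print('$path: ${outcome.name}');
  }
}
```

Only `Outcome.crashed` would count as a finding here; divergence between execution modes, which is what dartfuzz compares, is ignored.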

The challenge we found with fuzzing dart:ffi is that it is really easy to generate a program that will crash if the fuzzer just generates random API calls:

import 'dart:ffi';

void main() {
  // An arbitrary address that this program never allocated.
  final p = Pointer<Int8>.fromAddress(12345);
  p[87654] = 123; // Access unallocated memory!
}

So if you want to generate code that exercises FFI calls, you have to tell the generator which subset of calls is guaranteed to be safe. We have predefined a set of methods that are safe, for example:

https://github.com/dart-lang/sdk/blob/6ce0504924d4b273ffb5d0a6591769b3cb425890/runtime/tools/dartfuzz/dartfuzz.dart#L351-L352
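To illustrate the allow-list idea, here is a minimal sketch of a generator that can only emit statements known to stay inside memory the generated program allocated itself. This is not how dartfuzz is implemented; `safeStatements`, `bufferLength`, and `generateProgram` are made up for the example, and the generated program assumes package:ffi is available for `calloc`.

```dart
import 'dart:math';

/// Hypothetical allow-list: statement templates that only touch the buffer
/// allocated by the generated program. `{i}` becomes an in-bounds index.
const safeStatements = [
  'p[{i}] = {i};',
  'sum += p[{i}];',
];

/// Size of the buffer in the generated program (an assumption of this sketch).
const bufferLength = 16;

/// Emits the source of a Dart program built only from allow-listed statements.
String generateProgram(Random random, int statementCount) {
  final body = StringBuffer();
  for (var n = 0; n < statementCount; n++) {
    final template = safeStatements[random.nextInt(safeStatements.length)];
    final index = random.nextInt(bufferLength); // Always within the buffer.
    final statement = template.replaceAll('{i}', '$index');
    body.writeln('  $statement');
  }
  return '''
import 'dart:ffi';
import 'package:ffi/ffi.dart';

void main() {
  final p = calloc<Int8>($bufferLength);
  var sum = 0;
$body  calloc.free(p);
  print(sum);
}
''';
}

void main() {
  // The same seed always yields the same program.
  print(generateProgram(Random(42), 4));
}
```

Because the generator never has a way to emit `Pointer.fromAddress` or an out-of-range index, any crash in a generated program points at a real bug rather than at the generator.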

Actually, there is not such a big difference between a test generator and a fuzzer. A test generator also generates a bunch of code that we know up front is not supposed to crash. A fuzzer takes this to the next step by making the set of tests it generates exponentially larger, but only running a small subset every time. My first suggestion would be to see if you can increase test coverage by generating tests for flutter_rust_bridge. You could potentially take some inspiration from https://github.com/dart-lang/sdk/tree/main/tests/ffi/generator.

If you already have a test generator, you could try to make it generate more, and more varied, programs. dartfuzz does not use that much randomness, so I'm not sure whether using dartfuzz itself would be useful. (In the FFI test generators we also don't reuse code from other places; we just use string concatenation to generate C and Dart files.)
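As a rough illustration of the string-concatenation approach, adapted to Rust instead of C, the sketch below emits a Rust source with one `extern "C"` echo function per type and a matching Dart program that calls each one. It is not the SDK's generator; `TypeCase`, `cases`, `rustSource`, `dartSource`, and the `echo_*` function names are all hypothetical.

```dart
/// One primitive type as spelled on each side of the FFI boundary.
class TypeCase {
  final String rust, native, dart, sample;
  const TypeCase(this.rust, this.native, this.dart, this.sample);
}

const cases = [
  TypeCase('i8', 'Int8', 'int', '-7'),
  TypeCase('u32', 'Uint32', 'int', '4000000000'),
  TypeCase('f64', 'Double', 'double', '1.5'),
];

/// Rust side: one function per type that returns its argument unchanged.
/// (`#[no_mangle]` is the spelling for Rust editions up to 2021.)
String rustSource() {
  final out = StringBuffer();
  for (final c in cases) {
    out
      ..writeln('#[no_mangle]')
      ..writeln('pub extern "C" fn echo_${c.rust}(x: ${c.rust}) -> ${c.rust} { x }')
      ..writeln();
  }
  return out.toString();
}

/// Dart side: look up each function and check that the value round-trips.
String dartSource(String libraryPath) {
  final out = StringBuffer()
    ..writeln("import 'dart:ffi';")
    ..writeln()
    ..writeln('void main() {')
    ..writeln("  final lib = DynamicLibrary.open('$libraryPath');");
  for (final c in cases) {
    final name = 'echo_${c.rust}';
    out
      ..writeln('  final $name = lib.lookupFunction<'
          '${c.native} Function(${c.native}), '
          "${c.dart} Function(${c.dart})>('$name');")
      ..writeln("  if ($name(${c.sample}) != ${c.sample}) "
          "throw StateError('$name mismatch');");
  }
  out.writeln('}');
  return out.toString();
}

void main() {
  print(rustSource());
  print(dartSource('libecho.so')); // Library path is a placeholder.
}
```

Generating both sides from the same table keeps them in sync: adding a row to `cases` adds a Rust function and the Dart code that exercises it in one step.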

If you do find you'd want to reuse some dartfuzz things, I can see if there would be interest in making it a package that can be reused, as currently it's just a set of internal scripts.

> AFL++ (https://aflplus.plus/docs/binaryonly_fuzzing/) and similar tools offer binary-only fuzzing, which looks interesting. However, I am not sure whether this applies to Dart, which has a VM and runtime. For example, Valgrind is completely confused by any Dart program.

Coverage-guided fuzzing is an interesting concept, but you'd want coverage of the Rust types etc., not of the C++/Dart sources of the Dart executable itself. I'm not aware of any pre-existing tools in the Dart ecosystem that can do it, even for Dart-only code.

Extending the idea of coverage-guided fuzzing to a code generator project takes this to another level. Just brainstorming out loud here: you'd want to make sure you have coverage of all the Rust types, so you would probably want to randomly generate Rust programs and then check that the Rust programs you feed into flutter_rust_bridge exercise all the lines of code in the flutter_rust_bridge code generator. But making coverage guidance for this is hard, because the bool/enum conditions in your branching structure are not just some param that can be passed into FRB; they are a result of parsing the Rust program you generate. So the coverage guide would basically have to unparse Rust to achieve that. And then the second step would be to generate a Dart program that exercises all the generated bindings.

Because there is a code generator involved, and a String is parsed into a program, I think pre-existing fuzzing solutions would not be a good fit. When writing a test generator you tackle both these problems: e.g. you generate both the Rust program and the Dart program exercising the Rust program in one go. If you then make your test generator able to produce an exponentially large number of programs, you can use Random to pick the small subset to run each time, which turns it into a fuzzer.
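A minimal sketch of that last step, assuming each test case can be derived from a seed alone: every run samples a few seeds from a very large space, and printing the seed makes a failure reproducible. `rustTypes` and `signatureForSeed` are hypothetical, and a real generator would emit whole Rust and Dart programs rather than a single signature.

```dart
import 'dart:math';

/// Types the hypothetical generator may choose from.
const rustTypes = ['i8', 'u32', 'f64', 'bool', 'String', 'Vec<u8>'];

/// Derives one Rust function signature from [seed] and nothing else, so the
/// same seed always reproduces the same test case.
String signatureForSeed(int seed) {
  final random = Random(seed);
  final arity = 1 + random.nextInt(4); // 1 to 4 parameters.
  final params = <String>[];
  for (var i = 0; i < arity; i++) {
    final type = rustTypes[random.nextInt(rustTypes.length)];
    params.add('a$i: $type');
  }
  final returnType = rustTypes[random.nextInt(rustTypes.length)];
  final paramList = params.join(', ');
  return 'pub fn f_$seed($paramList) -> $returnType';
}

void main(List<String> args) {
  // Pass a seed to replay a failure; otherwise sample a fresh subset.
  final base = args.isNotEmpty
      ? int.parse(args[0])
      : DateTime.now().millisecondsSinceEpoch & 0xFFFFFFFF;
  for (var i = 0; i < 5; i++) {
    final seed = base + i;
    print('seed=$seed  ${signatureForSeed(seed)}');
  }
}
```

The number of possible signatures grows exponentially with the parameter count, but each run only pays for the handful of seeds it samples.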

Okay, this has become a rather long essay. I hope it helps!

fzyzcjy commented 8 months ago

Thank you very much for the information! That looks interesting (though it requires a fair amount of engineering, so it may not be implemented in the short term), and I now see how dartfuzz, a traditional fuzzer, and the test generators relate to each other.