capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org
7.64k stars 1.56k forks source link

[Tests] Extend fuzzer #2454

Open Rot127 opened 3 months ago

Rot127 commented 3 months ago

The fuzzer could need some extensions. For example, it determines the mode and arch by comparing the strings of the enum identifiers CS_ARCH and CS_MODE. This makes it too maintenance heavy. And easy to forget to add new identifiers there.

It would be better to:

cc @catenacyber Because you did most work on it in the past :)

catenacyber commented 3 months ago

Thanks for pinging me Rot127. Are you planning on doing this ? Just putting come comments inline

The fuzzer could need some extensions. For example, it determines the mode and arch by comparing the strings of the enum identifiers CS_ARCH and CS_MODE. This makes it too maintenance heavy. And easy to forget to add new identifiers there.

Sure

It would be better to:

  • have it use the test_mapping.h file (from Modern testing #2384). To get the mode and arch values for fuzzing. Or something similar centralized.

Looks a good idea

  • Allow to track fuzzing coverage of the source code.

What do you mean ?

  • Consume the yaml test files for fuzzer input generation. Not the legacy MC/*.cs files. (see Modern testing #2384)

Do the yaml cover all the previous MS/*.cs cases ?

  • Remove dead code like travis.yaml

Looks good

  • Use cmake build AND use it in the OSSFuzz build script (removing hard-coded paths)

Looks good

  • Don't use the Python bindings to generate the corpus. It should simply parse the yaml files and get the binry info from somewhere else. Currently the fuzzer build step compiles Capstone multiple times.

Where is this somewhere else ?

  • LoongArch is not fuzzed

There should be a generic way to create a fuzz target for each architecture indeed...

  • Missing modes or architectures are currently just ignored. But it should fail if this is the case.

Ok

catenacyber commented 3 months ago

cf https://storage.googleapis.com/oss-fuzz-coverage/capstone/reports/20240821/linux/src/capstonenext/report.html for the coverage

Rot127 commented 3 months ago

Do the yaml cover all the previous MS/*.cs cases ?

Yes! And even way more now. Because we can generate them more reliably from the LLVM regression tests.

Allow to track fuzzing coverage of the source code.

What do you mean ?

Basically what you linked in your second comment. Thanks for that! I wasn't aware of it.

Where is this somewhere else ?

The modern testing is not yet merged. See https://github.com/capstone-engine/capstone/pull/2456. There I defined a test_mapping.h file. It maps a given enumeration identifier to its value. We could compile an executable from it and expose an API for the bindings.

But I haven't thought this to an end yet. But in general it would be good to have a single point, which does the enumeration identifier -> value mapping. Currently this behavior is all over the place (const_generator.py, Python bindings, in the old cstest code, the fuzzer etc.). Everyone defined a new table with the mapping they need.

Are you planning on doing this ?

It is not a high priority. But as you probably recognized we heavily modernize Capstone currently. So I also open up issues about issues to not loose track of them. If you want to help of course, feel free to do so :D

catenacyber commented 3 months ago

Thanks for the answers, it is not high priority for me neither ;-)