Open Rot127 opened 3 months ago
Thanks for pinging me Rot127. Are you planning on doing this ? Just putting come comments inline
The fuzzer could need some extensions. For example, it determines the mode and arch by comparing the strings of the enum identifiers
CS_ARCH
andCS_MODE
. This makes it too maintenance heavy. And easy to forget to add new identifiers there.
Sure
It would be better to:
- have it use the
test_mapping.h
file (from Modern testing #2384). To get the mode and arch values for fuzzing. Or something similar centralized.
Looks a good idea
- Allow to track fuzzing coverage of the source code.
What do you mean ?
- Consume the
yaml
test files for fuzzer input generation. Not the legacyMC/*.cs
files. (see Modern testing #2384)
Do the yaml cover all the previous MS/*.cs cases ?
- Remove dead code like
travis.yaml
Looks good
- Use cmake build AND use it in the OSSFuzz build script (removing hard-coded paths)
Looks good
- Don't use the Python bindings to generate the corpus. It should simply parse the yaml files and get the binry info from somewhere else. Currently the fuzzer build step compiles Capstone multiple times.
Where is this somewhere else ?
- LoongArch is not fuzzed
There should be a generic way to create a fuzz target for each architecture indeed...
- Missing modes or architectures are currently just ignored. But it should fail if this is the case.
Ok
Do the yaml cover all the previous MS/*.cs cases ?
Yes! And even way more now. Because we can generate them more reliably from the LLVM regression tests.
Allow to track fuzzing coverage of the source code.
What do you mean ?
Basically what you linked in your second comment. Thanks for that! I wasn't aware of it.
Where is this somewhere else ?
The modern testing is not yet merged. See https://github.com/capstone-engine/capstone/pull/2456. There I defined a test_mapping.h
file. It maps a given enumeration identifier to its value. We could compile an executable from it and expose an API for the bindings.
But I haven't thought this to an end yet. But in general it would be good to have a single point, which does the enumeration identifier -> value
mapping. Currently this behavior is all over the place (const_generator.py
, Python bindings, in the old cstest
code, the fuzzer etc.).
Everyone defined a new table with the mapping they need.
Are you planning on doing this ?
It is not a high priority. But as you probably recognized we heavily modernize Capstone currently. So I also open up issues about issues to not loose track of them. If you want to help of course, feel free to do so :D
Thanks for the answers, it is not high priority for me neither ;-)
The fuzzer could need some extensions. For example, it determines the mode and arch by comparing the strings of the enum identifiers
CS_ARCH
andCS_MODE
. This makes it too maintenance heavy. And easy to forget to add new identifiers there.It would be better to:
test_mapping.h
file (from https://github.com/capstone-engine/capstone/pull/2456). To get the mode and arch values for fuzzing. Or something similar centralized.yaml
test files for fuzzer input generation. Not the legacyMC/*.cs
files. (see https://github.com/capstone-engine/capstone/pull/2456)travis.yaml
cc @catenacyber Because you did most work on it in the past :)