Closed Rot127 closed 1 week ago
Also considering https://github.com/capstone-engine/capstone/issues/1281 when generating tests.
The current MC tests assume that a byte string can be disassembled independently of other byte strings. This is not true for instructions which depend on each other (IT/VPT blocks in ARM, for example). This needs to be considered.
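A minimal sketch of the dependency problem, using a toy decoder rather than Capstone itself. The decoder, the `case` layout and all names are assumptions made for illustration; only the two Thumb encodings (`it eq` and `movs r0, #1`) are real. It shows that disassembling the same bytes together or separately gives different text, so a test case has to carry the whole sequence.

```python
# Toy model only: a tiny stateful "disassembler" for two Thumb encodings,
# used to show why a test case must hold a whole byte sequence.
IT_EQ = bytes([0x08, 0xBF])      # it eq
MOV_R0_1 = bytes([0x01, 0x20])   # movs r0, #1 (when outside an IT block)


def toy_disasm(code: bytes) -> list:
    """Decode 2-byte units; an IT instruction changes how the next one prints."""
    out = []
    pending_cond = None  # condition inherited from a preceding IT instruction
    for i in range(0, len(code), 2):
        unit = code[i:i + 2]
        if unit == IT_EQ:
            out.append("it eq")
            pending_cond = "eq"
        elif unit == MOV_R0_1:
            if pending_cond:
                out.append(f"mov{pending_cond} r0, #1")
            else:
                out.append("movs r0, #1")
            pending_cond = None
        else:
            out.append("<unknown>")
            pending_cond = None
    return out


# Hypothetical test-case layout: one byte sequence plus all expected strings.
case = {"bytes": IT_EQ + MOV_R0_1, "expected": ["it eq", "moveq r0, #1"]}

together = toy_disasm(case["bytes"])                    # matches case["expected"]
separately = toy_disasm(IT_EQ) + toy_disasm(MOV_R0_1)   # loses the IT context
```

With one byte string per test, the second instruction would be checked against the wrong text.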
@kabeor @aquynh We thought about choosing yaml as the file format for the test files. It's easier to read than json and can be used for the bindings as well. Also, there are libraries for yaml in most distros.
Any opinions?
@Rot127 That makes sense. Would you like to show us a code example?
I'll add one once the PPC AArch64 refactor is coming to an end.
The build of the bindings and their correct function isn't tested in the CI jobs either (https://github.com/capstone-engine/capstone/pull/2034#issuecomment-1568881681).
To verify that the bindings lead to the same output as the raw C library, there are currently tests which reimplement the test_* binaries in the binding language, including the exact same detail output, and just diff the output of the test_* binary with e.g. the test_*.py script.
One could write a test runner in every supported binding language which can be triggered by the cstest program and outputs in the same format as the detail printers in cstest.
Currently there are multiple places where the instructions are printed which all have to be updated when the structs change:
Only having one place to stringify an instruction would streamline the update process. When using cstool as the baseline, instruction printing could be split into a separate library with arm_insn_to_string functions which only handle rendering the instruction details to a string array. Printing and reformatting those parts can be done by the consumers.
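A rough sketch of that split, written in Python for brevity. The `Insn` and `Operand` classes and the `insn_to_strings` function are hypothetical stand-ins, not Capstone API. The one renderer only produces strings, and each consumer decides on the layout.

```python
from dataclasses import dataclass, field


@dataclass
class Operand:      # hypothetical stand-in for an operand entry in the details
    kind: str       # e.g. "reg" or "imm"
    value: str


@dataclass
class Insn:         # hypothetical stand-in for a disassembled instruction
    mnemonic: str
    op_str: str
    operands: list = field(default_factory=list)


def insn_to_strings(insn: Insn) -> list:
    """Render the details to a list of strings; no printing, no layout."""
    lines = [f"{insn.mnemonic} {insn.op_str}", f"op_count: {len(insn.operands)}"]
    for i, op in enumerate(insn.operands):
        lines.append(f"operands[{i}].type: {op.kind} = {op.value}")
    return lines


# Two consumers format the same strings differently.
insn = Insn("add", "r0, r1", [Operand("reg", "r0"), Operand("reg", "r1")])
parts = insn_to_strings(insn)
cli_output = "\n".join("\t" + p for p in parts[1:])   # indented detail dump
test_output = " ; ".join(parts)                        # compact single line
```

When a struct changes, only `insn_to_strings` would need an update in this layout.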
Only having one place to stringify an instruction would streamline the update process.
I very much like the idea to centralize the printing of instructions. Would you mind putting this into its own issue?
One could write a test runner in every supported binding language which can be triggered by the cstest program and outputs in the same format as the detail printers in cstest
I am not sure if I understand you correctly here. But I think testing the instruction details by comparing strings between cstool and bindings output is a bad idea. If we have the data to test in a yaml file, we can just as well test the objects directly. This would also allow us to test binary data.
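To illustrate testing the objects directly, here is a small sketch. The `expected` dict stands for what a parsed yaml test case might look like; its schema, the `FakeInsn`/`FakeOperand` classes and `check` are all assumptions for illustration, not an agreed format or Capstone API.

```python
from dataclasses import dataclass


@dataclass
class FakeOperand:   # hypothetical stand-in for a binding's operand object
    type: str
    reg: str = ""
    imm: int = 0


@dataclass
class FakeInsn:      # hypothetical stand-in for a binding's instruction object
    mnemonic: str
    op_str: str
    operands: list


# What a parsed yaml test case might look like (the schema is an assumption).
expected = {
    "asm_text": "add r0, r1, #4",
    "details": {"operands": [
        {"type": "reg", "reg": "r0"},
        {"type": "reg", "reg": "r1"},
        {"type": "imm", "imm": 4},
    ]},
}


def check(insn, expected) -> list:
    """Compare fields directly and return a list of mismatching field names."""
    errors = []
    if f"{insn.mnemonic} {insn.op_str}" != expected["asm_text"]:
        errors.append("asm_text")
    want_ops = expected["details"]["operands"]
    if len(insn.operands) != len(want_ops):
        errors.append("operand count")
    for i, (op, want) in enumerate(zip(insn.operands, want_ops)):
        for key, value in want.items():   # only fields named in the file are checked
            if getattr(op, key) != value:
                errors.append(f"operands[{i}].{key}")
    return errors


good = FakeInsn("add", "r0, r1, #4", [
    FakeOperand("reg", reg="r0"), FakeOperand("reg", reg="r1"), FakeOperand("imm", imm=4)])
bad = FakeInsn("add", "r0, r1, #4", [
    FakeOperand("reg", reg="r0"), FakeOperand("reg", reg="r2"), FakeOperand("imm", imm=4)])
```

A mismatch is reported by field name (`check(bad, expected)` names the second operand's register), which a string diff of printed output cannot do as precisely.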
Using YAML to test the data layout is also a custom in the LLVM project:
Only having one place to stringify an instruction would streamline the update process. When using cstool as the baseline, instruction printing could be split into a separate library with arm_insn_to_string functions which only handles rendering the instruction details to a string array. Printing and reformating those parts can be done by the consumers.
The best approach would be a new small library in the same repository - libcsprint or something. That way it could also be used on its own, outside cstool.
Just an idea for how to test every possible execution path and generate a set of instructions which covers every possible detail combination.
Depending on the architecture, the number of possible unique ways to set the detail struct should be around <num operand groups> * 2 (just a guess, but multiple groups share a single execution path, while some have a very diverse one. See add_cs_detail()).
For AArch64, for example, this would be roughly 162 * 2 (a very large number in comparison to other archs).
To determine the execution path for each operand group:
- Take an encoding from suite/MC/.
- Run gcov on it and diff the coverage graph of all add_cs_detail() calls.
- If it takes a new add_cs_detail() path, mark this encoding as a valid test.
The resulting 200-300 instructions can be checked manually for valid details.
Once it's done some files could be removed, e.g.:
- suite/regress/*
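The selection step of the coverage idea above could look roughly like this. It is a sketch under assumptions: the coverage data is made up, the encoding names are placeholders, and collecting the real per-encoding line sets from gcov output is not shown.

```python
def select_tests(coverage_by_encoding: dict) -> list:
    """Keep each encoding that reaches at least one not-yet-seen line."""
    seen = set()
    selected = []
    for encoding, lines in coverage_by_encoding.items():
        new_lines = set(lines) - seen
        if new_lines:
            selected.append(encoding)
            seen |= new_lines
    return selected


# Made-up data: encoding name -> line numbers hit inside add_cs_detail().
coverage = {
    "enc_a": {10, 11, 12},
    "enc_b": {10, 11, 12},   # same path as enc_a, adds nothing
    "enc_c": {10, 20, 21},   # reaches new lines, so it is kept
}
selected = select_tests(coverage)
```

This greedy pass depends on the order of the input and does not guarantee the smallest possible set, which should be fine if the result is reviewed manually anyway.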
Testing Capstone's disassembly results is possible in too many ways:
- MC tests taken from LLVM's MC test files (in suite/MC). They only test the disassembly of bytes to their assembly strings. They are run by the cstest binary.
- Tests in issue.cs. They look very similar to the MC tests, but are not related. They can test detail information as well. cstest processes them.
This is very confusing and could be unified.
I propose to:
- Remove the test_<arch> binaries, because they hard code every test.
- [...] cstest.
- [...] cstest
- [...] cstest
cstest should be written from scratch. It needs modernization anyway (e.g. remove global variables) and we could settle on a single test file format.
This new format should support simple bytes <-> assembly string testing, as well as testing the content of cs_detail. Once this is done we can also write scripts to translate LLVM's MC regression tests to our file format.
Before the v6 release we should also test every possible detail setting for correctness. See https://github.com/capstone-engine/capstone/issues/1984#issuecomment-1701493413
CI
-DCAPSTONE_DIET