Open Lancern opened 1 year ago
Currently, for many supported architectures, their
RegId
andInsnGroupId
(some architectures even theInsnId
) are simply defined as a bunch ofu32
values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that definingArchTag::RegId
andArchTag::InsnGroupId
is meaningful.
I like your change which uses newtype_enum instead of the constified enum module! It's already a big improvement.
Bindgen didn't supported them when I first did this work. In fact, I contributed the "constified enum module" feature to bindgen (https://github.com/rust-lang/rust-bindgen/pull/741) to avoid the less ergonomic constified enum.
However, I think we may be able to do even better (although it might be more annoying). Ideally, we would expose actual Rust enums in the high-level bindings. That way, users could take advantage of the exhaustive pattern matching with match
. The Capstone C code makes sure that 0 is always an "invalid" value such as ARM_SYSREG_INVALID
. When converting from the C value to the Rust value, we could map any invalid value to an "invalid" variant of the enum OR always return an Option<T>
when converting and return None
if we get an invalid value.
// Have internal "invalid" variant
enum ArmReg {
Invalid = 0,
APSR,
APSR_NZCV,
// ...
}
// No invalid variant, so converting gives `Option<ArmReg>`
enum ArmReg {
APSR = 1,
APSR_NZCV,
// ...
}
You don't need to implement this "Rust enum" change as a part of this PR--I just thought I'd let you know what I was thinking.
Ideally, we would expose actual Rust enums in the high-level bindings.
Actually, I tried this at the beginning when writing the second commit. However, I quickly ran into problems.
Take ARM64 registers as an example. The problem is that there are actually 2 underlying C enums that both represent valid ARM64 reg IDs: arm64_reg
and arm64_sysreg
. On the Rust side, each architecture can only have one Rust enum that represents an architecture register. Unfortunately, bindgen cannot merge two C enums into one big Rust enum. Also converting Arm64Sysreg
into Arm64Reg
won't work because the underlying value ranges of Arm64Sysreg
and Arm64Reg
are different. Thus the problem cannot be solved without considerable manual efforts if we generate Rust enums solely.
On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an Arm64Sysreg
into an Arm64Reg
by manually implementing a From
:
impl From<Arm64Sysreg> for Arm64Reg {
fn from(sysreg: Arm64Sysreg): Self {
Self(sysreg.0)
}
}
Merging #143 (7d8ab8c) into master (74ccb09) will decrease coverage by
7.49%
. Report is 1 commits behind head on master. The diff coverage is91.01%
.:exclamation: Current head 7d8ab8c differs from pull request most recent head 5cd9258. Consider uploading reports for the commit 5cd9258 to get more accurate results
@@ Coverage Diff @@
## master #143 +/- ##
==========================================
- Coverage 93.99% 86.51% -7.49%
==========================================
Files 22 18 -4
Lines 2733 1379 -1354
Branches 2687 1377 -1310
==========================================
- Hits 2569 1193 -1376
- Misses 164 186 +22
Files | Coverage Δ | |
---|---|---|
capstone-rs/examples/demo.rs | 90.38% <100.00%> (ø) |
|
capstone-rs/examples/parallel.rs | 95.23% <100.00%> (ø) |
|
capstone-rs/src/arch/evm.rs | 100.00% <ø> (+7.69%) |
:arrow_up: |
capstone-rs/src/arch/m680x.rs | 93.38% <100.00%> (-2.55%) |
:arrow_down: |
capstone-rs/src/arch/ppc.rs | 90.32% <100.00%> (-3.13%) |
:arrow_down: |
capstone-rs/src/arch/tms320c64x.rs | 89.91% <100.00%> (-7.52%) |
:arrow_down: |
capstone-rs/src/instruction.rs | 97.93% <100.00%> (+5.43%) |
:arrow_up: |
capstone-rs/src/lib.rs | 100.00% <ø> (ø) |
|
capstone-rs/src/arch/mips.rs | 67.74% <88.88%> (-28.26%) |
:arrow_down: |
capstone-rs/src/arch/mod.rs | 89.55% <75.00%> (-3.25%) |
:arrow_down: |
... and 7 more |
... and 4 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an
Arm64Sysreg
into anArm64Reg
by manually implementing aFrom
:
Makes sense to me. If we want "handcrafted enums" that are more ergonomic, it would certainly be a separate PR.
I've updated some source code documentation to reflect the new architecture tag. However, I don't add a systematic overview or tutorial about the architecture tag as I'm not sure whether that's necessary. @tmfink Please have a look and see if the documentation can be further improved.
The examples and benches are already up-to-date with the latest API design. I'm checking whether the test coverage can be further improved and after that I belive this PR is ready to be formally reviewed and merged.
This PR resolves #118, which proposes compile-time target architecture tags. It contains a series of commits that implement the proposal. Problems may arise as I author these commits so I'm opening this PR in an early stage to make sure that we can agree on the specific designs in the end.
Here's a list of unresolved problems:
CapstoneBuilder
that buildsCapstone
instances whose target architectures are "really" unknown at compile-time. When we call functions likex86
, we have explicitly tell that the target architecture isx86
. We should implement a new builder that sets the target architecture via a runtimeArch
value.RegId
andInsnGroupId
(some architectures even theInsnId
) are simply defined as a bunch ofu32
values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that definingArchTag::RegId
andArchTag::InsnGroupId
is meaningful.Option
orResult
any more as they should always succeed given the exact target architecture. But forDynamicArchTag
these operations may still fail so we need to keep their return types as it is now. How do we resolve the conflict?~Here's a TODO list to make this PR fully ready: