capstone-rust / capstone-rs

high-level Capstone system bindings for Rust
213 stars 75 forks source link

[WIP] Compile-time Target Architecture Tags #143

Open Lancern opened 1 year ago

Lancern commented 1 year ago

This PR resolves #118, which proposes compile-time target architecture tags. It contains a series of commits that implement the proposal. Problems may arise as I author these commits so I'm opening this PR in an early stage to make sure that we can agree on the specific designs in the end.

Here's a list of unresolved problems:

Here's a TODO list to make this PR fully ready:

tmfink commented 11 months ago

Currently, for many supported architectures, their RegId and InsnGroupId (some architectures even the InsnId) are simply defined as a bunch of u32 values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that defining ArchTag::RegId and ArchTag::InsnGroupId is meaningful.

I like your change which uses newtype_enum instead of the constified enum module! It's already a big improvement.

Bindgen didn't supported them when I first did this work. In fact, I contributed the "constified enum module" feature to bindgen (https://github.com/rust-lang/rust-bindgen/pull/741) to avoid the less ergonomic constified enum.

However, I think we may be able to do even better (although it might be more annoying). Ideally, we would expose actual Rust enums in the high-level bindings. That way, users could take advantage of the exhaustive pattern matching with match. The Capstone C code makes sure that 0 is always an "invalid" value such as ARM_SYSREG_INVALID. When converting from the C value to the Rust value, we could map any invalid value to an "invalid" variant of the enum OR always return an Option<T> when converting and return None if we get an invalid value.

// Have internal "invalid" variant
enum ArmReg {
    Invalid = 0,
    APSR,
    APSR_NZCV,
    // ...
}
// No invalid variant, so converting gives `Option<ArmReg>`
enum ArmReg {
    APSR = 1,
    APSR_NZCV,
    // ...
}

You don't need to implement this "Rust enum" change as a part of this PR--I just thought I'd let you know what I was thinking.

Lancern commented 11 months ago

Ideally, we would expose actual Rust enums in the high-level bindings.

Actually, I tried this at the beginning when writing the second commit. However, I quickly ran into problems.

Take ARM64 registers as an example. The problem is that there are actually 2 underlying C enums that both represent valid ARM64 reg IDs: arm64_reg and arm64_sysreg. On the Rust side, each architecture can only have one Rust enum that represents an architecture register. Unfortunately, bindgen cannot merge two C enums into one big Rust enum. Also converting Arm64Sysreg into Arm64Reg won't work because the underlying value ranges of Arm64Sysreg and Arm64Reg are different. Thus the problem cannot be solved without considerable manual efforts if we generate Rust enums solely.

On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an Arm64Sysreg into an Arm64Reg by manually implementing a From:

impl From<Arm64Sysreg> for Arm64Reg {
    fn from(sysreg: Arm64Sysreg): Self {
        Self(sysreg.0)
    }
}
codecov[bot] commented 10 months ago

Codecov Report

Merging #143 (7d8ab8c) into master (74ccb09) will decrease coverage by 7.49%. Report is 1 commits behind head on master. The diff coverage is 91.01%.

:exclamation: Current head 7d8ab8c differs from pull request most recent head 5cd9258. Consider uploading reports for the commit 5cd9258 to get more accurate results

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
- Coverage   93.99%   86.51%   -7.49%     
==========================================
  Files          22       18       -4     
  Lines        2733     1379    -1354     
  Branches     2687     1377    -1310     
==========================================
- Hits         2569     1193    -1376     
- Misses        164      186      +22     
Files Coverage Δ
capstone-rs/examples/demo.rs 90.38% <100.00%> (ø)
capstone-rs/examples/parallel.rs 95.23% <100.00%> (ø)
capstone-rs/src/arch/evm.rs 100.00% <ø> (+7.69%) :arrow_up:
capstone-rs/src/arch/m680x.rs 93.38% <100.00%> (-2.55%) :arrow_down:
capstone-rs/src/arch/ppc.rs 90.32% <100.00%> (-3.13%) :arrow_down:
capstone-rs/src/arch/tms320c64x.rs 89.91% <100.00%> (-7.52%) :arrow_down:
capstone-rs/src/instruction.rs 97.93% <100.00%> (+5.43%) :arrow_up:
capstone-rs/src/lib.rs 100.00% <ø> (ø)
capstone-rs/src/arch/mips.rs 67.74% <88.88%> (-28.26%) :arrow_down:
capstone-rs/src/arch/mod.rs 89.55% <75.00%> (-3.25%) :arrow_down:
... and 7 more

... and 4 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

tmfink commented 9 months ago

On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an Arm64Sysreg into an Arm64Reg by manually implementing a From:

Makes sense to me. If we want "handcrafted enums" that are more ergonomic, it would certainly be a separate PR.

Lancern commented 9 months ago

I've updated some source code documentation to reflect the new architecture tag. However, I don't add a systematic overview or tutorial about the architecture tag as I'm not sure whether that's necessary. @tmfink Please have a look and see if the documentation can be further improved.

The examples and benches are already up-to-date with the latest API design. I'm checking whether the test coverage can be further improved and after that I belive this PR is ready to be formally reviewed and merged.