capstone-rust / capstone-rs

high-level Capstone system bindings for Rust
217 stars 75 forks source link

regs_write() and regs_read() do not return complete results #84

Open bcantrill opened 4 years ago

bcantrill commented 4 years ago

The instruction detail provided via InsnDetail's regs_write() and regs_read() only return a subset of the registers written to or read from.

For example, running the C-based cstool:

$ cstool -d thumb "f0bd"
 0  f0 bd  pop  {r4, r5, r6, r7, pc}
    ID: 423 (pop)
    op_count: 5
        operands[0].type: REG = r4
        operands[0].access: WRITE
        operands[1].type: REG = r5
        operands[1].access: WRITE
        operands[2].type: REG = r6
        operands[2].access: WRITE
        operands[3].type: REG = r7
        operands[3].access: WRITE
        operands[4].type: REG = pc
        operands[4].access: WRITE
    Registers read: sp
    Registers modified: sp r4 r5 r6 r7 pc
    Groups: thumb thumb1only 

In contrast, running the capstone-rs-based cstool:

$ cstool -d --arch arm --mode thumb --code "f0bd"
      1000:  f0 bd                               pop     {r4, r5, r6, r7, pc}
             insn id:     423
             read regs:   sp
             write regs:  sp
             insn groups: thumb, thumb1only

The problem -- alluded to in #63 -- is that the C-based consumers are explicitly calling cs_regs_access() while the Rust-based consumers have no way of knowing that this must be called. There are a number of ways of fixing this, but it seems that calling insn_detail() should return an InsnDetail for which regs_write()/regs_read() return correct data -- and therefore that insn_detail() should explicitly make the call to cs_regs_access().

Here is a diff that implements this:

diff --git a/capstone-rs/src/capstone.rs b/capstone-rs/src/capstone.rs
index 9e77f9d..1ae313d 100644
--- a/capstone-rs/src/capstone.rs
+++ b/capstone-rs/src/capstone.rs
@@ -374,6 +374,53 @@ impl Capstone {
         } else if Self::is_diet() {
             Err(Error::IrrelevantDataInDiet)
         } else {
+            /*
+             * To assure that our returned InsnDetail's regs_*() accessors
+             * return the correct data, we need to call into cs_regs_access().
+             * This interface is unsafe even by C's own poor standards for
+             * itself: it takes no bounds and has essentially undefined
+             * behavior if the arrays passed in correspond to the (same-sized)
+             * arrays in the detail member of the cs_insn.  We therefore call
+             * into this interface with a scratch buffer on our stack and then
+             * manually copy it into the same structures in our underlying
+             * insn.  This could presumably be made less verbose, but given
+             * the degree of unsafety, we have erred on the side of caution...
+             */
+            unsafe {
+                let mut rbuffer: [u16; 32] = [0; 32];
+                let mut wbuffer: [u16; 32] = [0; 32];
+                let mut rcnt: u8 = 0;
+                let mut wcnt: u8 = 0;
+                let mut detail = insn.insn.detail;
+
+                let regs_read: *mut u16 = &mut rbuffer[0];
+                let regs_write: *mut u16 = &mut wbuffer[0];
+                let regs_read_count: *mut u8 = &mut rcnt;
+                let regs_write_count: *mut u8 = &mut wcnt;
+
+                let err = cs_regs_access(
+                    self.csh(), &insn.insn,
+                    regs_read, regs_read_count,
+                    regs_write, regs_write_count
+                );
+
+                if err != cs_err::CS_ERR_OK {
+                    return Err(err.into());
+                }
+
+                for i in 0..rcnt {
+                    (*detail).regs_read[i as usize] = rbuffer[i as usize];
+                }
+
+                (*detail).regs_read_count = rcnt;
+
+                for i in 0..wcnt {
+                    (*detail).regs_write[i as usize] = wbuffer[i as usize];
+                }
+
+                (*detail).regs_write_count = wcnt;
+            }
+
             Ok(unsafe { insn.detail(self.arch) })
         }
     }

Running with this diff, the capstone-rs-based cstool has the correct results:

$ cstool -d --arch arm --mode thumb --code "f0bd"
      1000:  f0 bd                               pop     {r4, r5, r6, r7, pc}
             insn id:     423
             read regs:   sp
             write regs:  sp, r4, r5, r6, r7, pc
             insn groups: thumb, thumb1only
tmfink commented 4 years ago

Thanks for filing the issue! As you pointed out, this will mostly be solved once we fix #68.

It seems like you probably have the right idea that we should always call cs_regs_access when in detail mode. However, cs_regs_access only seems to be supported on x86, arm, and arm64, so we don't want to error out if a call fails (just ignore).

vn-ki commented 2 years ago

Just hit this. Are you planning on fixing this? If you are busy, is the fix something I can help with?

tmfink commented 2 years ago

@vn-ki I did not make any plans to implement this myself. I would greatly appreciate help with this! Would you like to open a PR? I would be happy to help with any questions you have.

The reason I have not gotten to this is that the support is limited to only a few architectures, which potentially complicates the implementation. Some open questions:

1) Should regs_read()/regs_write() always call cs_regs_access? 2) How should we manage the lifetimes? Should we just copy into Rust-managed memory?

If you open a PR, please reference this issue (#84) and the original #63.

vn-ki commented 2 years ago

@tmfink I'll be happy to open a PR.

Some ideas:

  1. We can keep the current API (and approach), and add two new fields to capstone::InsnDetail corresponding to regs_read and regs_write. Depending on the architecture, we can return the one from cs_detail or the one from cs_regs_access. The initialization of these new two arrays would be at the time of construction when calling capstone::Capstone::insn_detail.
  2. We can do what the other bindings does and expose an extra Capstone::reg_access(&Insn) function which will call the cs_regs_access and return a InsnRegs(CapstoneRegs, CapstoneRegs) (lifetime shouldn't be an issue here since we are returning two fixed length arrays. Unless we want the regs to not live longer than the instruction). This is what the python binding seem to be doing, see details api and regs_access API.

I think approach 1 would be less confusing for the end user while approach 2 would be more inline with what upstream thinks is idiomatic (which might mean less headaches with future updates). Also with approach 2, maybe the user will have less of a headache if they want these regs to outlive the instruction?

Tell me if you think of any of these ones are good or if you had some other ideas?

tmfink commented 2 years ago

@vn-ki Thanks for the proposals. My thoughts after thinking for a few days:

I think I favor approach 1. We already bend over backwards to adapt to Capstone's "interesting" C API.

Some notes when implementing:

Tsn0w commented 2 years ago

Hi, is @vn-ki still working on this?