capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org

Python API - missing instruction.reg_read/write values #2133

Open disconnect3d opened 1 year ago

disconnect3d commented 1 year ago

Hi,

It seems that the Python API is missing some instruction details, such as the .regs_read and .regs_write values:

In [20]: capstone.__version__
Out[20]: '5.0.0'

In [21]: from capstone import *

In [22]: md = Cs(CS_ARCH_X86, CS_MODE_64)

In [23]: data = asm('mov rbx, [rax+rcx*2+10]', arch='amd64', bits=64)  # From Pwntools

In [24]: md.detail = True

In [25]: instruction = md.disasm(data, 100)

In [26]: instruction = list(md.disasm(data, 100))[0]

In [27]: instruction.regs_write
Out[27]: []

In [28]: instruction.regs_read
Out[28]: []

Meanwhile, cstool does show those values (under "Registers read" and "Registers modified"):

# ./cstool/cstool -v
cstool for Capstone Disassembler, v5.0.0
Capstone build: x86=1 arm=1 arm64=1 mips=1 ppc=1 sparc=1 sysz=1 xcore=1 m68k=1 tms320c64x=1 m680x=1 evm=1 wasm=1 mos65xx=1 bpf=1 riscv=1 sh=1 tricore=1
root@pwndbg:~/capstone# ./cstool/cstool -d x64 488b5c480a
 0  48 8b 5c 48 0a                                   mov    rbx, qword ptr [rax + rcx*2 + 0xa]
    ID: 460 (mov)
    Prefix:0x00 0x00 0x00 0x00
    Opcode:0x8b 0x00 0x00 0x00
    rex: 0x48
    addr_size: 8
    modrm: 0x5c
    disp: 0xa
    sib: 0x48
        sib_base: rax
        sib_index: rcx
        sib_scale: 2
    op_count: 2
        operands[0].type: REG = rbx
        operands[0].size: 8
        operands[0].access: WRITE
        operands[1].type: MEM
            operands[1].mem.base: REG = rax
            operands[1].mem.index: REG = rcx
            operands[1].mem.scale: 2
            operands[1].mem.disp: 0xa
        operands[1].size: 8
        operands[1].access: READ
    Registers read: rax rcx
    Registers modified: rbx

PS: cstool was compiled from tag 5.0-post1 (commit f2ffa75f787806e7fd986defb0dca0349ace6de2), while the Python bindings I use come from pip install capstone==5.0.0.post1.

wallds commented 1 year ago

>>> import capstone
>>> capstone.__version__
'5.0.0'
>>> from capstone import *
>>> md = Cs(CS_ARCH_X86, CS_MODE_64)
>>> data = asm('mov rbx, [rax+rcx*2+10]', arch='amd64', bits=64)  # From Pwntools
>>> md.detail = True
>>> instruction = md.disasm(data, 100)
>>> instruction = list(md.disasm(data, 100))[0]
>>> instruction.regs_write
[]
>>> instruction.regs_read
[]
>>> regs_read, regs_write = instruction.regs_access()
>>> [md.reg_name(r) for r in regs_read]
['rax', 'rcx']
>>> [md.reg_name(r) for r in regs_write]
['rbx']

https://github.com/capstone-engine/capstone/blob/36c5eef4ed8457e454fc310fac3b5f836dd15aa7/arch/X86/X86Mapping.c#L2134-L2139

disconnect3d commented 1 year ago

@wallds Yeah, instruction.regs_access() does return the read and written registers, which can then be resolved through md.reg_name(reg_value), but .regs_write and .regs_read are still empty for some reason.

I'm not sure what the second listing shows: is this what the .regs_access() Python method calls under the hood? If so, it must mean that insn->detail->regs_{read,write} had the proper values, but for some reason the Python API does not return them?

wallds commented 1 year ago

That's because insn->detail->regs_{read,write} contain only the implicitly accessed registers, for example:

mul rbx;                 .read=['rax'], .write=['rax', 'rdx', 'rflags']

cmp rax, rcx;            .read=[]     , .write=['rflags']

.regs_access() then starts from these implicit lists and adds the registers it finds by parsing .operands, which gives:

mul rbx;                 read=['rax', 'rbx'], write=['rax', 'rdx', 'rflags']

cmp rax, rcx;            read=['rax', 'rcx'], write=['rflags']
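To make the merge concrete, here is a pure-Python model of it that does not call Capstone at all. The Operand class, the READ/WRITE flags and the reg_access() function are hypothetical stand-ins, and the rule they apply (register operands count according to their access flags, memory operands contribute their base and index as reads, duplicates are skipped) is a simplified assumption based on the X86Mapping.c code linked above, not a copy of it; for instance, segment registers are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

READ, WRITE = 1, 2  # hypothetical access flags, loosely mirroring CS_AC_READ / CS_AC_WRITE


@dataclass
class Operand:
    """Hypothetical, simplified operand: either a register or a memory reference."""
    reg: Optional[str] = None    # set for register operands
    access: int = 0              # READ, WRITE or READ | WRITE (register operands only)
    base: Optional[str] = None   # set for memory operands
    index: Optional[str] = None


def reg_access(implicit_read: List[str], implicit_write: List[str],
               operands: List[Operand]) -> Tuple[List[str], List[str]]:
    # Start from the implicit lists (what insn.regs_read / insn.regs_write hold)...
    read, write = list(implicit_read), list(implicit_write)
    # ...then add every register found in the explicit operands, skipping duplicates.
    for op in operands:
        if op.reg is not None:
            if op.access & READ and op.reg not in read:
                read.append(op.reg)
            if op.access & WRITE and op.reg not in write:
                write.append(op.reg)
        else:
            # Registers used only to compute a memory address are read, never written.
            for r in (op.base, op.index):
                if r is not None and r not in read:
                    read.append(r)
    return read, write


# mul rbx: implicit read rax; implicit writes rax, rdx, rflags.
mul_regs = reg_access(["rax"], ["rax", "rdx", "rflags"],
                      [Operand(reg="rbx", access=READ)])
# cmp rax, rcx: no implicit reads; implicit write rflags.
cmp_regs = reg_access([], ["rflags"],
                      [Operand(reg="rax", access=READ), Operand(reg="rcx", access=READ)])
# mov rbx, [rax + rcx*2 + 0xa]: no implicit registers at all.
mov_regs = reg_access([], [],
                      [Operand(reg="rbx", access=WRITE), Operand(base="rax", index="rcx")])

print(mul_regs)  # (['rax', 'rbx'], ['rax', 'rdx', 'rflags'])
print(cmp_regs)  # (['rax', 'rcx'], ['rflags'])
print(mov_regs)  # (['rax', 'rcx'], ['rbx'])
```

The model reproduces the mul/cmp results above and the mov case from the original report, which is why regs_read/regs_write are empty for the mov while regs_access() is not.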

disconnect3d commented 1 year ago

Uh okay.

What is the purpose of this? Are there any use cases for getting a list of only the implicitly read/written registers?

Perhaps it would be better to remove those attributes or rename them to something like regs_{read,write}_implicitly, though I guess you may not want to make such an API break?

If not, then this should at least be documented better (including in the C comments above).