capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org

aarch64: incorrect register in regs_access() for bl instruction #2234

Closed. find0x90 closed this issue 10 months ago

find0x90 commented 10 months ago

The regs_access() function returns 'sp' as a read register for the bl instruction.

Below is a small script that reproduces the issue. The output differs between version 4.0.2 and the most recent commit as of this comment.

#!/usr/bin/env python3
# test.py

from capstone import *

try:
    md = Cs(CS_ARCH_ARM64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
except Exception:
    # Fall back to the CS_ARCH_AARCH64 name used by newer builds.
    md = Cs(CS_ARCH_AARCH64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
md.detail = True

instruction_bytes = b"\xec\x6a\x01\x95"

inst = list(md.disasm(instruction_bytes, offset=0x0, count=1))[0]

print(inst)

regs_read, regs_written = inst.regs_access()
regs_read = [inst.reg_name(r) for r in regs_read]
regs_written = [inst.reg_name(r) for r in regs_written]

print(regs_read, regs_written)

4.0.2:

$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl #0x405abb0>
[] ['x30']

next branch b9c260e9:

$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl 0x405abb0>
['sp'] ['x30']
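
As a cross-check on the bytes used above, the A64 encoding of bl can be decoded by hand: the top six bits are the opcode and the remaining 26 bits are a signed word offset, so the encoding has no field that could name sp. The helper below is a minimal sketch; decode_bl is a made-up name for illustration and is not part of Capstone.

```python
import struct


def decode_bl(raw: bytes, address: int = 0) -> int:
    """Return the branch target of a little-endian A64 BL instruction."""
    (word,) = struct.unpack("<I", raw)
    # BL is encoded as 1 00101 imm26: opcode in the top six bits.
    if (word >> 26) != 0b100101:
        raise ValueError("not a BL instruction")
    imm26 = word & 0x03FFFFFF
    # Sign-extend the 26-bit immediate.
    if imm26 & (1 << 25):
        imm26 -= 1 << 26
    # The immediate counts 4-byte instructions relative to this one.
    return (address + imm26 * 4) & 0xFFFFFFFFFFFFFFFF


print(hex(decode_bl(b"\xec\x6a\x01\x95")))  # 0x405abb0
```

The result matches the target printed by both Capstone versions, so the two outputs differ only in the register access lists.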

Rot127 commented 10 months ago

It is incorrectly defined in LLVM:

let isCall = 1, Defs = [LR], Uses = [SP] in {
    def BL : CallImm<1, "bl", [(AArch64call tglobaladdr:$addr)]>;
} // isCall
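
To illustrate why the Uses = [SP] entry surfaces in regs_access(), here is a toy model of how implicit definitions and uses from an instruction record could be merged with operand registers. It is only a sketch under assumptions: InstrDef and this regs_access function are hypothetical stand-ins and not Capstone's or LLVM's actual code. LR is written as x30, its AArch64 register name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrDef:
    """Toy stand-in for one instruction record in a .td file (hypothetical)."""
    mnemonic: str
    implicit_defs: tuple = ()  # registers written without being operands
    implicit_uses: tuple = ()  # registers read without being operands


def regs_access(defn, operand_reads=(), operand_writes=()):
    """Merge implicit and operand registers, keeping order and dropping duplicates."""
    reads = list(dict.fromkeys([*defn.implicit_uses, *operand_reads]))
    writes = list(dict.fromkeys([*defn.implicit_defs, *operand_writes]))
    return reads, writes


# The record as quoted above, and the same record without the SP use.
bl_as_defined = InstrDef("bl", implicit_defs=("x30",), implicit_uses=("sp",))
bl_without_sp = InstrDef("bl", implicit_defs=("x30",))

print(regs_access(bl_as_defined))  # (['sp'], ['x30'])
print(regs_access(bl_without_sp))  # ([], ['x30'])
```

In this model the only difference between the two results is the implicit use list, which mirrors the difference between the two outputs in the report.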

find0x90 commented 10 months ago

@Rot127 can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.

I understand removing Uses = [SP] but why change isCall to isBranch? Aren't bl and blr the procedure call instructions for AArch64?

find0x90 commented 10 months ago

Actually, I just looked at blr in the llvm/lib/Target/AArch64/AArch64InstrInfo.td file. It is listed as isCall and also has Uses = [SP]. Is that wrong as well?

Rot127 commented 10 months ago

> but why change isCall to isBranch?

You are right. I made the changes in a rush and was sloppy. BL and BLR are considered calls. Thanks for pointing it out!

> and also has Uses = [SP]. Is that wrong as well?

Uses = [SP] is wrong. I can't see any mention of SP usage in the ISA.

> can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.

The changes in our LLVM repo are changes to the definitions of the architecture. From those definitions we generate our disassembler logic.

If we discover a flaw in the definition, we need to change it in the td files first and then generate our decoding tables again. For details, see the documentation. Please let me know which parts of the docs are unclear or badly written (if any). I haven't received feedback on them yet, and I certainly had blind spots while writing them.

Rot127 commented 10 months ago

The TLDR is:

Though, if you can't spend the time to get into the quirks of updating, better wait until v6 is released. The update system is new and still has unpolished corners, which can be confusing.

find0x90 commented 10 months ago

Cool, if I spot any other errors I'll report them and also give this process a shot to see if I can contribute. Thanks for all the hard work on this! The recent updates to Capstone are very much appreciated.

find0x90 commented 10 months ago

Just tested and it's fixed for me, thanks!
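
For anyone who wants to repeat the check, including for blr, a small script along these lines could be used. It is a sketch under assumptions: the helper names (reads_sp, capstone_regs_access, check_samples) are made up for illustration, the blr x0 bytes are an extra sample not taken from the report, and the Capstone calls are the same ones used in the reproduction script above.

```python
def reads_sp(regs_read):
    """Return True if 'sp' is among the given register names."""
    return "sp" in regs_read


def capstone_regs_access(code):
    """Disassemble one AArch64 instruction and return (read, written) names."""
    import capstone  # imported lazily so the other helpers work without it

    # Newer builds expose CS_ARCH_AARCH64, older ones CS_ARCH_ARM64.
    arch = getattr(capstone, "CS_ARCH_AARCH64", None)
    if arch is None:
        arch = capstone.CS_ARCH_ARM64
    md = capstone.Cs(arch, capstone.CS_MODE_ARM | capstone.CS_MODE_LITTLE_ENDIAN)
    md.detail = True
    insn = next(md.disasm(code, 0x0, 1))
    regs_read, regs_written = insn.regs_access()
    return ([insn.reg_name(r) for r in regs_read],
            [insn.reg_name(r) for r in regs_written])


# "bl" uses the bytes from the report; "blr x0" is an added sample.
SAMPLES = {
    "bl": b"\xec\x6a\x01\x95",
    "blr x0": b"\x00\x00\x3f\xd6",
}


def check_samples(get_regs=capstone_regs_access):
    """Return the names of the samples whose read set contains sp."""
    offenders = []
    for name, code in SAMPLES.items():
        regs_read, _ = get_regs(code)
        if reads_sp(regs_read):
            offenders.append(name)
    return offenders


if __name__ == "__main__":
    try:
        print("samples reading sp:", check_samples())
    except ImportError:
        print("capstone is not installed; nothing checked")
```

The get_regs parameter lets the check run against a stub, so the logic can be exercised even where Capstone is not installed.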