capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org

ARM64_REG_Vx vs. ARM64_REG_Qx #2363

Open junghee opened 1 month ago

junghee commented 1 month ago

For instruction mov w20, v1.s[0] (0x0e043c34),

I get the correct information with version 5.0.0:

# cstool -d arm64be 0e043c34
0  0e 04 3c 34  mov    w20, v1.s[0]
        ID: 488 (mov)
        op_count: 2
                operands[0].type: REG = w20
                operands[0].access: WRITE
                operands[1].type: REG = v1
                operands[1].access: READ
                        Vector Arrangement Specifier: 0xb
                        Vector Index: 0
        Registers read: v1
        Registers modified: w20
        Groups: neon

With the next branch, however, I get q1 instead of v1 for the second operand:

# ./cstool/cstool -d aarch64be 0e043c34
0  0e 04 3c 34  mov    w20, v1.s[0]
        ID: 1232 (umov)
        Is alias: 1349 (mov) with ALIAS operand set
        op_count: 2
                operands[0].type: REG = w20
                operands[0].access: WRITE
                operands[1].type: REG = q1
                operands[1].access: READ
                        Vector Arrangement Specifier: 0x20
                        Vector Index: 0
        Registers read: q1
        Registers modified: w20
        Groups: HasNEONorSME

Although the names v1 and q1 refer to the same register, they are interpreted differently: q1 denotes a single 128-bit quantity, while v1 denotes a vector of elements.

Rot127 commented 1 month ago

I looked into it, and I wouldn't restore the v5 behavior.

As you already said, v1 and q1 refer to the same register. Given the Vector Arrangement Specifier and the Vector Index, the offset within the register can be determined unambiguously. So semantically we shouldn't have a problem.
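To illustrate the point about the offset, here is a minimal sketch (plain Python, not Capstone code). It assumes the element width in bits has already been derived from the arrangement specifier; the helper names `lane_bit_offset` and `read_lane` are made up for this example.

```python
def lane_bit_offset(elem_bits: int, index: int) -> int:
    """Bit offset of lane `index` inside a 128-bit vector register.

    `elem_bits` is assumed to be derived from the arrangement
    specifier (e.g. 32 for `.s`); that mapping is not modeled here.
    """
    if elem_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported element width")
    if not 0 <= index < 128 // elem_bits:
        raise ValueError("lane index out of range")
    return elem_bits * index


def read_lane(reg128: int, elem_bits: int, index: int) -> int:
    """Extract one lane from the 128-bit register value (lane 0 = lowest bits)."""
    offset = lane_bit_offset(elem_bits, index)
    return (reg128 >> offset) & ((1 << elem_bits) - 1)


# Example 128-bit register contents (arbitrary test value).
q1 = 0x0123456789ABCDEF_FEDCBA98_76543210

lane0 = read_lane(q1, 32, 0)  # what `mov w20, v1.s[0]` would read
lane1 = read_lane(q1, 32, 1)
```

For `mov w20, v1.s[0]` this corresponds to `elem_bits=32` and `index=0`, i.e. bits 0 to 31 of the register, regardless of whether the operand is named v1 or q1.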

Simply printing v1 instead of q1 is not easy, because the API doesn't allow it. The cs_reg_name() function takes only a register identifier, and v1 and q1 share the same identifier: AArch64_REG_Q1. Hence it cannot decide which name to print.

We could duplicate the register identifiers for the Vn registers (as in v5), but this complicates usage of the API again, because users would have to check for both AArch64_REG_V1 and AArch64_REG_Q1 to tell whether they are dealing with the same register.
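A rough sketch of the trade-off, as a toy model in Python rather than the real API. All enum values, tables, and function names below are hypothetical and only mimic the shape of the problem.

```python
from enum import IntEnum


class RegNext(IntEnum):
    """Toy model of one ID per physical register (hypothetical values)."""
    Q1 = 1
    W20 = 2


NAMES_NEXT = {RegNext.Q1: "q1", RegNext.W20: "w20"}


def reg_name_next(reg_id: RegNext) -> str:
    # The lookup only sees the ID, so it has no way to choose "v1" over "q1".
    return NAMES_NEXT[reg_id]


class RegV5(IntEnum):
    """Toy model of duplicated IDs for the Vn view (hypothetical values)."""
    Q1 = 1
    V1 = 2
    W20 = 3


NAMES_V5 = {RegV5.Q1: "q1", RegV5.V1: "v1", RegV5.W20: "w20"}

# With duplicated IDs, callers need an extra table to find the physical register.
PHYSICAL_V5 = {RegV5.Q1: "simd1", RegV5.V1: "simd1", RegV5.W20: "gpr20"}


def same_physical_register(a: RegV5, b: RegV5) -> bool:
    return PHYSICAL_V5[a] == PHYSICAL_V5[b]
```

In the first model the name is always "q1"; in the second the name can be "v1", but a plain `a == b` comparison no longer tells you that two operands alias the same register.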

The second alternative is some complicated special-case handling in cs_reg_name(). But I don't think this is worth it for such a small syntactic difference, given that the output is semantically correct. (Also, I am not sure it would actually work.)

I added a flag in aarch64_op that indicates whether the register is a Vn register. For cstool, I simply added the (vreg) suffix:

cstool -d aarch64be 0e043c34
 0  0e 04 3c 34  mov    w20, v1.s[0]
    ID: 1285 (umov)
    Is alias: 1429 (mov) with ALIAS operand set
    op_count: 2
        operands[0].type: REG = w20
        operands[0].access: WRITE
        operands[1].type: REG = q1 (vreg)
        operands[1].access: READ
            Vector Arrangement Specifier: 0x20
            Vector Index: 0
    Registers read: q1
    Registers modified: w20
    Groups: HasNEONorSME 
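For anyone consuming this from their own tooling, the idea can be sketched as follows. This is a standalone Python model, not the actual struct: the `Operand` class, its `is_vreg` field name, and both helpers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    """Toy stand-in for a register operand (field names are hypothetical)."""
    reg_name: str          # name as returned by an ID-only lookup, e.g. "q1"
    is_vreg: bool = False  # assumed flag: operand was written as a Vn register


def format_operand(index: int, op: Operand) -> str:
    """Mimic the cstool line above: append ' (vreg)' when the flag is set."""
    suffix = " (vreg)" if op.is_vreg else ""
    return f"operands[{index}].type: REG = {op.reg_name}{suffix}"


def display_name(op: Operand) -> str:
    """Optional user-side rewrite: show 'vN' instead of 'qN' for flagged operands."""
    if op.is_vreg and op.reg_name.startswith("q"):
        return "v" + op.reg_name[1:]
    return op.reg_name


line = format_operand(1, Operand("q1", is_vreg=True))
```

With this shape the register identifier stays unique, and a caller who wants the v1 spelling can derive it from the flag.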

What do you think? cc @FinnWilkinson.