capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org

Xtensa Support #2380

Open imbillow opened 4 weeks ago

imbillow commented 4 weeks ago

Your checklist for this pull request

Detailed description

...

Test plan

...

Closing issues

...

XVilka commented 4 weeks ago

@imbillow which LLVM version did you use as the base for this PR? Mainstream LLVM 18? Or a fork from the developers working on the Xtensa patches?

imbillow commented 4 weeks ago

> @imbillow which LLVM version did you use as the base for this PR? Mainstream LLVM 18? Or a fork from the developers working on the Xtensa patches?

I'm using the auto-sync branch of llvm-capstone. I think it's LLVM 18, but I'm not sure.

commit 5943ec6923d64e63b9645aa230cd9b86bd63a51b (HEAD -> auto-sync, origin/auto-sync, origin/HEAD)
Author: Rot127 <unisono@quyllur.org>
Date:   Tue Jun 4 03:29:20 2024 -0500

    Add Alpha and LoongArch to the CI tests.

Rot127 commented 4 weeks ago

It is LLVM 18. Please quickly diff the Target/Xtensa directories against each other. If there are vast differences, we can consider merging them earlier.

imbillow commented 3 weeks ago

A build based on the latest next branch still fails tests, but I'm seeing different results from the CI. I wonder if this behavior is specific to macOS on M2.

@Rot127

➤ git rev-parse --verify HEAD
60d5b7ec2f62e0115cb0833e6429fb9057f5867a
➤ cmake --build cmake-build-debug/
ninja: no work to do.
➤ cmake --install cmake-build-debug/ --prefix='.local'
...
➤ ./.local/bin/cstest -f tests/cs_details/issue.cs 
[+] TARGET: tests/cs_details/issue.cs
[==========] Testing issues: Running 51 test(s).
[ RUN      ] !# issue 0 ARM operand groups 0x90,0xe8,0x0e,0x00 == ldm.w r0, {r1, r2, r3} ;
[       OK ] !# issue 0 ARM operand groups 0x90,0xe8,0x0e,0x00 == ldm.w r0, {r1, r2, r3} ;
[ RUN      ] !# issue 0 ARM operand groups 0x0e,0xc8 == ldm r0!, {r1, r2, r3} ;
[       OK ] !# issue 0 ARM operand groups 0x0e,0xc8 == ldm r0!, {r1, r2, r3} ;
[ RUN      ] !# issue 0 ARM operand groups 0x00,0x2a,0xf7,0xee == vmov.f32 s5, #1.000000e+00 ;
[       OK ] !# issue 0 ARM operand groups 0x00,0x2a,0xf7,0xee == vmov.f32 s5, #1.000000e+00 ;
[ RUN      ] !# issue 0 ARM operand groups 0x0f,0x00,0x71,0xe3 == cmn r1, #15 ;
[       OK ] !# issue 0 ARM operand groups 0x0f,0x00,0x71,0xe3 == cmn r1, #15 ;
[ RUN      ] !# issue 0 ARM operand groups 0x03,0x20,0xb0,0xe1 == movs r2, r3 ;
[       OK ] !# issue 0 ARM operand groups 0x03,0x20,0xb0,0xe1 == movs r2, r3 ;
[ RUN      ] !# issue 0 ARM operand groups 0xfd,0x8f == ldrh r5, [r7, #62] ;
[       OK ] !# issue 0 ARM operand groups 0xfd,0x8f == ldrh r5, [r7, #62] ;
[ RUN      ] !# issue 0 ARM operand groups 0x61,0xb6 == cpsie f ;
[       OK ] !# issue 0 ARM operand groups 0x61,0xb6 == cpsie f ;
[ RUN      ] !# issue 0 ARM operand groups 0x18,0xf8,0x03,0x1e == ldrbt r1, [r8, #3] ;
[       OK ] !# issue 0 ARM operand groups 0x18,0xf8,0x03,0x1e == ldrbt r1, [r8, #3] ;
[ RUN      ] !# issue 0 ARM operand groups 0xb0,0xf8,0x01,0xf1 == pldw [r0, #257] ;
[       OK ] !# issue 0 ARM operand groups 0xb0,0xf8,0x01,0xf1 == pldw [r0, #257] ;
[ RUN      ] !# issue 0 ARM operand groups 0xd3,0xe8,0x08,0xf0 == tbb [r3, r8] ;
[       OK ] !# issue 0 ARM operand groups 0xd3,0xe8,0x08,0xf0 == tbb [r3, r8] ;
[ RUN      ] !# issue 0 ARM operand groups 0xd3,0xe8,0x18,0xf0 == tbh [r3, r8, lsl #1] ;
[       OK ] !# issue 0 ARM operand groups 0xd3,0xe8,0x18,0xf0 == tbh [r3, r8, lsl #1] ;
[ RUN      ] !# issue 0 ARM operand groups 0xaf,0xf3,0x43,0x85 == cpsie i, #3 ;
[       OK ] !# issue 0 ARM operand groups 0xaf,0xf3,0x43,0x85 == cpsie i, #3 ;
[ RUN      ] !# issue 0 ARM operand groups 0xbf,0xf3,0x6f,0x8f == isb sy ;
[       OK ] !# issue 0 ARM operand groups 0xbf,0xf3,0x6f,0x8f == isb sy ;
[ RUN      ] !# issue 0 ARM operand groups 0x59,0xea,0x7b,0x89 == csel r9, r9, r11, vc ;
[       OK ] !# issue 0 ARM operand groups 0x59,0xea,0x7b,0x89 == csel r9, r9, r11, vc ;
[ RUN      ] !# issue 0 ARM operand groups 0xbf,0xf3,0x56,0x8f == dmb nshst ;
[       OK ] !# issue 0 ARM operand groups 0xbf,0xf3,0x56,0x8f == dmb nshst ;
[ RUN      ] !# issue 0 ARM operand groups 0x31,0xfa,0x02,0xf2 == lsrs.w r2, r1, r2 ;
[       OK ] !# issue 0 ARM operand groups 0x31,0xfa,0x02,0xf2 == lsrs.w r2, r1, r2 ;
[ RUN      ] !# issue 0 ARM operand groups 0x5f,0xf0,0x0c,0x01 == movseq.w r1, #12 ;
[       OK ] !# issue 0 ARM operand groups 0x5f,0xf0,0x0c,0x01 == movseq.w r1, #12 ;
[ RUN      ] !# issue 0 ARM operand groups 0x52,0xe8,0x01,0x1f == ldrex r1, [r2, #4] ;
[       OK ] !# issue 0 ARM operand groups 0x52,0xe8,0x01,0x1f == ldrex r1, [r2, #4] ;
[ RUN      ] !# issue 0 ARM operand groups 0xdf,0xec,0x1d,0x1a == vscclrmhi {s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, s15, s16, s17, s18, s19, s20, s21, s22, s23, s24, s25, s26, s27, s28, s29, s30, s31, vpr} ;
[       OK ] !# issue 0 ARM operand groups 0xdf,0xec,0x1d,0x1a == vscclrmhi {s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, s15, s16, s17, s18, s19, s20, s21, s22, s23, s24, s25, s26, s27, s28, s29, s30, s31, vpr} ;
[ RUN      ] !# issue 0 ARM operand groups 0x9f,0xec,0x06,0x5b == vscclrm {d5, d6, d7, vpr} ;
[       OK ] !# issue 0 ARM operand groups 0x9f,0xec,0x06,0x5b == vscclrm {d5, d6, d7, vpr} ;
[ RUN      ] !# issue 0 ARM operand groups 0xbc,0xfd,0x7f,0xaf == vldrh.u32 q5, [r4, #254]! ;
[       OK ] !# issue 0 ARM operand groups 0xbc,0xfd,0x7f,0xaf == vldrh.u32 q5, [r4, #254]! ;
[ RUN      ] !# issue 0 ARM operand groups 0x80,0xfc,0x80,0x1e == vst20.16 {q0, q1}, [r0] ;
[       OK ] !# issue 0 ARM operand groups 0x80,0xfc,0x80,0x1e == vst20.16 {q0, q1}, [r0] ;
[ RUN      ] !# issue 0 ARM operand groups 0x98,0xfc,0x4e,0x08 == vcadd.f32 q0, q4, q7, #90 ;
[       OK ] !# issue 0 ARM operand groups 0x98,0xfc,0x4e,0x08 == vcadd.f32 q0, q4, q7, #90 ;
[ RUN      ] !# issue 0 ARM operand groups 0x94,0xfd,0x46,0x48 == vcadd.f32 q2, q2, q3, #270 ;
[       OK ] !# issue 0 ARM operand groups 0x94,0xfd,0x46,0x48 == vcadd.f32 q2, q2, q3, #270 ;
[ RUN      ] !# issue 0 ARM operand groups 0x9d,0xec,0x82,0x6e == vldrb.s16 q3, [sp, q1] ;
[       OK ] !# issue 0 ARM operand groups 0x9d,0xec,0x82,0x6e == vldrb.s16 q3, [sp, q1] ;
[ RUN      ] !# issue 0 ARM operand groups 0x90,0xec,0x12,0x6f == vldrh.s32 q3, [r0, q1] ;
[       OK ] !# issue 0 ARM operand groups 0x90,0xec,0x12,0x6f == vldrh.s32 q3, [r0, q1] ;
[ RUN      ] !# issue 0 ARM operand groups 0x5f,0xea,0x2d,0x83 == sqrshrl lr, r3, #64, r8 ;
[       OK ] !# issue 0 ARM operand groups 0x5f,0xea,0x2d,0x83 == sqrshrl lr, r3, #64, r8 ;
[ RUN      ] !# issue 0 ARM operand groups 0x82,0xfd,0x21,0xff == vstrd.64 q7, [q1, #264] ;
[       OK ] !# issue 0 ARM operand groups 0x82,0xfd,0x21,0xff == vstrd.64 q7, [q1, #264] ;
[ RUN      ] !# issue 0 ARM operand groups 0x06,0x16,0x72,0xe6 == ldrbt r1, [r2], -r6, lsl #12 ;
[       OK ] !# issue 0 ARM operand groups 0x06,0x16,0x72,0xe6 == ldrbt r1, [r2], -r6, lsl #12 ;
[ RUN      ] !# issue 0 ARM operand groups 0xf6,0x50,0x33,0xe1 == ldrsh r5, [r3, -r6]! ;
[       OK ] !# issue 0 ARM operand groups 0xf6,0x50,0x33,0xe1 == ldrsh r5, [r3, -r6]! ;
[ RUN      ] !# issue 0 ARM operand groups 0x1e,0x19,0x7a,0xfd == ldc2l p9, c1, [r10, #-120]! ;
[       OK ] !# issue 0 ARM operand groups 0x1e,0x19,0x7a,0xfd == ldc2l p9, c1, [r10, #-120]! ;
[ RUN      ] !# issue 0 ARM operand groups 0x12,0x31,0x7c,0xfc == ldc2l p1, c3, [r12], #-72 ;
[       OK ] !# issue 0 ARM operand groups 0x12,0x31,0x7c,0xfc == ldc2l p1, c3, [r12], #-72 ;
[ RUN      ] !# issue 0 ARM operand groups 0xa4,0xf9,0x6d,0x0e == vld3.16 {d0[], d2[], d4[]}, [r4]! ;
[  ERROR   ] --- 0xa4,0xf9,0x6d,0x0e --- "operands[3].mem.index: REG = r4" not in "vld3.16 {d0[], d2[], d4[]}, [r4]! ; op_count: 4 ; operands[0].type: REG = d0 ; operands[0].access: WRITE ; operands[1].type: REG = d2 ; operands[1].access: WRITE ; operands[2].type: REG = d4 ; operands[2].access: WRITE ; operands[3].type: MEM ; operands[3].mem.base: REG = r4 ; operands[3].mem.scale: 0 ; operands[3].access: READ | WRITE ; Write-back: True ; Vector-size: 16 ; Registers read: r4 ; Registers modified: r4 d0 d2 d4 ; Groups: HasNEON ;"
[  ERROR   ] --- [   LINE   ] --- /Users/aya/Source/work/cs/suite/cstest/src/capstone_test.c:287: error: Failure!
[  FAILED  ] !# issue 0 ARM operand groups 0xa4,0xf9,0x6d,0x0e == vld3.16 {d0[], d2[], d4[]}, [r4]! ;
[ RUN      ] !# issue 0 ARM operand groups 0x0d,0x50,0x66,0xe4 == strbt r5, [r6], #-13 ;
[       OK ] !# issue 0 ARM operand groups 0x0d,0x50,0x66,0xe4 == strbt r5, [r6], #-13 ;
[ RUN      ] !# issue 0 ARM operand groups 0x00,0x10,0x4f,0xe2 == sub r1, pc, #0 ;
[       OK ] !# issue 0 ARM operand groups 0x00,0x10,0x4f,0xe2 == sub r1, pc, #0 ;
[ RUN      ] !# issue 0 ARM operand groups 0x9f,0x51,0xd3,0xe7 == bfc r5, #3, #17 ;
[       OK ] !# issue 0 ARM operand groups 0x9f,0x51,0xd3,0xe7 == bfc r5, #3, #17 ;
[ RUN      ] !# issue 0 ARM operand groups 0xd8,0xe8,0xff,0x67 == ldaexd r6, r7, [r8] ;
[       OK ] !# issue 0 ARM operand groups 0xd8,0xe8,0xff,0x67 == ldaexd r6, r7, [r8] ;
[ RUN      ] !# issue 0 ARM operand groups 0x30,0x0f,0xa6,0xe6 == ssat16 r0, #7, r0 ;
[       OK ] !# issue 0 ARM operand groups 0x30,0x0f,0xa6,0xe6 == ssat16 r0, #7, r0 ;
[ RUN      ] !# issue 0 ARM operand groups 0x9a,0x8f,0xa0,0xe6 == ssat r8, #1, r10, lsl #31 ;
[       OK ] !# issue 0 ARM operand groups 0x9a,0x8f,0xa0,0xe6 == ssat r8, #1, r10, lsl #31 ;
[ RUN      ] !# issue 0 ARM operand groups 0x40,0x1b,0xf5,0xee == vcmp.f64 d17, #0 ;
[       OK ] !# issue 0 ARM operand groups 0x40,0x1b,0xf5,0xee == vcmp.f64 d17, #0 ;
[ RUN      ] !# issue 0 ARM operand groups 0x05,0xf0,0x2f,0xe3 == msr CPSR_fsxc, #5 ;
[       OK ] !# issue 0 ARM operand groups 0x05,0xf0,0x2f,0xe3 == msr CPSR_fsxc, #5 ;
[ RUN      ] !# issue 0 ARM operand groups 0xa4,0xf9,0xed,0x0b == vld4.32 {d0[1], d2[1], d4[1], d6[1]}, [r4:128]! ;
[  ERROR   ] --- 0xa4,0xf9,0xed,0x0b --- "operands[4].mem.index: REG = r4" not in "vld4.32 {d0[1], d2[1], d4[1], d6[1]}, [r4:0x80]! ; op_count: 5 ; operands[0].type: REG = d0 ; operands[0].neon_lane = 1 ; operands[0].access: READ | WRITE ; operands[1].type: REG = d2 ; operands[1].neon_lane = 1 ; operands[1].access: READ | WRITE ; operands[2].type: REG = d4 ; operands[2].neon_lane = 1 ; operands[2].access: READ | WRITE ; operands[3].type: REG = d6 ; operands[3].neon_lane = 1 ; operands[3].access: READ | WRITE ; operands[4].type: MEM ; operands[4].mem.base: REG = r4 ; operands[4].mem.scale: 0 ; operands[4].access: READ | WRITE ; Write-back: True ; Vector-size: 32 ; Registers read: d0 d2 d4 d6 r4 ; Registers modified: r4 d0 d2 d4 d6 ; Groups: HasNEON ;"
[  ERROR   ] --- [   LINE   ] --- /Users/aya/Source/work/cs/suite/cstest/src/capstone_test.c:287: error: Failure!
[  FAILED  ] !# issue 0 ARM operand groups 0xa4,0xf9,0xed,0x0b == vld4.32 {d0[1], d2[1], d4[1], d6[1]}, [r4:128]! ;
[ RUN      ] !# issue 0 ARM operand groups 0x42,0x03,0xb0,0xf3 == aesd.8 q0, q1 ;
[       OK ] !# issue 0 ARM operand groups 0x42,0x03,0xb0,0xf3 == aesd.8 q0, q1 ;
[ RUN      ] !# issue 0 ARM operand groups 0x11,0x57,0x54,0xfc == mrrc2 p7, #1, r5, r4, c1 ;
[       OK ] !# issue 0 ARM operand groups 0x11,0x57,0x54,0xfc == mrrc2 p7, #1, r5, r4, c1 ;
[ RUN      ] !# issue 0 ARM operand groups 0xd3,0x2f,0x82,0xe6 == pkhtb r2, r2, r3, asr #31 ;
[       OK ] !# issue 0 ARM operand groups 0xd3,0x2f,0x82,0xe6 == pkhtb r2, r2, r3, asr #31 ;
[ RUN      ] !# issue 0 ARM operand groups 0x93,0x27,0x82,0xe6 == pkhbt r2, r2, r3, lsl #15 ;
[       OK ] !# issue 0 ARM operand groups 0x93,0x27,0x82,0xe6 == pkhbt r2, r2, r3, lsl #15 ;
[ RUN      ] !# issue 0 ARM operand groups 0xb4,0x10,0xf0,0xe0 == ldrht r1, [r0], #4 ;
[       OK ] !# issue 0 ARM operand groups 0xb4,0x10,0xf0,0xe0 == ldrht r1, [r0], #4 ;
[ RUN      ] !# issue 0 ARM operand groups 0x2f,0xfa,0xa1,0xf3 == sxtb16 r3, r1, ror #16 ;
[       OK ] !# issue 0 ARM operand groups 0x2f,0xfa,0xa1,0xf3 == sxtb16 r3, r1, ror #16 ;
[ RUN      ] !# issue 0 ARM operand groups 0x00,0x02,0x01,0xf1 == setend be ;
[       OK ] !# issue 0 ARM operand groups 0x00,0x02,0x01,0xf1 == setend be ;
[ RUN      ] !# issue 0 ARM operand groups 0xd0,0xe8,0xaf,0x0f == lda r0, [r0]
[       OK ] !# issue 0 ARM operand groups 0xd0,0xe8,0xaf,0x0f == lda r0, [r0]
[ RUN      ] !# issue 0 ARM operand groups 0xef,0xf3,0x11,0x85 == ldrhi pc, [r1, #-0x3ef]
[  ERROR   ] --- 0xef,0xf3,0x11,0x85 --- "Groups: IsARM" not in "ldrhi pc, [r1, #-0x3ef] ; op_count: 2 ; operands[0].type: REG = r15 ; operands[0].access: WRITE ; operands[1].type: MEM ; operands[1].mem.base: REG = r1 ; operands[1].mem.scale: 0 ; operands[1].mem.disp: 0x3ef ; operands[1].access: READ ; Subtracted: True ; Code condition: 8 ; Registers read: cpsr r1 ; Registers modified: r15 ; Groups: IsARM jump ;"
[  ERROR   ] --- [   LINE   ] --- /Users/aya/Source/work/cs/suite/cstest/src/capstone_test.c:287: error: Failure!
[  FAILED  ] !# issue 0 ARM operand groups 0xef,0xf3,0x11,0x85 == ldrhi pc, [r1, #-0x3ef]
[==========] Testing issues: 51 test(s) run.
[  PASSED  ] 48 test(s).
[  FAILED  ] Testing issues: 3 test(s), listed below:
[  FAILED  ] !# issue 0 ARM operand groups 0xa4,0xf9,0x6d,0x0e == vld3.16 {d0[], d2[], d4[]}, [r4]! ;
[  FAILED  ] !# issue 0 ARM operand groups 0xa4,0xf9,0xed,0x0b == vld4.32 {d0[1], d2[1], d4[1], d6[1]}, [r4:128]! ;
[  FAILED  ] !# issue 0 ARM operand groups 0xef,0xf3,0x11,0x85 == ldrhi pc, [r1, #-0x3ef]

 3 FAILED TEST(S)
[+] DONE: tests/cs_details/issue.cs
[!] Noted:
[  ERROR   ] --- "<capstone result>" != "<user result>"

Rot127 commented 3 weeks ago

The tests in tests/cs_details/issue.cs are not run on the current next branch. I added them in the ASAN PR, though. They become obsolete with the AArch64 PR anyway, where they are changed again.

XVilka commented 3 weeks ago

@imbillow note also that Xtensa can be either little-endian or big-endian.

https://0x04.net/~mwk/doc/xtensa.pdf

aquynh commented 3 weeks ago

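Thank you for this amazing effort!

A few quick comments:

  • please add this new arch to README
  • please add a test file into tests/
  • please use capitalized letters for all enums, for example Xtensa_REG_SAR should be XTENSA_REG_SAR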

imbillow commented 2 weeks ago

> Thank you for this amazing effort!
>
> A few quick comments:
>
>   • please add this new arch to README
>   • please add a test file into tests/
>   • please use capitalized letters for all enums, for example Xtensa_REG_SAR should be XTENSA_REG_SAR

The names of these enum types are generated from llvm-capstone by the auto-sync script, so I think it's best not to change their case.

imbillow commented 2 weeks ago

> @imbillow note also that Xtensa can be either little-endian or big-endian.
>
> https://0x04.net/~mwk/doc/xtensa.pdf

https://github.com/llvm/llvm-project/blob/5021e6dd548323e1169be3d466d440009e6d1f8e/llvm/lib/Target/Xtensa/Disassembler/XtensaDisassembler.cpp#L256

The latest version of LLVM does not support big-endian mode either, probably because LLVM's Xtensa support is still experimental.

There is, however, a third-party repository that supports Xtensa: https://github.com/espressif/llvm-project

XVilka commented 2 weeks ago

> > @imbillow note also that Xtensa can be either little-endian or big-endian. https://0x04.net/~mwk/doc/xtensa.pdf
>
> https://github.com/llvm/llvm-project/blob/5021e6dd548323e1169be3d466d440009e6d1f8e/llvm/lib/Target/Xtensa/Disassembler/XtensaDisassembler.cpp#L256
>
> The latest version of LLVM does not support big-endian mode either, probably because LLVM's Xtensa support is still experimental.
>
> There is, however, a third-party repository that supports Xtensa: https://github.com/espressif/llvm-project

Then please at least mark the Xtensa mode as LE until BE is added, so that no change to the mode enum will be necessary in the future.
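For illustration only, here is a minimal C sketch of what "LE only for now" could look like at the byte-reading level. It assumes a 24-bit instruction word assembled from three bytes, least significant byte first, which is how I read the linked LLVM function; demo_read_insn24 and its parameters are hypothetical names, not Capstone or LLVM API. A real big-endian decoder would likely need more than a different byte order, so the sketch simply rejects that mode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Capstone): assemble one 24-bit
 * instruction word from a byte buffer. Only little-endian input is
 * accepted; big-endian is rejected until it is actually supported. */
static bool demo_read_insn24(const uint8_t *bytes, size_t len,
			     bool little_endian, uint32_t *insn)
{
	/* Need three bytes, and only the LE layout is handled. */
	if (len < 3 || !little_endian)
		return false;

	/* Least significant byte comes first in memory. */
	*insn = ((uint32_t)bytes[2] << 16) |
		((uint32_t)bytes[1] << 8) |
		(uint32_t)bytes[0];
	return true;
}
```

Keeping the endianness as an explicit parameter from the start means a later BE implementation only has to fill in the rejected branch, without changing callers.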

Rot127 commented 2 weeks ago

@imbillow Enums get fixed with: https://github.com/capstone-engine/llvm-capstone/pull/50

aquynh commented 2 weeks ago

> > Thank you for this amazing effort! A few quick comments:
> >
> >   • please add this new arch to README
> >   • please add a test file into tests/
> >   • please use capitalized letters for all enums, for example Xtensa_REG_SAR should be XTENSA_REG_SAR
>
> The names of these enum types are generated from llvm-capstone by the auto-sync script, so I think it's best not to change their case.

Please change the case, as all other archs do that.

The auto-sync code should be updated to do this, too.

Rot127 commented 1 week ago

@imbillow You can use the latest main branch of llvm-capstone. It will generate the enums in capital letters.

imbillow commented 1 week ago

@aquynh @kabeor @Rot127 this PR is ready to be merged, please check again.

Rot127 commented 1 week ago

Please check the clang-tidy warning:

 /home/runner/work/capstone/capstone/arch/Xtensa/XtensaDisassembler.c:75:29: note: The left operand of '==' is a garbage value due to array index out of bounds
                if (SRDecoderTable[i + 1] == RegNo) {
                    ~~~~~~~~~~~~~~~~~~~~~ ^

This index can be out of bounds.
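A minimal sketch of a bounds-safe version of such a lookup, assuming the table stores (register id, encoded number) pairs and is walked two entries at a time. The names demo_sr_table, DEMO_REG_SAR, ARR_LEN and demo_lookup_sr are hypothetical stand-ins, not the generated code; the point is only that the loop bound must be the element count (not the byte size from sizeof) and must cover i + 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the generated table:
 * pairs of (register id, encoded register number). */
enum { DEMO_REG_SAR = 1 };
static const unsigned demo_sr_table[] = { DEMO_REG_SAR, 3 };

/* Element count of a true array (not valid for pointers). */
#define ARR_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns true and stores the register id if reg_no is in the table. */
static bool demo_lookup_sr(uint64_t reg_no, unsigned *reg_out)
{
	/* Bound on i + 1 so the second entry of each pair is in range. */
	for (size_t i = 0; i + 1 < ARR_LEN(demo_sr_table); i += 2) {
		if (demo_sr_table[i + 1] == reg_no) {
			*reg_out = demo_sr_table[i];
			return true;
		}
	}
	return false;
}
```

With this bound, a table of odd length or a loop limit accidentally expressed in bytes can no longer make the comparison read past the end of the array.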