capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org

Initial auto-sync LoongArch support #2349

Closed jiegec closed 1 week ago

jiegec commented 2 months ago

Detailed description

Add LoongArch support to auto-sync. Co-authored by @FurryAcetylCoA.

See https://github.com/capstone-engine/llvm-capstone/pull/47 for the LLVM changes. The generated code has small changes after being regenerated via:

./src/autosync/ASUpdater.py -a LoongArch -s IncGen Translate

Test plan

...

Closing issues

...

jiegec commented 2 months ago

Not finished yet, but I hope to get some suggestions from @Rot127. The PR can already disassemble some instructions:

$ test_loongarch
****************
Platform: loongarch32
Code:0x0c 0x00 0x08 0x14 0x8c 0xfd 0xbf 0x02
Disasm:
0x1000: lu12i.w $t0, 0x4000
        op_count: 2
                operands[0].type: REG = t0
                operands[1].type: IMM = 0x4000

0x1004: addi.w  $t0, $t0, -1
        op_count: 3
                operands[0].type: REG = t0
                operands[1].type: REG = t0
                operands[2].type: IMM = 0xffffffffffffffff

0x1008:

****************
Platform: loongarch64
Code:0x80 0x80 0x00 0x40 0x63 0x80 0xff 0x02 0x78 0x20 0xc0 0x29 0x00 0x84 0x00 0x01 0x00 0xa4 0x14 0x01
Disasm:
0x1000: beqz    $a0, 0x80
        op_count: 2
                operands[0].type: REG = a0
                operands[1].type: IMM = 0x80

0x1004: addi.d  $sp, $sp, -0x20
        op_count: 3
                operands[0].type: REG = sp
                operands[1].type: REG = sp
                operands[2].type: IMM = 0xffffffffffffffe0

0x1008: st.d    $s1, $sp, 8
        op_count: 3
                operands[0].type: REG = (null)
                operands[1].type: REG = sp
                operands[2].type: IMM = 0x8

0x100c: fadd.s  $fa0, $fa0, $fa1
        op_count: 3
                operands[0].type: REG = fa0
                operands[1].type: REG = fa0
                operands[2].type: REG = fa1

0x1010: movgr2fr.w      $fa0, $zero
        op_count: 2
                operands[0].type: REG = fa0
                operands[1].type: REG = zero

0x1014:

The memory operands still need to be handled, and more tests are required.
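
A note on the `IMM = 0xffffffffffffffff` and `IMM = 0xffffffffffffffe0` lines in the output above: they are consistent with a signed 12-bit immediate (`si12`, bits 21:10 of a 2RI12-format instruction word) being sign-extended to 64 bits and then printed as an unsigned value. The sketch below reproduces that arithmetic on the instruction words from the test output. The helper names `sign_extend` and `decode_si12` are hypothetical and are not taken from the Capstone sources.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to a signed 64-bit integer.
 * Hypothetical helper for illustration, not a Capstone function. */
static int64_t sign_extend(uint64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    uint64_t sign = 1ULL << (bits - 1);
    value &= (sign << 1) - 1; /* keep only the low `bits` bits */
    return (int64_t)((value ^ sign) - sign);
}

/* Extract the si12 field (bits 21:10) of a 2RI12-format instruction word. */
static int64_t decode_si12(uint32_t insn)
{
    return sign_extend((insn >> 10) & 0xfff, 12);
}
```

For example, the bytes `0x8c 0xfd 0xbf 0x02` form the little-endian word `0x02bffd8c` (`addi.w $t0, $t0, -1`). `decode_si12` returns -1 for it, which becomes `0xffffffffffffffff` when reinterpreted as an unsigned 64-bit integer.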

Rot127 commented 2 months ago

Regarding the CI:

jiegec commented 2 months ago

Features added:

However, in LoongArch assembly, memory operands are not special, i.e., they are normal register/immediate operands. I am unsure how to create MEM operands.

Rot127 commented 2 months ago

However, in LoongArch assembly, memory operands are not special, i.e., they are normal register/immediate operands. I am unsure how to create MEM operands.

Indeed a little tricky. I quickly checked the td files and the ISA, and it seems that all loads and stores use one of the instruction formats 3R, 2RI12 or 2RI14. In PrinterCapstone you can now emit these formats (and all the others) as additional information. This is already done for PPC (see Mapping.h and the generated values in the PPCGenCSMappingInsn.inc file).

Here is the code where we generate it for PPC. In your case you can do it the same way. All LoongArch instructions seem to be derived from the class LAInst, and the first class that inherits from it is the format class. In the PPC case we search for the class I and then emit its first child, which is the format class. You can do the same for the LoongArch instructions: get the LAInst class and emit the name of its first child, which is the format. In addition, you should emit whether the instruction loads or stores memory. You can check the CGI->mayLoad and CGI->mayStore flags in PrinterCapstone to figure this out.

In the end it should look something like this:

In loongarch.h

typedef struct {
    loongarch_insn_form form;
    loongarch_mem_access maccess; // LOONGARCH_MEM_LOAD, LOONGARCH_MEM_STORE, LOONGARCH_MEM_NONE
} loongarch_suppl_info; // add this to the union in Mapping.h

A generated entry in LoongArchGenCSMappingInsn.inc should look something like this:

{
  /* <mnemonic> */
  LOONGARCH_LOAD.... /* 337 */, LOONGARCH_INS_LOAD...,
  #ifndef CAPSTONE_DIET
    { 0 }, { 0 }, { 0 }, 0, 0, {{ LOONGARCH_INSN_FORM_2RI12, LOONGARCH_MEM_LOAD }}
  #endif
},
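
The format lookup described above (find the base class, then take the class that directly inherits from it) can be modeled in isolation. This is a toy sketch only: the real logic lives in the C++ TableGen backend, and the sketch assumes the inheritance chain is available as an array of class names ordered from the root to the most derived class. The function name `format_class_name` and the sample chain in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the name of the class that directly follows `base` in a
 * root-first inheritance chain, or NULL if `base` is missing or is the
 * last entry. Hypothetical helper, not part of LLVM or Capstone. */
static const char *format_class_name(const char *const *chain, size_t len,
                                     const char *base)
{
    assert(chain != NULL && base != NULL);
    for (size_t i = 0; i + 1 < len; i++) {
        if (strcmp(chain[i], base) == 0)
            return chain[i + 1]; /* first child of the base class */
    }
    return NULL;
}
```

With an illustrative chain such as `{ "LAInst", "Fmt2RI12", "LD_W" }`, looking up `"LAInst"` yields `"Fmt2RI12"`, which is the name that would then be turned into an enum value like `LOONGARCH_INSN_FORM_2RI12`.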

Now, when you add the details about the operands in LoongArchMapping.c, you can check the instruction format and the memory access flags you just generated. If the instruction accesses memory and you know the format, you also know which operand is the base register, the displacement, the offset register, etc.

Would this work?

jiegec commented 2 months ago

Would this work?

Thanks, I have implemented the logic; hopefully I didn't get any corner cases wrong.

jiegec commented 1 month ago

Looks very nice so far! I think the only thing left are the details of the memory operands and the instruction groups?

Afterwards we only need to run clang-format and fuzz it.

Thanks! I have added code for memory operand fixup and instruction group detection.

Rot127 commented 1 month ago

@jiegec @FuzzySecurity Before I forget: do you have any feedback on working with Auto-Sync? I would appreciate any comments, especially on what could be improved and what was difficult to work with.

Rot127 commented 1 month ago

Due to https://github.com/capstone-engine/llvm-capstone/pull/45/commits/ee2e109d402d383a428677c920759a92e1437dd2, please generate the Disassembler tables again or just add the one line by hand.

Rot127 commented 1 month ago

@jiegec Did you already find time to run the fuzzing? The PR is almost done.

jiegec commented 1 month ago

@jiegec Did you already find time to run the fuzzing? The PR is almost done.

Thanks, I have run the following fuzzing tests without crashes:

./cstool loongarch32 0 0xffffffff
./cstool -d loongarch32 0 0xffffffff
./cstool loongarch64 0 0xffffffff
./cstool -d loongarch64 0 0xffffffff

XVilka commented 3 weeks ago

@kabeor @aquynh this one is good to be merged: one more architecture supported in Capstone, and one that is getting more popular over time.

aquynh commented 3 weeks ago

excellent work, thank you for your effort!

just a few notes on coding convention, please see my comments.

Rot127 commented 2 weeks ago

@jiegec See https://github.com/jiegec/capstone/pull/2 for the name change

aquynh commented 2 weeks ago

looking good now, except for a few minor issues with the CI

XVilka commented 2 weeks ago

Run python -m unittest src/autosync/Tests/test_header_patcher.py
.F
======================================================================
FAIL: test_header_patching (src.autosync.Tests.test_header_patcher.TestHeaderPatcher.test_header_patching)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/capstone/capstone/suite/auto-sync/src/autosync/Tests/test_header_patcher.py", line 25, in test_header_patching
    self.assertEqual(
AssertionError: '// S[195 chars]f\n\n\tThis part should be included if the who[321 chars]\n\n' != '// S[195 chars]f\n\nThis part should be included if the whole[319 chars]\n\n'
Diff is 647 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 2 tests in 0.006s

FAILED (failures=1)
Error: Process completed with e

cc @Rot127

Rot127 commented 1 week ago

@jiegec https://github.com/jiegec/capstone/pull/3 should fix the test. Please check the CI in the PR to be sure. Additionally, rebase this one after https://github.com/capstone-engine/capstone/pull/2391 is merged.