capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org
7.51k stars 1.54k forks source link

Add access support for RISC-V #2393

Closed wxrdnx closed 2 months ago

wxrdnx commented 3 months ago

Your checklist for this pull request

Detailed description

As indicated in PR #2003, RISC-V registers and memory operands had no read/write information. In other words, there were no access fields in cs_riscv_op. This PR added read/write information support for risc-v operands, aka (cs_riscv_op.access). This PR also fixed some RISC-V C (compressed) instruction bugs on the way.

Operand read/write details are located in arch/RISCV/RISCVMappingInsnOp.inc. Since the current LLVM support for RISC-V is not ideal, and at the mean time the RISC-V instruction set is relatively small, instruction operand details were manually added to arch/RISCV/RISCVMappingInsnOp.inc.

[!NOTE] I was recently acknowledged that the RISC-V generator will be switched to a SAIL-based one. Therefore, instruction details in arch/RISCV/RISCVMappingInsnOp.inc is merely a temporary solution. This file may be replaced once the SAIL generator is implemented.

Instruction details are added in RISCV_add_cs_detail and invoked in printOperand. Memory address and offset adjustments were rewritten in fixDetailOfEffectiveAddr to support RISC-V C (compressed) instructions. RISCV_get_detail_op, RISCV_inc_detail_op_count, RISCV_dec_detail_op_count, and RISCV_get_arch_detail are added to simplify the code.

A new function markCLSInsn marks RISC-V C (compressed) load/store instructions. Similar to markLSInsn, markCLSInsn is invoked in RISCVDisassembler_getInstruction.

Other bug fixes / new features include

  1. Fix a RISC-V C (compressed) instruction bug (#2351). That is, cstool -d riscv64 88c5 now returns the correct operand type:
 0  88 c5        c.sw   a0, 8(a1)
    ID: 131 (c.sw)
    op_count: 2
        operands[0].type: REG = a0
        operands[0].access: READ
        operands[1].type: MEM
            operands[1].mem.base: REG = a1
            operands[1].mem.disp: 0x8
        operands[1].access: WRITE

    Groups: hasStdExtC
  1. Memory base & displacement are supported now for load/store RISC-V C (compressed) instructions. For example, cstool -d riscv64 8865 now should return
 0  88 65        c.ld   a0, 8(a1)
    ID: 115 (c.ld)
    op_count: 2
        operands[0].type: REG = a0
        operands[0].access: WRITE
        operands[1].type: MEM
            operands[1].mem.base: REG = a1
            operands[1].mem.disp: 0x8
        operands[1].access: READ

    Groups: hasStdExtC isrv64

Instead of something like

 0  88 65        c.ld   a0, 8(a1)
    ID: 115 (c.ld)
    op_count: 2
        operands[0].type: REG = a0
        operands[1].type: IMM = 8
        operands[2].type: REG = a1

    Groups: hasStdExtC isrv64

...

Test plan

Test for cs_riscv_op->access is added in tests/test_riscv.c. After running tests/test_riscv, you should see operands[i].access: READ / WRITE.

More instructions can be teseted through cstool. ...

Closing issues

"closes #2003" "closes #2351"

...

Rot127 commented 3 months ago

Also, I would advice you to build with:

cmake .. -DCMAKE_BUILD_TYPE=Debug -DCAPSTONE_BUILD_CSTEST=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=1 -DENABLE_ASAN=1

With ASAN and building cstest you get the most coverage

wxrdnx commented 3 months ago

Very nice! Thanks a lot!

How did you generate the InsnOp.inc table? Did you use the legacy scripts or Auto-Sync? In the last case: are there any changes you made which we could merge as well?

Also, please see my reply in the Telegram community channel.

Hi:

  1. So the instruction details in RISCVMappingInsnOp.inc is written by hand. I did not use any legacy scripts or auto-sync (and that's part of the reason I use the name RISCVMappingInsnOp.inc not RISCVGenCSMappingInsnOp.inc). In my opinion, the generator may not deal with some subtleties in RISC-V. For example, RISC-V compression instructions are handled inconsistently (e.g., c.addi takes 2 operands, but RISCVGenDisassemblerTables.inc wants 3), which makes manual editing unavoidable.

  2. I've just saw your reply on Telegram. Thanks for the detailed reply! Although it may take years to add SAILS support, the current Capstone RISC-V disassembler works like charm. There may be some edge cases, but overall it works pretty well. We'll be working on fixing these bugs and contributing to Capstone while developing my project.

Also, I would advice you to build with:

cmake .. -DCMAKE_BUILD_TYPE=Debug -DCAPSTONE_BUILD_CSTEST=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=1 -DENABLE_ASAN=1

With ASAN and building cstest you get the most coverage

Thanks for the command! I will try to build it with the above command, fix some bugs, and make the checks work.

Rot127 commented 3 months ago

We'll be working on fixing these bugs and contributing to Capstone while developing my project.

Very much appreciate this! Especially since the v6 release is coming up, this is a really nice addition.

Rot127 commented 3 months ago

We have to trigger the CI every time when you push (because we want to stay within the CI ranges, which are free). So I'd suggest you try to build everything locally first. I'll be online for the next few hours and trigger it when I see it, but probably forget it sometimes.

wxrdnx commented 3 months ago

We have to trigger the CI every time when you push (because we want to stay within the CI ranges, which are free). So I'd suggest you try to build everything locally first. I'll be online for the next few hours and trigger it when I see it, but probably forget it sometimes.

All right. I can build it locally now.

wxrdnx commented 3 months ago

New change:

  1. Fix incorrect operands for the csr instruction and the cs.w (and its variants) instruction.
  2. Add Python binding
  3. Add new test cases for M,A,F,D,C instructions
Rot127 commented 3 months ago
 =================================================================
==62==ERROR: AddressSanitizer: ABRT on unknown address 0x00000000003e (pc 0x7f882003500b bp 0x7f88201aa588 sp 0x7ffebc63ac00 T0)
SCARINESS: 10 (signal)
    #0 0x7f882003500b in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: 0702430aef5fa3dda43986563e9ffcc47efbd75e)
    #1 0x7f8820014858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858) (BuildId: 0702430aef5fa3dda43986563e9ffcc47efbd75e)
    #2 0x7f8820014728  (/lib/x86_64-linux-gnu/libc.so.6+0x22728) (BuildId: 0702430aef5fa3dda43986563e9ffcc47efbd75e)
    #3 0x7f8820025fd5 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x33fd5) (BuildId: 0702430aef5fa3dda43986563e9ffcc47efbd75e)
    #4 0x562e678fd14b in RISCV_add_cs_detail /src/capstonenext/arch/RISCV/RISCVMapping.c:171:3
    #5 0x562e678fba44 in printOperand /src/capstonenext/arch/RISCV/RISCVInstPrinter.c:274:4
    #6 0x562e678f2a82 in printInstruction /src/capstonenext/arch/RISCV/RISCVGenAsmWriter.inc:1208:5
    #7 0x562e678f2a82 in RISCV_printInst /src/capstonenext/arch/RISCV/RISCVInstPrinter.c:250:7
    #8 0x562e677b51fd in cs_disasm /src/capstonenext/cs.c:1241:4
    #9 0x562e677b2d41 in LLVMFuzzerTestOneInput /src/capstonenext/suite/fuzz/fuzz_disasm.c:56:20
    #10 0x562e67665470 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:614:13
    #11 0x562e67664c95 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:516:7
    #12 0x562e67666c22 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:829:7
    #13 0x562e67666f57 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:867:3
    #14 0x562e67655566 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:914:6
    #15 0x562e67681a92 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #16 0x7f8820016082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 0702430aef5fa3dda43986563e9ffcc47efbd75e)
    #17 0x562e676466dd in _start (build-out/fuzz_disasmnext+0x5276dd)

DEDUP_TOKEN: raise--abort--
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: ABRT (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: 0702430aef5fa3dda43986563e9ffcc47efbd75e) in raise
==62==ABORTING
MS: 0 ; base unit: 0000000000000000000000000000000000000000
0x2d,0x73,0x0,0x15,0x13,0x0,0xd3,0x0,0x0,0xa6,0xef,
-s\000\025\023\000\323\000\000\246\357
wxrdnx commented 3 months ago

This error is caused by an insufficient number of operands in sfence.vma. Similar bug occurs in fence. Both should be fixed now.

wxrdnx commented 3 months ago

Hi,

May I ask what is the purpose of test/cs_detail/issue.cs? What test cases should I add to it?

Thank you!

Rot127 commented 3 months ago

It tests the details. In your case the READ/WRITE primitives. The reason I ask you is because I currently modernize testing and will translate those tests to yaml. The tests in test_ARCH.... as well at some point. But they are more annoying to copy.

Also, the tests you added only ensure decoding. Not that the decoded values are correct. If it prints something wrong, the CI doesn't care.

wxrdnx commented 3 months ago

Added tests intest/cs_detail/issue.cs Also rebased to next

XVilka commented 2 months ago

@kabeor @aquynh please take a look.

@moste00 FYI

kabeor commented 2 months ago

It looks nice, but plz solve the conflict file:)

Rot127 commented 2 months ago

@kabor Resolved the conflict