RISCV basic support - Githubissues

Antwy commented 2 months ago

Hi! I added basic support for the RISC-V instruction set. It covers most of the IMC standard ISA extensions for RV32 and RV64. It would be great if you could review and merge it.

Some details:

The provided architectures (ARCH_RVxx) include compressed instructions by default.
Since RV32 is a subset of RV64 (except for 1 instruction), both live in a single namespace, riscv, as with x86; the difference is the register size.
RISC-V support in Capstone needs refactoring, so it can fail to process some instructions (issue #2278).
Since Capstone doesn't have an enumeration of pseudo-instructions yet (like the alias id for ARM), they are detected while building the semantics of the corresponding instruction.
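
Because compressed instructions are enabled by default, code that feeds raw bytes to the engine cannot assume that every instruction is 4 bytes long. Below is a minimal sketch of the length rule from the base encoding, assuming only the standard 16-bit and 32-bit formats. The helper name `insn_length` is hypothetical and is not part of Triton or Capstone.

```python
def insn_length(code: bytes) -> int:
    """Return the length in bytes of the RISC-V instruction at the start of `code`.

    Only the standard 16-bit (C extension) and 32-bit encodings are handled.
    """
    if len(code) < 2:
        raise ValueError("need at least 2 bytes")
    # Instructions are stored little-endian. The two lowest bits of the first
    # byte are 0b11 for a 32-bit encoding and anything else for a compressed one.
    return 4 if (code[0] & 0b11) == 0b11 else 2


print(insn_length(b"\x33\x9c\x1b\x01"))  # sll s8, s7, a7 (32-bit encoding)
print(insn_length(b"\x01\x00"))          # c.nop (16-bit encoding)
```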

JonathanSalwan commented 2 months ago

Awesome. Thanks for such an MR. Give me a few weeks to review this. Can you try to fix the CIs?

JonathanSalwan commented 2 months ago

@cnheitman can you take a look at this too, so that we have at least two reviews for such an MR?

cnheitman commented 2 months ago

Awesome! Great PR!

@JonathanSalwan Yes, I'll review it, most probably sometime next week.

Antwy commented 2 months ago

Well, I guess one way to fix the CI is to update the Capstone version from 4.0.2 to 5.0+. But if you need the older version, I can try to put the RISC-V code under defines.

cnheitman commented 2 months ago

The PR looks good, great work @Antwy o/

This PR was based on master (I think) and does not include the commit from dev-v1.0 that upgrades Bitwuzla from v0.2.0 to v0.4.0. However, there should be no conflicts when rebasing on top of that change.

I did not dive much into the semantics file. At a quick glance they look good. If we want to increase our confidence in the code, we can add some more tests (for instance, we have a binary with different optimization levels which we use to test the ARM semantics). However, basic unit tests were included, so it seems fine for now.

Ideally, I would add a RV32 and RV64 version of this custom crackme so we can have an example of a full working binary, as we do with AARCH64 and ARM.

Regarding the CI and Capstone, I think we can move on to version 5.0.1. Version 4.0.2 is from 2020 (iirc) and as far as I can tell we don't have any specific reason to keep supporting it.

Antwy commented 1 month ago

Now rebased on dev-v1.0, with some semantics issues fixed (but I'm still not sure about the taint-spreading variants).
Adding a binary test seems reasonable, as the basic unit test suite doesn't cover control-flow transfers.

Antwy commented 1 month ago

Added the crackme binary test

mrexodia commented 4 weeks ago

Might be worthwhile to take a look at the official RISC-V ISA tests: https://github.com/riscv-software-src/riscv-tests

I wrote a small python script to process the compiled files and generate a data.h that contains enough information to start emulation: https://github.com/thesecretclub/riscy-business/blob/master/riscvm/generate-isa-tests.py. You just have to modify this chunk of code to set up the triton context and start emulation:

https://github.com/thesecretclub/riscy-business/blob/master/riscvm/tests.cpp#L44-L55

The reason for the generate-isa-tests.py is that the test executables have some common setup, which requires support for supervisor instructions/paging so I extract just the raw RV64I code and run that directly (since the setup can be done on the emulator host side).

The process should be similar for the 32-bit ISA tests, but I didn't compile or try them. It helped me a lot to find bugs in my emulator, so it was definitely worth the effort in my opinion! You also do not need unicorn anymore (because given their track record their semantics are probably incorrect), or at least it can be an additional test.

m4drat commented 3 weeks ago

Hi! I've played around with this PR and seem to have found a bug. According to the RISC-V ISA manual, the SLL instruction in RV64I should perform a logical left shift by the amount held in the lower 6 bits of the third operand (this also applies to other similar instructions).

In RV64I, only the low 6 bits of rs2 are considered for the shift amount.

However, in the current implementation only the lower 5 bits are used.

This example demonstrates the problem:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x0

def emu_unicorn():
    # sll s8, s7, a7
    opcode = b"\x33\x9c\x1b\x01"

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_A7, 0x69C99AB9B9401024)
    mu.reg_write(UC_RISCV_REG_S7, 0xDB4D6868655C3585)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print("(UC) s8 = 0x%x" % mu.reg_read(UC_RISCV_REG_S8))

def emu_triton():
    # sll s8, s7, a7
    opcode = b"\x33\x9c\x1b\x01"

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    inst = Instruction()
    inst.setOpcode(opcode)
    inst.setAddress(CODE_START)

    ctx.setConcreteRegisterValue(ctx.registers.x17, 0x69C99AB9B9401024)
    ctx.setConcreteRegisterValue(ctx.registers.x23, 0xDB4D6868655C3585)

    ctx.processing(inst)

    print("(TT) s8 = 0x%x" % ctx.getConcreteRegisterValue(ctx.registers.x24))

def main():
    emu_unicorn()
    emu_triton()

if __name__ == "__main__":
    main()

Antwy commented 3 weeks ago

Good catch, @m4drat! Thanks!

mrexodia commented 3 weeks ago

Modified my generation script to generate a python file that can be used for the tests:

from elftools.elf.elffile import ELFFile
from elftools.elf.sections import SymbolTableSection
import os
import zlib

def parse_test_elf(file):
    with open(file, "rb") as f:
        elf = ELFFile(f)
        # Enumerate the SymbolTableSection
        for section in elf.iter_sections():
            if isinstance(section, SymbolTableSection):
                for i in range(section.num_symbols()):
                    symbol = section.get_symbol(i)
                    if symbol.name:
                        if symbol.name.startswith("test_"):
                            address = symbol.entry.st_value
                            # Convert address to file offset
                            offset = list(elf.address_offsets(address))[0]
                            return address, offset
    return None, None

def main():
    tests = []
    directory = "isa-tests"
    code = "import zlib\n\n"
    for file in sorted(os.listdir(directory)):
        if file.startswith("rv64") and not file.endswith(".dump"):
            path = os.path.join(directory, file)
            address, offset = parse_test_elf(path)
            if offset is None:
                print(f"Failed to parse {file}")
                continue
            data = f"__{file.replace('-', '_').upper()}_DATA = zlib.decompress(b\""
            with open(path, "rb") as f:
                for byte in zlib.compress(f.read(), 9):
                    data += f"\\x{byte:02x}"
            data += "\")\n"
            code += data
            tests.append((file, address, offset))

    code += "\n"

    code += "TESTS = [\n"
    for name, address, offset in tests:
        variable = f"__{name.replace('-', '_').upper()}_DATA"
        code += f"    (\"{name}\", {variable}, {hex(address)}, {hex(offset)}),\n"
    code += "]\n"

    with open("isa-tests/data.py", "wb") as f:
        f.write(code.encode("utf-8"))

if __name__ == "__main__":
    main()

The generated data.py: data.py.zip (384 kb)

m4drat commented 3 weeks ago

@Antwy

Hm, I might be wrong on this one, but I couldn't see in the code whether you handle the RV32 and RV64 cases differently. The width of the shift amount should depend on the target: 6 bits for RV64, 5 bits for RV32.

To be precise, here are the quotes:

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in register rs2. In RV64I, only the low 6 bits of rs2 are considered for the shift amount.

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2.
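
The two quotes above boil down to masking rs2 with XLEN - 1 before shifting. Here is a minimal sketch of that rule in plain Python, reusing the register values from the earlier SLL example. The function `sll` is a hypothetical helper for illustration, not the PR's implementation.

```python
def sll(rs1: int, rs2: int, xlen: int) -> int:
    """Logical left shift as described for RV32I (xlen=32) and RV64I (xlen=64)."""
    if xlen not in (32, 64):
        raise ValueError("xlen must be 32 or 64")
    shamt_mask = xlen - 1          # 0x1f keeps 5 bits, 0x3f keeps 6 bits
    reg_mask = (1 << xlen) - 1     # results are truncated to the register width
    return ((rs1 & reg_mask) << (rs2 & shamt_mask)) & reg_mask


s7 = 0xDB4D6868655C3585
a7 = 0x69C99AB9B9401024  # low 6 bits are 36, low 5 bits are 4

# Correct RV64 behaviour: shift by 36.
print(hex(sll(s7, a7, 64)))
# What a 5-bit mask would produce on RV64: shift by 4.
print(hex((s7 << (a7 & 0x1F)) & (2**64 - 1)))
```

The two printed values differ, which is the mismatch the Unicorn comparison exposes.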

mrexodia commented 3 weeks ago

Spent some time rigging up the official ISA tests on top of this PR:

import sys
from struct import pack

from triton import *
from riscv64_data import TESTS as RV64TESTS

def emulate_test(name: str, binary: bytes, address: int, offset: int, trace: bool):
    # initial state
    STACK = 0x200000
    istate = {
        "stack": bytearray(b"".join([pack('B', 255 - i) for i in range(256)])),
        "heap":  bytearray(b"".join([pack('B', i) for i in range(256)])),
        "x0":    0x0,
        "x1":    0x0,
        "x2":    STACK,
        "x3":    0x0,
        "x4":    0x0,
        "x5":    0x0,
        "x6":    0x0,
        "x7":    0x0,
        "x8":    0x0,
        "x9":    0x0,
        "x10":   0x0,
        "x11":   0x0,
        "x12":   0x0,
        "x13":   0x0,
        "x14":   0x0,
        "x15":   0x0,
        "x16":   0x0,
        "x17":   0x0,
        "x18":   0x0,
        "x19":   0x0,
        "x20":   0x0,
        "x21":   0x0,
        "x22":   0x0,
        "x23":   0x0,
        "x24":   0x0,
        "x25":   0x0,
        "x26":   0x0,
        "x27":   0x0,
        "x28":   0x0,
        "x29":   0x0,
        "x30":   0x0,
        "x31":   0x0,
        "f0":    0x00112233445566778899aabbccddeeff,
        "f1":    0xffeeddccbbaa99887766554433221100,
        "f2":    0xfefedcdc5656787889892692dfeccaa0,
        "f3":    0x1234567890987654321bcdffccddee01,
        "f4":    0x0,
        "f5":    0x0,
        "f6":    0x0,
        "f7":    0x0,
        "f8":    0x0,
        "f9":    0x0,
        "f10":   0x0,
        "f11":   0x0,
        "f12":   0x0,
        "f13":   0x0,
        "f14":   0x0,
        "f15":   0x0,
        "f16":   0x0,
        "f17":   0x0,
        "f18":   0x0,
        "f19":   0x0,
        "f20":   0x0,
        "f21":   0x0,
        "f22":   0x0,
        "f23":   0x0,
        "f24":   0x0,
        "f25":   0x0,
        "f26":   0x0,
        "f27":   0x0,
        "f28":   0x0,
        "f29":   0x0,
        "f30":   0x0,
        "f31":   0x0,
        "pc":    address,
    }

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    ctx.setConcreteMemoryAreaValue(STACK,           bytes(istate['stack']))
    ctx.setConcreteMemoryAreaValue(address, binary[offset:])
    ctx.setConcreteRegisterValue(ctx.registers.x0,  0)
    ctx.setConcreteRegisterValue(ctx.registers.x1,  istate['x1'])
    ctx.setConcreteRegisterValue(ctx.registers.x2,  istate['x2'])
    ctx.setConcreteRegisterValue(ctx.registers.x3,  istate['x3'])
    ctx.setConcreteRegisterValue(ctx.registers.x4,  istate['x4'])
    ctx.setConcreteRegisterValue(ctx.registers.x5,  istate['x5'])
    ctx.setConcreteRegisterValue(ctx.registers.x6,  istate['x6'])
    ctx.setConcreteRegisterValue(ctx.registers.x7,  istate['x7'])
    ctx.setConcreteRegisterValue(ctx.registers.x8,  istate['x8'])
    ctx.setConcreteRegisterValue(ctx.registers.x9,  istate['x9'])
    ctx.setConcreteRegisterValue(ctx.registers.x10, istate['x10'])
    ctx.setConcreteRegisterValue(ctx.registers.x11, istate['x11'])
    ctx.setConcreteRegisterValue(ctx.registers.x12, istate['x12'])
    ctx.setConcreteRegisterValue(ctx.registers.x13, istate['x13'])
    ctx.setConcreteRegisterValue(ctx.registers.x14, istate['x14'])
    ctx.setConcreteRegisterValue(ctx.registers.x15, istate['x15'])
    ctx.setConcreteRegisterValue(ctx.registers.x16, istate['x16'])
    ctx.setConcreteRegisterValue(ctx.registers.x17, istate['x17'])
    ctx.setConcreteRegisterValue(ctx.registers.x18, istate['x18'])
    ctx.setConcreteRegisterValue(ctx.registers.x19, istate['x19'])
    ctx.setConcreteRegisterValue(ctx.registers.x20, istate['x20'])
    ctx.setConcreteRegisterValue(ctx.registers.x21, istate['x21'])
    ctx.setConcreteRegisterValue(ctx.registers.x22, istate['x22'])
    ctx.setConcreteRegisterValue(ctx.registers.x23, istate['x23'])
    ctx.setConcreteRegisterValue(ctx.registers.x24, istate['x24'])
    ctx.setConcreteRegisterValue(ctx.registers.x25, istate['x25'])
    ctx.setConcreteRegisterValue(ctx.registers.x26, istate['x26'])
    ctx.setConcreteRegisterValue(ctx.registers.x27, istate['x27'])
    ctx.setConcreteRegisterValue(ctx.registers.x28, istate['x28'])
    ctx.setConcreteRegisterValue(ctx.registers.x29, istate['x29'])
    ctx.setConcreteRegisterValue(ctx.registers.x30, istate['x30'])
    ctx.setConcreteRegisterValue(ctx.registers.x31, istate['x31'])
    ctx.setConcreteRegisterValue(ctx.registers.f0,  istate['f0'])
    ctx.setConcreteRegisterValue(ctx.registers.f1,  istate['f1'])
    ctx.setConcreteRegisterValue(ctx.registers.f2,  istate['f2'])
    ctx.setConcreteRegisterValue(ctx.registers.f3,  istate['f3'])
    ctx.setConcreteRegisterValue(ctx.registers.f4,  istate['f4'])
    ctx.setConcreteRegisterValue(ctx.registers.f5,  istate['f5'])
    ctx.setConcreteRegisterValue(ctx.registers.f6,  istate['f6'])
    ctx.setConcreteRegisterValue(ctx.registers.f7,  istate['f7'])
    ctx.setConcreteRegisterValue(ctx.registers.f8,  istate['f8'])
    ctx.setConcreteRegisterValue(ctx.registers.f9,  istate['f9'])
    ctx.setConcreteRegisterValue(ctx.registers.f10, istate['f10'])
    ctx.setConcreteRegisterValue(ctx.registers.f11, istate['f11'])
    ctx.setConcreteRegisterValue(ctx.registers.f12, istate['f12'])
    ctx.setConcreteRegisterValue(ctx.registers.f13, istate['f13'])
    ctx.setConcreteRegisterValue(ctx.registers.f14, istate['f14'])
    ctx.setConcreteRegisterValue(ctx.registers.f15, istate['f15'])
    ctx.setConcreteRegisterValue(ctx.registers.f16, istate['f16'])
    ctx.setConcreteRegisterValue(ctx.registers.f17, istate['f17'])
    ctx.setConcreteRegisterValue(ctx.registers.f18, istate['f18'])
    ctx.setConcreteRegisterValue(ctx.registers.f19, istate['f19'])
    ctx.setConcreteRegisterValue(ctx.registers.f20, istate['f20'])
    ctx.setConcreteRegisterValue(ctx.registers.f21, istate['f21'])
    ctx.setConcreteRegisterValue(ctx.registers.f22, istate['f22'])
    ctx.setConcreteRegisterValue(ctx.registers.f23, istate['f23'])
    ctx.setConcreteRegisterValue(ctx.registers.f24, istate['f24'])
    ctx.setConcreteRegisterValue(ctx.registers.f25, istate['f25'])
    ctx.setConcreteRegisterValue(ctx.registers.f26, istate['f26'])
    ctx.setConcreteRegisterValue(ctx.registers.f27, istate['f27'])
    ctx.setConcreteRegisterValue(ctx.registers.f28, istate['f28'])
    ctx.setConcreteRegisterValue(ctx.registers.f29, istate['f29'])
    ctx.setConcreteRegisterValue(ctx.registers.f30, istate['f30'])
    ctx.setConcreteRegisterValue(ctx.registers.f31, istate['f31'])

    pc = istate['pc']
    for i in range(1000):
        ctx.setConcreteRegisterValue(ctx.registers.pc, pc)
        opcode = ctx.getConcreteMemoryValue(MemoryAccess(pc, CPUSIZE.DWORD))
        opcode_bytes = pack('<I', opcode)
        inst = Instruction(opcode_bytes)
        inst.setAddress(pc)
        state = ctx.processing(inst)
        if trace:
            print(inst)
        if state == EXCEPTION.NO_FAULT:
            pc = ctx.getConcreteRegisterValue(ctx.registers.pc)
        else:
            disasm = inst.getDisassembly()
            if "fence" in disasm:
                # HACK: ignore the unsupported fence instruction
                pc += 4
            elif "ecall" in disasm:
                syscall_index = ctx.getConcreteRegisterValue(ctx.registers.x17)
                #assert syscall_index == 139, f"invalid syscall: {syscall_index}"
                return ctx.getConcreteRegisterValue(ctx.registers.x10)
            else:
                raise Exception(f"{inst} -> exception {state}")
    return -1

if __name__ == "__main__":
    success = 0
    for name, binary, address, offset in RV64TESTS:
        exit_code = emulate_test(name, binary, address, offset, trace=False)
        if exit_code == 0:
            print(f"SUCCESS: {name}")
            success += 1
        else:
            print(f"FAILURE: {name}, {exit_code}")
    print(f"\n{success}/{len(RV64TESTS)} passed")

Unfortunately the success rate isn't the best, here is the output:

FAILURE: rv64ui-p-add, 46
FAILURE: rv64ui-p-addi, 83
FAILURE: rv64ui-p-addiw, 83
FAILURE: rv64ui-p-addw, 46
SUCCESS: rv64ui-p-and
FAILURE: rv64ui-p-andi, 15
SUCCESS: rv64ui-p-auipc
SUCCESS: rv64ui-p-beq
SUCCESS: rv64ui-p-bge
SUCCESS: rv64ui-p-bgeu
SUCCESS: rv64ui-p-blt
SUCCESS: rv64ui-p-bltu
SUCCESS: rv64ui-p-bne
SUCCESS: rv64ui-p-fence_i
SUCCESS: rv64ui-p-jal
FAILURE: rv64ui-p-jalr, 15
SUCCESS: rv64ui-p-lb
SUCCESS: rv64ui-p-lbu
SUCCESS: rv64ui-p-ld
SUCCESS: rv64ui-p-lh
SUCCESS: rv64ui-p-lhu
FAILURE: rv64ui-p-lui, 18446744071562067968
SUCCESS: rv64ui-p-lw
SUCCESS: rv64ui-p-lwu
FAILURE: rv64ui-p-ma_data, -1
FAILURE: rv64ui-p-or, 858993459
FAILURE: rv64ui-p-ori, 16713727
SUCCESS: rv64ui-p-sb
SUCCESS: rv64ui-p-sd
SUCCESS: rv64ui-p-sh
SUCCESS: rv64ui-p-simple
FAILURE: rv64ui-p-sll, 1024
FAILURE: rv64ui-p-slli, 34603008
FAILURE: rv64ui-p-slliw, 13
FAILURE: rv64ui-p-sllw, 13
FAILURE: rv64ui-p-slt, 1
SUCCESS: rv64ui-p-slti
FAILURE: rv64ui-p-sltiu, 1
FAILURE: rv64ui-p-sltu, 1
FAILURE: rv64ui-p-sra, 1024
SUCCESS: rv64ui-p-srai
SUCCESS: rv64ui-p-sraiw
FAILURE: rv64ui-p-sraw, 1024
FAILURE: rv64ui-p-srl, 1024
SUCCESS: rv64ui-p-srli
FAILURE: rv64ui-p-srliw, 7
FAILURE: rv64ui-p-srlw, 7
FAILURE: rv64ui-p-sub, 18446744073709551602
FAILURE: rv64ui-p-subw, 18446744073709551602
SUCCESS: rv64ui-p-sw
FAILURE: rv64ui-p-xor, 858993459
FAILURE: rv64ui-p-xori, 16713712
FAILURE: rv64um-p-div, 17
FAILURE: rv64um-p-divu, 17
FAILURE: rv64um-p-divuw, 17
FAILURE: rv64um-p-divw, 17
FAILURE: rv64um-p-mul, 1122
FAILURE: rv64um-p-mulh, 1122
FAILURE: rv64um-p-mulhsu, 1122
FAILURE: rv64um-p-mulhu, 1122
FAILURE: rv64um-p-mulw, 1122
FAILURE: rv64um-p-rem, 17
FAILURE: rv64um-p-remu, 17
FAILURE: rv64um-p-remuw, 17
FAILURE: rv64um-p-remw, 17

26/65 passed

I might have set something up incorrectly, though. The register names do not match the disassembly, so it's difficult to debug/trace without creating an additional mapping, etc.

Antwy commented 3 weeks ago

@mrexodia Well, after adding debug printing and the lines from rv64ui-lui (which has status 'FAILURE') to the current test suite, I got:

Instruction:  0x10011a: lui ra, 0
x0:   0x0
x1:   0x0
----------------
[OK] lui   x1, #0x00000
-------------------------------
Instruction:  0x10011e: lui ra, 0xfffff
x0:   0x0
x1:   0xfffffffffffff000
----------------
[OK] lui   x1, #0xfffff
-------------------------------
Instruction:  0x100122: srai ra, ra, 1
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] srai  x1, x1, #1
-------------------------------
Instruction:  0x100126: lui ra, 0x7ffff
x0:   0x0
x1:   0x7ffff000
----------------
[OK] lui   x1, #0x7ffff
-------------------------------
Instruction:  0x10012a: srai ra, ra, 0x14
x0:   0x0
x1:   0x7ff
----------------
[OK] srai  x1, x1, #20
-------------------------------
Instruction:  0x10012e: lui ra, 0x80000
x0:   0x0
x1:   0xffffffff80000000
----------------
[OK] lui   x1, #0x80000
-------------------------------
Instruction:  0x100132: srai ra, ra, 0x14
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] srai  x1, x1, #20
-------------------------------
Instruction:  0x100136: lui zero, 0
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] lui   x0, #0x80000

These seem equal to the values the test expects. Maybe the result above is caused by parsing issues when a test case contains more than one instruction, or an instruction with an immediate operand written without "i".

The test lines in src/testers/riscv/unicorn_test_riscv64.py:

    (b"\xb7\x00\x00\x00", "lui   x1, #0x00000"),
    (b"\xb7\xf0\xff\xff", "lui   x1, #0xfffff"),
    (b"\x93\xd0\x10\x40", "srai  x1, x1, #1"),
    (b"\xb7\xf0\xff\x7f", "lui   x1, #0x7ffff"),
    (b"\x93\xd0\x40\x41", "srai  x1, x1, #20"),
    (b"\xb7\x00\x00\x80", "lui   x1, #0x80000"),
    (b"\x93\xd0\x40\x41", "srai  x1, x1, #20"),
    (b"\x37\x00\x00\x00", "lui   x0, #0x80000"),

and the debug printing right after ctx.processing(inst):

    print("Instruction: ", inst)
    print("x0:  ", hex(ctx.getSymbolicRegisterValue(ctx.registers.x0)))
    print("x1:  ", hex(ctx.getSymbolicRegisterValue(ctx.registers.x1)))

m4drat commented 3 weeks ago

@Antwy

Stumbled upon another corner-case for the REMW instruction:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x0

def emu_unicorn():
    # remw s0, s5, t0
    opcode = b"\x3b\xe4\x5a\x02"

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_S5, 0x917665C427EBEE5D)
    mu.reg_write(UC_RISCV_REG_T0, 0x0000000000000000)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print("(UC) s0 = 0x%x" % mu.reg_read(UC_RISCV_REG_S0))

def emu_triton():
    # remw s0, s5, t0
    opcode = b"\x3b\xe4\x5a\x02"

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    inst = Instruction()
    inst.setOpcode(opcode)
    inst.setAddress(CODE_START)

    ctx.setConcreteRegisterValue(ctx.registers.x21, 0x917665C427EBEE5D)
    ctx.setConcreteRegisterValue(ctx.registers.x5, 0x0000000000000000)

    ctx.processing(inst)

    print("(TT) s0 = 0x%x" % ctx.getConcreteRegisterValue(ctx.registers.x8))

def main():
    emu_unicorn()
    emu_triton()

if __name__ == "__main__":
    main()

According to the ISA manual, in case of division by 0 the result of the operation should be equal to the lowest 32 bits of the dividend, not 0.

The semantics for division by zero and division overflow are summarized in Table 11. The quotient of division by zero has all bits set, and the remainder of division by zero equals the dividend. Signed division overflow occurs only when the most-negative integer is divided by -1. The quotient of a signed division with overflow is equal to the dividend, and the remainder is zero. Unsigned division overflow cannot occur.
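
The quoted rules can be written down as a small reference model for the 32-bit "W" variants on RV64. This is only a sketch in plain Python: the helpers `sext32`, `to_signed32`, `remw`, and `divw` are hypothetical names, and the sign extension of the 32-bit result to 64 bits is assumed from the usual behaviour of the W instructions.

```python
def sext32(value: int) -> int:
    """Sign-extend the low 32 bits of `value` to an unsigned 64-bit register value."""
    value &= 0xFFFFFFFF
    return (value | 0xFFFFFFFF00000000) if value & 0x80000000 else value


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def remw(rs1: int, rs2: int) -> int:
    a, b = to_signed32(rs1), to_signed32(rs2)
    if b == 0:
        return sext32(a)          # remainder of division by zero is the dividend
    if a == -(1 << 31) and b == -1:
        return 0                  # signed overflow: remainder is zero
    r = abs(a) % abs(b)           # the remainder takes the sign of the dividend
    return sext32(-r if a < 0 else r)


def divw(rs1: int, rs2: int) -> int:
    a, b = to_signed32(rs1), to_signed32(rs2)
    if b == 0:
        return 0xFFFFFFFFFFFFFFFF  # quotient of division by zero has all bits set
    if a == -(1 << 31) and b == -1:
        return sext32(a)          # signed overflow: quotient is the dividend
    q = abs(a) // abs(b)          # division rounds toward zero
    return sext32(-q if (a < 0) != (b < 0) else q)


# The REMW case from the example above: s5 = 0x917665C427EBEE5D, t0 = 0.
print(hex(remw(0x917665C427EBEE5D, 0)))
```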

mrexodia commented 3 weeks ago

For my emulator I ran into bugs with the immediate loading. So the operation was correct, but certain encodings related to (shifted) immediates were not (especially the sign extension is very complicated). That might also be the case here…
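
As an illustration of the immediate handling mentioned here, the following sketch decodes the LUI and I-type immediates by hand and sign-extends them to 64 bits. It is plain Python with hypothetical helper names (`sext`, `decode_lui`, `i_type_imm`) and is not taken from either emulator. The byte strings are the LUI and SRAI encodings from the test lines earlier in the thread.

```python
import struct


def sext(value: int, bits: int, xlen: int = 64) -> int:
    """Sign-extend a `bits`-wide value to an unsigned `xlen`-bit register value."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & ((1 << xlen) - 1)


def decode_lui(opcode: bytes):
    """Return (rd, value written to rd) for a 32-bit LUI encoding on RV64."""
    (word,) = struct.unpack("<I", opcode)
    if word & 0x7F != 0x37:
        raise ValueError("not a LUI instruction")
    rd = (word >> 7) & 0x1F
    # imm[31:12] already sits in the upper 20 bits; the 32-bit result is
    # then sign-extended to the full register width.
    return rd, sext(word & 0xFFFFF000, 32)


def i_type_imm(opcode: bytes) -> int:
    """Return the sign-extended 12-bit immediate of an I-type encoding."""
    (word,) = struct.unpack("<I", opcode)
    return sext(word >> 20, 12)


print([hex(v) for v in decode_lui(b"\xb7\x00\x00\x80")])  # lui x1, 0x80000
print([hex(v) for v in decode_lui(b"\xb7\xf0\xff\x7f")])  # lui x1, 0x7ffff
print(i_type_imm(b"\x93\xd0\x40\x41") & 0x3F)             # shamt of srai x1, x1, 20
```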

Here are the traces from my emulator, might be helpful: rv64ui-traces.zip

Antwy commented 2 weeks ago

@m4drat Thanks! I guess this one is fixed too. Please let me know if you find anything else!

JonathanSalwan commented 2 weeks ago

Can you fix vcpkg by adding the risc feature and updating Capstone version for Appveyor? Once all CIs are green, I will do a quick review and merge it to dev-v1.0 :)

JonathanSalwan commented 2 weeks ago

https://vcpkg.link/ports/capstone/v/5.0.1/1

I think we also have to update capstone in vcpkg to switch from 5.0.0-rc2 to 5.0.1

Antwy commented 2 weeks ago

For now, I think the CI failure is connected with this Capstone issue

m4drat commented 1 week ago

@Antwy

Found a problem with SLLIW instruction:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x0

def emu_unicorn():
    # slliw t0, s4, 0xc
    opcode = b"\x9b\x12\xca\x00"

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_S4, 0x10ab95)
    mu.reg_write(UC_RISCV_REG_T0, 0x000000)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print("(UC) t0 = 0x%x" % mu.reg_read(UC_RISCV_REG_T0))

def emu_triton():
    # slliw t0, s4, 0xc
    opcode = b"\x9b\x12\xca\x00"

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    inst = Instruction()
    inst.setOpcode(opcode)
    inst.setAddress(CODE_START)

    ctx.setConcreteRegisterValue(ctx.registers.x20, 0x10ab95)
    ctx.setConcreteRegisterValue(ctx.registers.x5, 0x000000)

    ctx.processing(inst)

    print("(TT) t0 = 0x%x" % ctx.getConcreteRegisterValue(ctx.registers.x5))

def main():
    emu_unicorn()
    emu_triton()

if __name__ == "__main__":
    main()
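
For reference, the expected SLLIW result can be modelled in a few lines: shift, keep the low 32 bits, then sign-extend to 64 bits. This is a sketch of the ISA semantics in plain Python, using a hypothetical helper `slliw`, and is not the code from the PR.

```python
def slliw(rs1: int, shamt: int) -> int:
    """RV64 SLLIW: 32-bit logical left shift, result sign-extended to 64 bits."""
    if not 0 <= shamt < 32:
        raise ValueError("SLLIW only accepts a 5-bit shift amount")
    result = (rs1 << shamt) & 0xFFFFFFFF   # operate on the low 32 bits only
    if result & 0x80000000:                # replicate bit 31 into bits 63..32
        result |= 0xFFFFFFFF00000000
    return result


# slliw t0, s4, 0xc with s4 = 0x10ab95, as in the example above.
print(hex(slliw(0x10AB95, 0xC)))        # 0xab95000
# A plain 64-bit shift would keep the bit that falls outside the low 32 bits.
print(hex((0x10AB95 << 0xC) & (2**64 - 1)))
```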

Antwy commented 1 week ago

Fixed. Thanks again, @m4drat :)

mrexodia commented 1 week ago

I re-ran the ISA tests and things are looking better!

FAILURE: rv64ui-p-add, 46
FAILURE: rv64ui-p-addi, 83
FAILURE: rv64ui-p-addiw, 83
FAILURE: rv64ui-p-addw, 46
SUCCESS: rv64ui-p-and
FAILURE: rv64ui-p-andi, 15
SUCCESS: rv64ui-p-auipc
SUCCESS: rv64ui-p-beq
SUCCESS: rv64ui-p-bge
SUCCESS: rv64ui-p-bgeu
SUCCESS: rv64ui-p-blt
SUCCESS: rv64ui-p-bltu
SUCCESS: rv64ui-p-bne
SUCCESS: rv64ui-p-fence_i
SUCCESS: rv64ui-p-jal
FAILURE: rv64ui-p-jalr, 15
SUCCESS: rv64ui-p-lb
SUCCESS: rv64ui-p-lbu
SUCCESS: rv64ui-p-ld
SUCCESS: rv64ui-p-lh
SUCCESS: rv64ui-p-lhu
FAILURE: rv64ui-p-lui, 18446744071562067968
SUCCESS: rv64ui-p-lw
SUCCESS: rv64ui-p-lwu
FAILURE: rv64ui-p-ma_data, -1
FAILURE: rv64ui-p-or, 858993459
FAILURE: rv64ui-p-ori, 16713727
SUCCESS: rv64ui-p-sb
SUCCESS: rv64ui-p-sd
SUCCESS: rv64ui-p-sh
SUCCESS: rv64ui-p-simple
FAILURE: rv64ui-p-sll, 1024
FAILURE: rv64ui-p-slli, 34603008
FAILURE: rv64ui-p-slliw, 18446744073441116160
FAILURE: rv64ui-p-sllw, 1024
FAILURE: rv64ui-p-slt, 1
SUCCESS: rv64ui-p-slti
FAILURE: rv64ui-p-sltiu, 1
FAILURE: rv64ui-p-sltu, 1
FAILURE: rv64ui-p-sra, 1024
SUCCESS: rv64ui-p-srai
SUCCESS: rv64ui-p-sraiw
FAILURE: rv64ui-p-sraw, 1024
FAILURE: rv64ui-p-srl, 1024
SUCCESS: rv64ui-p-srli
SUCCESS: rv64ui-p-srliw
FAILURE: rv64ui-p-srlw, 1024
FAILURE: rv64ui-p-sub, 18446744073709551602
FAILURE: rv64ui-p-subw, 18446744073709551602
SUCCESS: rv64ui-p-sw
FAILURE: rv64ui-p-xor, 858993459
FAILURE: rv64ui-p-xori, 16713712
SUCCESS: rv64um-p-div
SUCCESS: rv64um-p-divu
SUCCESS: rv64um-p-divuw
SUCCESS: rv64um-p-divw
FAILURE: rv64um-p-mul, 1122
FAILURE: rv64um-p-mulh, 1122
FAILURE: rv64um-p-mulhsu, 1122
FAILURE: rv64um-p-mulhu, 1122
FAILURE: rv64um-p-mulw, 1122
SUCCESS: rv64um-p-rem
SUCCESS: rv64um-p-remu
SUCCESS: rv64um-p-remuw
SUCCESS: rv64um-p-remw

35/65 passed

mrexodia commented 1 week ago

The problem is that you allow assignments to x0/zero:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x0

def emu_unicorn():
    # lui zero, 0x80000
    # mv a0, zero
    opcode = bytes.fromhex("37 00 00 80 13 05 00 00")

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_A0, 0xffffffff80000000)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print(f"(UC) a0 = {hex(mu.reg_read(UC_RISCV_REG_A0))}")

def emu_triton():
    # lui zero, 0x80000
    # mv a0, zero
    opcode = bytes.fromhex("37 00 00 80 13 05 00 00")

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

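    ctx.setConcreteRegisterValue(ctx.registers.x10, 0xffffffff80000000)

    # An Instruction holds a single instruction, so process the two
    # 4-byte encodings one after the other.
    for offset in range(0, len(opcode), 4):
        inst = Instruction()
        inst.setOpcode(opcode[offset:offset + 4])
        inst.setAddress(CODE_START + offset)
        ctx.processing(inst)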

    print(f"(TT) a0 = {hex(ctx.getConcreteRegisterValue(ctx.registers.x10))}")

def main():
    emu_unicorn()
    emu_triton()

if __name__ == "__main__":
    main()
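
The x0 rule this example exercises is simple to state: reads of x0 always return zero and writes to it are discarded. Here is a minimal sketch of a register file with that behaviour, in plain Python with a hypothetical `RegisterFile` class. It is not how Triton stores registers.

```python
class RegisterFile:
    """32 integer registers where x0 always reads as zero."""

    def __init__(self, xlen: int = 64):
        self._mask = (1 << xlen) - 1
        self._regs = [0] * 32

    def read(self, index: int) -> int:
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index == 0:
            return  # writes to x0/zero are silently dropped
        self._regs[index] = value & self._mask


regs = RegisterFile()
regs.write(10, 0xFFFFFFFF80000000)  # initial a0
regs.write(0, 0xFFFFFFFF80000000)   # lui zero, 0x80000 (must have no effect)
regs.write(10, regs.read(0))        # mv a0, zero
print(hex(regs.read(10)))           # 0x0
```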

mrexodia commented 1 week ago

Here is also the updated isa-test.py that prints all the registers and showed me the problem:

import sys
from struct import pack

from triton import *
from riscv64_data import TESTS as RV64TESTS

# https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#register-convention
REG_TO_ABI = {
    "x0": "zero", # always zero
    "x1": "ra", # return address
    "x2": "sp", # stack pointer
    "x3": "gp", # global pointer
    "x4": "tp", # thread pointer
    "x5": "t0", # temporary registers
    "x6": "t1", # temporary registers
    "x7": "t2", # temporary registers
    "x8": "s0", # callee-saved registers
    "x9": "s1", # callee-saved registers
    "x10": "a0", # argument registers
    "x11": "a1", # argument registers
    "x12": "a2", # argument registers
    "x13": "a3", # argument registers
    "x14": "a4", # argument registers
    "x15": "a5", # argument registers
    "x16": "a6", # argument registers
    "x17": "a7", # argument registers
    "x18": "s2", # callee-saved registers
    "x19": "s3", # callee-saved registers
    "x20": "s4", # callee-saved registers
    "x21": "s5", # callee-saved registers
    "x22": "s6", # callee-saved registers
    "x23": "s7", # callee-saved registers
    "x24": "s8", # callee-saved registers
    "x25": "s9", # callee-saved registers
    "x26": "s10", # callee-saved registers
    "x27": "s11", # callee-saved registers
    "x28": "t3", # temporary registers
    "x29": "t4", # temporary registers
    "x30": "t5", # temporary registers
    "x31": "t6", # temporary registers
}
assert len(REG_TO_ABI) == 32
ABI_TO_REG = {v: k for k, v in REG_TO_ABI.items()}
assert len(ABI_TO_REG) == 32

def emulate_test(name: str, binary: bytes, address: int, offset: int, trace: bool):
    # initial state
    STACK = 0x200000
    istate = {
        "stack": bytearray(b"".join([pack('B', 255 - i) for i in range(256)])),
        "heap":  bytearray(b"".join([pack('B', i) for i in range(256)])),
        "x0":    0x0,
        "x1":    0x0,
        "x2":    STACK,
        "x3":    0x0,
        "x4":    0x0,
        "x5":    0x0,
        "x6":    0x0,
        "x7":    0x0,
        "x8":    0x0,
        "x9":    0x0,
        "x10":   0x0,
        "x11":   0x0,
        "x12":   0x0,
        "x13":   0x0,
        "x14":   0x0,
        "x15":   0x0,
        "x16":   0x0,
        "x17":   0x0,
        "x18":   0x0,
        "x19":   0x0,
        "x20":   0x0,
        "x21":   0x0,
        "x22":   0x0,
        "x23":   0x0,
        "x24":   0x0,
        "x25":   0x0,
        "x26":   0x0,
        "x27":   0x0,
        "x28":   0x0,
        "x29":   0x0,
        "x30":   0x0,
        "x31":   0x0,
        "f0":    0x00112233445566778899aabbccddeeff,
        "f1":    0xffeeddccbbaa99887766554433221100,
        "f2":    0xfefedcdc5656787889892692dfeccaa0,
        "f3":    0x1234567890987654321bcdffccddee01,
        "f4":    0x0,
        "f5":    0x0,
        "f6":    0x0,
        "f7":    0x0,
        "f8":    0x0,
        "f9":    0x0,
        "f10":   0x0,
        "f11":   0x0,
        "f12":   0x0,
        "f13":   0x0,
        "f14":   0x0,
        "f15":   0x0,
        "f16":   0x0,
        "f17":   0x0,
        "f18":   0x0,
        "f19":   0x0,
        "f20":   0x0,
        "f21":   0x0,
        "f22":   0x0,
        "f23":   0x0,
        "f24":   0x0,
        "f25":   0x0,
        "f26":   0x0,
        "f27":   0x0,
        "f28":   0x0,
        "f29":   0x0,
        "f30":   0x0,
        "f31":   0x0,
        "pc":    address,
    }

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    ctx.setConcreteMemoryAreaValue(STACK, bytes(istate['stack']))
    ctx.setConcreteMemoryAreaValue(address, binary[offset:])
    # use a distinct loop variable so the 'name' parameter is not shadowed
    for reg_name, value in istate.items():
        try:
            reg = getattr(ctx.registers, reg_name)
            ctx.setConcreteRegisterValue(reg, value)
        except AttributeError:
            # 'stack' and 'heap' are not registers, skip them
            pass

    pc = istate['pc']
    for i in range(1000):
        ctx.setConcreteRegisterValue(ctx.registers.pc, pc)
        opcode = ctx.getConcreteMemoryValue(MemoryAccess(pc, CPUSIZE.DWORD))
        opcode_bytes = pack('<I', opcode)
        inst = Instruction(opcode_bytes)
        inst.setAddress(pc)
        state = ctx.processing(inst)
        if trace:
            disasm = inst.getDisassembly()
            tokens = [token.rstrip(",") for token in disasm.split(" ")]
            info = ""
            for op in tokens[1:]:
                if op in ABI_TO_REG:
                    op_abi = ABI_TO_REG[op]
                    reg = getattr(ctx.registers, op_abi)
                    value = ctx.getConcreteRegisterValue(reg)
                    if op == "zero" and value == 0:
                        continue
                    if len(info) > 0:
                        info += ", "
                    info += f"{op}/{op_abi}={hex(value)}"
            print(f"{hex(inst.getAddress())}|{opcode_bytes.hex(' ')}|{disasm} ({info})")
        if state == EXCEPTION.NO_FAULT:
            pc = ctx.getConcreteRegisterValue(ctx.registers.pc)
        else:
            disasm = inst.getDisassembly()
            if "fence" in disasm:
                # HACK: ignore the unsupported fence instruction
                pc += 4
            elif "ecall" in disasm:
                syscall_index = ctx.getConcreteRegisterValue(ctx.registers.x17)
                #assert syscall_index == 139, f"invalid syscall: {syscall_index}"
                return ctx.getConcreteRegisterValue(ctx.registers.x10)
            else:
                raise Exception(f"{inst} -> exception {state}")
    return -1

if __name__ == "__main__":
    success = 0
    for name, binary, address, offset in RV64TESTS:
        exit_code = emulate_test(name, binary, address, offset, trace=True)
        if exit_code == 0:
            print(f"SUCCESS: {name}")
            success += 1
        else:
            print(f"FAILURE: {name}, {hex(exit_code)}")
    print(f"\n{success}/{len(RV64TESTS)} passed")
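For reference, the loop above always fetches four bytes and skips `fence` with `pc += 4`, which is fine because `fence` has no compressed form. If the same kind of skip were ever needed for other instructions, the length could be derived from the encoding instead of being hard-coded. A minimal sketch, assuming only the standard 16-bit and 32-bit encodings matter (the helper names `first_parcel` and `rv_insn_length` are made up for illustration and are not part of Triton):

```python
def first_parcel(code: bytes, offset: int = 0) -> int:
    """Read the first 16-bit parcel of an instruction (little-endian)."""
    return int.from_bytes(code[offset:offset + 2], "little")


def rv_insn_length(parcel: int) -> int:
    """Return the instruction length in bytes from its first 16-bit parcel.

    Only the standard 16-bit (compressed) and 32-bit encodings are handled.
    """
    # In the base encoding the two lowest bits are 0b11 for 32-bit
    # instructions; any other value marks a 16-bit compressed instruction.
    return 4 if (parcel & 0b11) == 0b11 else 2


if __name__ == "__main__":
    for label, raw in [
        ("fence", bytes.fromhex("0f00f00f")),
        ("lui zero, 0x80000", bytes.fromhex("37000080")),
        ("c.slli t6, 0x3c", bytes.fromhex("f21f")),
    ]:
        print(f"{label}: {rv_insn_length(first_parcel(raw))} bytes")
```

With something like this, the `fence` branch could advance `pc` by the computed length rather than a constant.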

Antwy commented 1 week ago

Hi @mrexodia, I guess it's not really me who allows the x0 field in the output state to be modified. In Triton, the x0 register instance is set to be immutable.

I guess you wanted to process a triton::arch::BasicBlock here instead of the single instruction 'inst'. If the a0 register is manually assigned any value other than the "expected" one, that value is what gets printed. You can also print the x0 value:

def emu_triton():
    # lui zero, 0x80000
    # mv a0, zero
    opcode = bytes.fromhex("37 00 00 80 13 05 00 00")

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    inst = Instruction()
    inst.setOpcode(opcode)
    inst.setAddress(CODE_START)

    ctx.setConcreteRegisterValue(ctx.registers.x10, 0xffffffff80000000)

    ctx.processing(inst)

    print(f"(TT) a0 = {hex(ctx.getConcreteRegisterValue(ctx.registers.x10))}")

mrexodia commented 1 week ago

Sorry, I wasn't trying to assign blame to you specifically. Just want to help get the semantics correct 🙂

Thanks for pointing out that I didn't process both instructions. I adjusted the test, but it looks like x0 is getting modified anyway:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x1000

def emu_unicorn():
    # lui zero, 0x80000
    # mv a0, zero
    opcode = bytes.fromhex("37 00 00 80 13 05 00 00")

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print(f"(UC) a0 = {hex(mu.reg_read(UC_RISCV_REG_A0))}")

def emu_triton():
    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    # lui zero, 0x80000
    inst1 = Instruction(bytes.fromhex("37 00 00 80"))
    inst1.setAddress(CODE_START)
    ctx.processing(inst1)
    # mv a0, zero
    inst2 = Instruction(bytes.fromhex("13 05 00 00"))
    inst2.setAddress(CODE_START + 4)
    ctx.processing(inst2)

    # 'ctx' only exists here, so the Triton x0 print belongs in this function
    print(f"(TT) zero = {hex(ctx.getConcreteRegisterValue(ctx.registers.x0))}")
    print(f"(TT) a0 = {hex(ctx.getConcreteRegisterValue(ctx.registers.x10))}")

def main():
    emu_unicorn()
    emu_triton()

if __name__ == "__main__":
    main()

Prints:

(UC) a0 = 0x0
(TT) zero = 0x0
(TT) a0 = 0xffffffff80000000

mrexodia commented 1 week ago

@Antwy the bug was a copy-paste error here:

https://github.com/JonathanSalwan/Triton/pull/1318/files#diff-ce47c5c76c481399837a84aee4f6d7271e1310dd6bd270e2cb1b4fdba28b2453R152

You defined MUTABLE = false for x0, but this was never passed to the triton::arch::Register (probably because it was copied from the x86 definition). After modifying this locally the ISA tests ~~go to 64/65~~ are all successful!
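To illustrate the failure mode, here is a toy model in plain Python. It is not Triton's code: `RegSpec`, `RegisterFile`, `make_regs` and `run` are hypothetical names, and the only assumption taken from the thread is that a per-register mutability flag exists in the definition table but has to be forwarded to the register object to have any effect.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1

# Definition table: (name, mutable). x0 is declared immutable here.
TABLE = [("x0", False), ("x10", True)]


@dataclass(frozen=True)
class RegSpec:
    name: str
    mutable: bool = True  # the default silently wins if the flag is not forwarded


class RegisterFile:
    def __init__(self, specs):
        self.specs = {s.name: s for s in specs}
        self.values = {s.name: 0 for s in specs}

    def write(self, name, value):
        if not self.specs[name].mutable:
            return  # writes to an immutable register are dropped
        self.values[name] = value & MASK64

    def read(self, name):
        return self.values[name]


def make_regs(forward_flag: bool) -> RegisterFile:
    if forward_flag:
        return RegisterFile([RegSpec(n, m) for n, m in TABLE])
    # the copy-paste variant: the flag is in the table but never passed on
    return RegisterFile([RegSpec(n) for n, _m in TABLE])


def run(regs: RegisterFile) -> int:
    regs.write("x0", 0xFFFFFFFF80000000)   # lui zero, 0x80000
    regs.write("x10", regs.read("x0"))     # mv a0, zero
    return regs.read("x10")


if __name__ == "__main__":
    print("flag dropped:  ", hex(run(make_regs(False))))
    print("flag forwarded:", hex(run(make_regs(True))))
```

In this model, dropping the flag leaves a0 at 0xffffffff80000000, while forwarding it leaves a0 at 0x0.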

Antwy commented 1 week ago

Yeah, thanks for this one, but I can still reproduce a non-zero x0 with your last example. This line:

print(f"(TT) x0 = {hex(ctx.getSymbolicRegisterValue(ctx.registers.x0))}")

gives me:

(TT) x0 = 0xffffffff80000000

mrexodia commented 1 week ago

There are two places where the MUTABLE argument was missing. Did you fix both of them? Locally I ran the following:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x1000

def emu_unicorn():
    # lui zero, 0x80000
    # mv a0, zero
    opcode = bytes.fromhex("37 00 00 80 13 05 00 00")

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print(f"(UC) x0 = {hex(mu.reg_read(UC_RISCV_REG_ZERO))}")
    print(f"(UC) a0 = {hex(mu.reg_read(UC_RISCV_REG_A0))}")

def emu_triton():
    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    # lui zero, 0x80000
    inst1 = Instruction(bytes.fromhex("37 00 00 80"))
    inst1.setAddress(CODE_START)
    ctx.processing(inst1)
    # mv a0, zero
    inst2 = Instruction(bytes.fromhex("13 05 00 00"))
    inst2.setAddress(CODE_START + 4)
    ctx.processing(inst2)

    print(f"(TT) x0 = {hex(ctx.getConcreteRegisterValue(ctx.registers.x0))}")
    print(f"(TT) a0 = {hex(ctx.getConcreteRegisterValue(ctx.registers.x10))}")

def main():
    emu_unicorn()
    emu_triton()

if __name__ == "__main__":
    main()

And it outputs the correct results:

(UC) x0 = 0x0
(UC) a0 = 0x0
(TT) x0 = 0x0
(TT) a0 = 0x0

I also had to rerun python setup.py install to recreate triton.so in my venv.

Antwy commented 1 week ago

Fixed. Thanks for your help, @mrexodia

JonathanSalwan commented 1 week ago

Thanks a lot guys for this task force, it's nice to see.

@mrexodia, @Antwy, @m4drat everything is good on your side? Should I merge it or is it still a draft?

Antwy commented 1 week ago

> Thanks a lot guys for this task force, it's nice to see.
>
> @mrexodia, @Antwy, @m4drat everything is good on your side? Should I merge it or is it still a draft?

Nice! If it looks good to you, I think it can be merged.

mrexodia commented 1 week ago

Yeah all the official ISA tests are passing on my side! The only issue is AppVeyor, but I don't think it's related to the PR?

JonathanSalwan commented 1 week ago

> The only issue is AppVeyor, but I don't think it's related to the PR?

It looks like that issue is related to Capstone 5 on Windows. So what we can do is use Capstone 4.0 on the AppVeyor CI and add an #if CS_API_MAJOR >= 5 guard around the RISC-V support on our side.

What do you think?

m4drat commented 1 week ago

@Antwy Found a problem with compressed slli

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x0

def emu_unicorn():
    # slli    t6,t6,0x3c
    opcode = b"\xf2\x1f"

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_T6, 0x2107FF)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print("(UC) t6 = 0x%x" % mu.reg_read(UC_RISCV_REG_T6))

def emu_triton():
    # slli    t6,t6,0x3c
    opcode = b"\xf2\x1f"

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    inst = Instruction()
    inst.setOpcode(opcode)
    inst.setAddress(CODE_START)

    ctx.setConcreteRegisterValue(ctx.registers.x31, 0x2107FF)

    ctx.processing(inst)

    print("(TT) t6 = 0x%x" % ctx.getConcreteRegisterValue(ctx.registers.x31))

def main():
    emu_unicorn()
    emu_triton()

if __name__ == "__main__":
    main()
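As a cross-check that depends on neither emulator, the encoding can be decoded by hand and the shift evaluated in plain Python. This is a minimal sketch following the C.SLLI layout of the compressed extension (quadrant 0b10, funct3 0b000, shamt split across bit 12 and bits 6:2); the helper names `decode_c_slli` and `slli64` are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def decode_c_slli(halfword: int):
    """Decode a C.SLLI encoding into (rd, shamt); return None if it is not C.SLLI."""
    quadrant = halfword & 0b11
    funct3 = (halfword >> 13) & 0b111
    if quadrant != 0b10 or funct3 != 0b000:
        return None
    rd = (halfword >> 7) & 0x1F
    # shamt[5] lives in bit 12, shamt[4:0] in bits 6:2
    shamt = (((halfword >> 12) & 1) << 5) | ((halfword >> 2) & 0x1F)
    return rd, shamt


def slli64(value: int, shamt: int) -> int:
    """Logical left shift truncated to 64 bits, as on RV64."""
    return (value << shamt) & MASK64


if __name__ == "__main__":
    halfword = int.from_bytes(b"\xf2\x1f", "little")
    rd, shamt = decode_c_slli(halfword)
    print(f"rd = x{rd}, shamt = {hex(shamt)}")
    print(f"expected t6 = {hex(slli64(0x2107FF, shamt))}")
```

For the input 0x2107FF, only the low four bits survive a shift by 0x3c, so the reference result is 0xf000000000000000.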

Antwy commented 6 days ago

> @Antwy Found a problem with compressed slli

Fixed.

JonathanSalwan / Triton

RISCV basic support #1318