ARMv8 Neon Intrinsics not supported

soyasoya5 commented 1 year ago

from triton import *

if __name__ == "__main__":
    ctx = TritonContext(ARCH.AARCH64)

    ctx.setMode(MODE.ALIGNED_MEMORY, True)
    ctx.setMode(MODE.CONSTANT_FOLDING, True)

    code = [
        (0x1000, b"\x00\x1c\x29\x2e"),  # EOR V0.8B, V0.8B, V9.8B
        (0x1004, b"\x00\x1c\x21\x2e"),  # EOR V0.8B, V0.8B, V1.8B
        (0x1008, b"\x21\x1c\x20\x6e"),  # EOR V1.16B, V1.16B, V0.16B
        (0x100C, b"\x42\x1c\x29\x2e"),  # EOR V2.8B, V2.8B, V9.8B
        (0x1010, b"\x63\x1c\x29\x2e"),  # EOR V3.8B, V3.8B, V9.8B

    ]

    for pc, op in code:
        inst = Instruction(pc, op)
        ctx.disassembly(inst)
        print(inst.getDisassembly())
        ctx.processing(inst)  # throws TypeError: AArch64Cpu::getConcreteRegisterValue(): Invalid register.
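
The mnemonics in the comments can be cross-checked against the raw bytes without Triton. The sketch below handles only the EOR (vector) form. `decode_eor_vector` is a hypothetical helper, not a Triton API, and it assumes the A64 field layout (Q at bit 30, Rm at bits 20:16, Rn at bits 9:5, Rd at bits 4:0).

```python
def decode_eor_vector(raw: bytes) -> str:
    """Decode a 4-byte little-endian A64 word as EOR (vector).

    Hypothetical helper for illustration only; not part of Triton.
    """
    word = int.from_bytes(raw, "little")
    # Fixed bits of EOR (vector): everything except Q, Rm, Rn and Rd.
    if word & 0xBFE0FC00 != 0x2E201C00:
        raise ValueError("not an EOR (vector) encoding")
    q = (word >> 30) & 1          # 0 -> 64-bit (8B), 1 -> 128-bit (16B)
    rm = (word >> 16) & 0x1F      # second source register
    rn = (word >> 5) & 0x1F       # first source register
    rd = word & 0x1F              # destination register
    arrangement = "16B" if q else "8B"
    return f"EOR V{rd}.{arrangement}, V{rn}.{arrangement}, V{rm}.{arrangement}"


first = decode_eor_vector(b"\x00\x1c\x29\x2e")
third = decode_eor_vector(b"\x21\x1c\x20\x6e")
print(first)
print(third)
```

Running each entry of `code` through such a helper is a quick way to confirm that a hand-written comment matches its encoding.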

JonathanSalwan commented 1 year ago

Thanks for this snippet of code. I will try to support those registers as soon as possible.

JonathanSalwan commented 1 year ago

It will take some time to implement full Neon support. I've pushed the register definitions, but we still need to support other instructions (LD1, LD2, etc.) and operand decoding (e.g. V0.8B, 4S, 2D).
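
To make the operand-decoding part concrete, here is a minimal sketch of parsing an arrangement specifier such as `V0.8B` into a register number, lane count and lane width. `parse_vector_operand` is a hypothetical name, not a Triton function, and the sketch assumes only the full 64-bit and 128-bit arrangements (8B, 16B, 4H, 8H, 2S, 4S, 1D, 2D).

```python
import re

# Lane width in bits for each element-size letter.
LANE_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}


def parse_vector_operand(text: str) -> tuple[int, int, int]:
    """Return (register number, lane count, lane width in bits).

    Hypothetical helper for illustration only; not part of Triton.
    """
    match = re.fullmatch(r"[Vv](\d+)\.(\d+)([BbHhSsDd])", text)
    if match is None:
        raise ValueError(f"not a vector operand: {text!r}")
    reg = int(match.group(1))
    lane_count = int(match.group(2))
    lane_bits = LANE_BITS[match.group(3).upper()]
    # Assumption: accept only arrangements that fill 64 or 128 bits.
    if reg > 31 or lane_count * lane_bits not in (64, 128):
        raise ValueError(f"unsupported arrangement: {text!r}")
    return reg, lane_count, lane_bits


print(parse_vector_operand("V0.8B"))
print(parse_vector_operand("v2.4s"))
```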

Note to myself:

JonathanSalwan commented 1 year ago

@soyasoya5, the EOR instruction now works on vector registers. Let's close this thread if it's working for you. Then I will open a new thread for each new Neon instruction added.

soyasoya5 commented 1 year ago

@JonathanSalwan thanks for the quick support! I tried it out on a small example and the semantics for EOR seem to be correct. However, the register implementation might be incomplete or wrong.

The V/B/H/S/D/Q registers are actually different views of the same register. We shouldn't need to write additional code to copy the data between them.

I've attached a code snippet below which illustrates this:

from triton import *

if __name__ == "__main__":
    ctx = TritonContext(ARCH.AARCH64)

    ctx.setMode(MODE.ALIGNED_MEMORY, True)
    ctx.setAstRepresentationMode(AST_REPRESENTATION.PCODE)

    # setup memory
    ctx.setConcreteMemoryAreaValue(0x129098, 0xF69078DEB08D5F08.to_bytes(length=8, byteorder='little'))
    ctx.setConcreteMemoryAreaValue(0x1290a0, 0x939027DCB2D0494B.to_bytes(length=8, byteorder='little'))
    ctx.setConcreteMemoryAreaValue(0x1290a8, b"\x01")
    ctx.setConcreteMemoryAreaValue(0xa7090, b"\x27\x2f\xff\xdf\xbd\x57\xe3\x93\x27\x2f\xff\xdf\xbd\x57\xe3\x93")

    setup = [
        (0x40918, b"\x40\x07\x00\xB0"),   # adrp x0, #0x129000
        (0x4091C, b"\x00\x60\x02\x91"),   # add  x0, x0, #0x98
    ]
    for pc, op in setup:
        inst = Instruction(pc, op)
        ctx.processing(inst)

    code = [
        b"\x08\x40\x40\x39",#                             LDRB            W8, [X0,#0x10]
        b"\xe8\x00\x00\x34",#                             CBZ             W8, locret_4840C
        b"\xE8\x02\x00\xF0",#                             ADRP            X8, #xmmword_A7090@PAGE
        b"\x00\x00\xC0\x3D",#                             LDR             Q0, [X0]
        b"\x01\x25\xC0\x3D",#                             LDR             Q1, [X8,#xmmword_A7090@PAGEOFF]
        b"\x1F\x40\x00\x39",#                             STRB            WZR, [X0,#0x10]
        b"\x00\x1C\x21\x6E",#                             EOR             V0.16B, V0.16B, V1.16B
        b"\x00\x00\x80\x3D",#                             STR             Q0, [X0]
    ]

    pc = 0x483ec
    # pc = 0x483f4
    for op in code:
        inst = Instruction(pc, op)
        ctx.processing(inst)
        print(f"{hex(pc)}: {inst.getDisassembly()}")

        # Uncomment to make this code work
        # if pc == 0x48400: 
        #   q0 = ctx.getConcreteRegisterValue(ctx.registers.q0)
        #   q1 = ctx.getConcreteRegisterValue(ctx.registers.q1)
        #   ctx.setConcreteRegisterValue(ctx.registers.v0, q0)
        #   ctx.setConcreteRegisterValue(ctx.registers.v1, q1)
        # elif pc == 0x48404:
        #   for expr in inst.getSymbolicExpressions():
        #       print(f"\t{expr}")
        #   v0 = ctx.getConcreteRegisterValue(ctx.registers.v0)
        #   ctx.setConcreteRegisterValue(ctx.registers.q0, v0)

        pc = ctx.getConcreteRegisterValue(ctx.registers.pc)

    print(ctx.getConcreteMemoryAreaValue(0x129098, 16))
    assert(ctx.getConcreteMemoryAreaValue(0x1290a8, 1) == b'\x00')
    assert(ctx.getConcreteMemoryAreaValue(0x129098, 16) == b'/proc/self/maps\x00') # AssertionError if code is commented

    # q0 = 0x939027dcb2d0494bf69078deb08d5f08
    # q1 = 0x93e357bddfff2f2793e357bddfff2f27
    # assert (q0^q1).to_bytes(16, byteorder='little') == b'/proc/self/maps\x00'
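
The expected behaviour can be modelled without Triton. In this minimal sketch one 128-bit value backs every view, so a load through Q is immediately visible through V and the reverse. `VectorRegister` is a hypothetical class for illustration, not how Triton implements its registers. The values are the ones set up in memory above.

```python
class VectorRegister:
    """One 128-bit SIMD register exposed through several views.

    Hypothetical model for illustration only; not Triton's implementation.
    """

    WIDTHS = {"b": 8, "h": 16, "s": 32, "d": 64, "q": 128, "v": 128}

    def __init__(self) -> None:
        self._bits = 0  # single backing store shared by all views

    def read(self, view: str) -> int:
        # A narrow view is just the low bits of the shared storage.
        return self._bits & ((1 << self.WIDTHS[view]) - 1)

    def write(self, view: str, value: int) -> None:
        # Writes through a narrow view zero the upper bits, as on AArch64.
        self._bits = value & ((1 << self.WIDTHS[view]) - 1)


r0, r1 = VectorRegister(), VectorRegister()

# LDR Q0, [X0]: the 16 bytes at 0x129098
r0.write("q", 0x939027DCB2D0494BF69078DEB08D5F08)
# LDR Q1, [X8, #0x90]: the 16 bytes at 0xa7090
r1.write("q", int.from_bytes(b"\x27\x2f\xff\xdf\xbd\x57\xe3\x93" * 2, "little"))

# EOR V0.16B, V0.16B, V1.16B reads and writes through the V view
r0.write("v", r0.read("v") ^ r1.read("v"))

# STR Q0, [X0] sees the result through the Q view with no manual copy
result = r0.read("q").to_bytes(16, byteorder="little")
print(result)
```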

Reference:

JonathanSalwan commented 1 year ago

Erf, I knew that the B/H/S/D/Q registers are different views of the same register, but I didn't know that V is too. Thanks for the info =). Below the patch. Now I get this with your last snippet of code:

$ python a.py
0x483ec: ldrb w8, [x0, #0x10]
0x483f0: cbz w8, #0x4840c
0x483f4: adrp x8, #0xa7000
0x483f8: ldr q0, [x0]
0x483fc: ldr q1, [x8, #0x90]
0x48400: strb wzr, [x0, #0x10]
0x48404: eor v0.16b, v0.16b, v1.16b
0x48408: str q0, [x0]
b'/proc/self/maps\x00'

PS: Note that logical instructions like EOR work out of the box on vector registers, but arithmetic instructions like ADD do not work on V registers yet. I will try to implement them in the near future :)
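
A likely reason for the difference, not spelled out in the thread, is that bitwise operations are lane-independent, while a vector ADD must stop carries at each lane boundary. The sketch below illustrates this with plain integers. `split_lanes`, `pack_lanes` and `vector_add` are hypothetical helpers, not Triton code.

```python
def split_lanes(value: int, lane_bits: int, total_bits: int = 128) -> list[int]:
    """Split a register value into lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def pack_lanes(lanes: list[int], lane_bits: int) -> int:
    """Reassemble lanes into one value, truncating each lane to its width."""
    mask = (1 << lane_bits) - 1
    out = 0
    for index, lane in enumerate(lanes):
        out |= (lane & mask) << (index * lane_bits)
    return out


def vector_add(a: int, b: int, lane_bits: int, total_bits: int = 128) -> int:
    """Lane-wise add: each lane wraps around on its own."""
    sums = [x + y for x, y in zip(split_lanes(a, lane_bits, total_bits),
                                  split_lanes(b, lane_bits, total_bits))]
    return pack_lanes(sums, lane_bits)


a, b = 0x00FF, 0x0001            # lane 0 holds 0xFF and 0x01 (byte lanes)

xor_whole = a ^ b                # XOR on the whole register
xor_lanes = pack_lanes([x ^ y for x, y in zip(split_lanes(a, 8), split_lanes(b, 8))], 8)

add_whole = (a + b) & ((1 << 128) - 1)   # carry leaks into lane 1
add_lanes = vector_add(a, b, 8)          # lane 0 wraps, lane 1 untouched

print(hex(xor_whole), hex(xor_lanes))
print(hex(add_whole), hex(add_lanes))
```

The XOR results agree whichever way they are computed, whereas the whole-register add differs from the lane-wise one, so ADD needs semantics that take the arrangement into account.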

soyasoya5 commented 1 year ago

Awesome! I think we can close this issue then. Thank you! 😄

JonathanSalwan / Triton

ARMv8 Neon Intrinsics not supported #1193