Closed MustBastani closed 3 weeks ago
@MustBastani
Thanks for using angr+pypcode and filing this detailed bug report.
For Thumb code, use a language ID with the T suffix: ARM:LE:32:v8T. This should allow you to properly decode Thumb code:
In [4]: ctx = Context("ARM:LE:32:v8T")
In [5]: shellcode = b'\x2d\xe9\xf0\x41\x82\xb0\xdd\xf8\x20\x80\x06\x46\x1c\x46\x17\x46\x0d\x46\x40\x46\xa0\xf1\x3a\xf5'
In [6]: dx = ctx.disassemble(shellcode, 0x40d59375, 0, len(shellcode), 99999)
In [8]: for ins in dx.instructions:
   ...:     print(f"{ins.addr.offset:#x}/{ins.length}: {ins.mnem} {ins.body}")
   ...:
0x40d59375/4: push {r4,r5,r6,r7,r8,lr}
0x40d59379/2: sub sp,#0x8
0x40d5937b/4: ldr.w r8,[sp,#0x20]
0x40d5937f/2: mov r6,r0
0x40d59381/2: mov r4,r3
0x40d59383/2: mov r7,r2
0x40d59385/2: mov r5,r1
0x40d59387/2: mov r0,r8
0x40d59389/4: bl 0x412f9e01
angr's VEX lifter supports automatic Thumb mode decoding, but the pcode lifter currently does not. Issue filed here: https://github.com/angr/angr/issues/4778
The CFG analysis crash you ran into is caused by something else which I've filed an issue for here: https://github.com/angr/angr/issues/4779
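Until the pcode lifter handles this automatically, the caller has to pick the language ID. A minimal sketch of how that choice could be made is below. It assumes the common ARM interworking convention that an odd address (bit 0 set) denotes Thumb code; `select_arm_language` is a hypothetical helper, not part of angr or pypcode, and whether to clear the low bit before disassembling depends on how your addresses were obtained.

```python
from typing import Tuple

# Language IDs as used with pypcode's Context in this thread.
ARM_LANG = "ARM:LE:32:v8"
THUMB_LANG = "ARM:LE:32:v8T"


def select_arm_language(addr: int) -> Tuple[str, int]:
    """Hypothetical helper: choose an ARM or Thumb language ID from an address.

    Assumption: bit 0 of the address marks Thumb code (interworking
    convention). The returned address has that marker bit cleared.
    """
    if addr & 1:
        return THUMB_LANG, addr & ~1
    return ARM_LANG, addr


lang, real_addr = select_arm_language(0x40D59375)
print(lang, hex(real_addr))
```

The returned language ID can then be passed to `Context(...)` as in the session above.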
You can set the ISA_MODE value directly using Context::setVariableDefault, like so:
In [2]: ctx = Context("MIPS:LE:32:default")
In [3]: ctx.setVariableDefault('ISA_MODE', 1)
In [4]: shellcode = b'\xfa\x64\xc1\x18\xc6\x28\x08\x04\x01\x72\xfb\x61\x5d\x67\x40\x1a\x19\x25\x90\xaa\x02\x67\x5d\x67\x00\xf0\x1b\x05'
In [5]: dx = ctx.disassemble(shellcode, 0x90489475, 0, len(shellcode), 99999)
In [7]: for ins in dx.instructions:
   ...:     print(f"{ins.addr.offset:#x}/{ins.length}: {ins.mnem} {ins.body}")
   ...:
0x90489475/2: save 0x50,ra,s0-s1
0x90489477/4: jal 0x9098a318
0x9048947b/2: addiu a0, sp, 0x20
0x9048947d/2: cmpi v0, 0x1
0x9048947f/2: btnez 0x90489477
0x90489481/2: move v0, sp
0x90489483/4: jal 0x90489464
0x90489487/2: lhu a0, 0x20(v0)
0x90489489/2: move s0, v0
0x9048948b/2: move v0, sp
0x9048948d/4: addiu a1, sp, 0x1b
Unfortunately, pypcode doesn't have an intelligent way to automatically determine which mode code should be decoded in, as pypcode itself is a thin wrapper around SLEIGH. As you know, Ghidra handles these mode switches in its Java-based architecture extensions. Likewise, we try to handle some of this in angr, but it is not feature complete.
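Since the mode has to be chosen by the caller, the two recipes from this thread could be collected in one place. The sketch below is only an illustration: `MODE_RECIPES` and `build_context` are hypothetical names, and the context class is passed in as a parameter (for example `pypcode.Context`) so that the only calls made on it are the constructor and `setVariableDefault`, both of which appear in the sessions above.

```python
from typing import Any, Callable, Dict, Tuple

# Recipes from this thread: language ID plus context-variable defaults.
MODE_RECIPES: Dict[str, Tuple[str, Dict[str, int]]] = {
    "thumb": ("ARM:LE:32:v8T", {}),
    "mips16e": ("MIPS:LE:32:default", {"ISA_MODE": 1}),
}


def build_context(mode: str, context_cls: Callable[[str], Any]) -> Any:
    """Hypothetical helper: create a context configured for `mode`.

    `context_cls` is expected to behave like pypcode.Context: it takes a
    language ID string and exposes setVariableDefault(name, value).
    """
    try:
        lang, variables = MODE_RECIPES[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    ctx = context_cls(lang)
    for name, value in variables.items():
        ctx.setVariableDefault(name, value)
    return ctx
```

With pypcode installed, `build_context("mips16e", Context)` should amount to the same two-line setup shown in the MIPS session above.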
With the remaining issues now filed individually, I'll close this issue. If you have more problems, feel free to file another issue. Thanks again for taking the time to file this detailed bug report.
Description
I need to perform symbolic execution on a MIPS 32-bit binary that contains some MIPS16e instructions, and pypcode fails to disassemble/translate the binary. Similarly, it fails to disassemble a binary with ARM Thumb instructions (both ISAs have instructions with 2-byte alignment), whereas UberEngine with VEX IR successfully disassembles the ARM binary.
Steps to reproduce the bug
UberEngine: --> Successful
UberEnginePcode: The first four bytes are decoded as an ARM instruction, but disassembly fails at the second four bytes.
I couldn't find any API in either pypcode or its C++ backend to set the instruction alignment or change the ISA_MODE register. I also tried hacking pypcode manually around the ARM/THUMB instructions handling (https://github.com/angr/pypcode/blob/da0cff97026092759fda72113f83807f230b13b1/pypcode/processors/ARM/data/languages/ARM.sinc#L297-L302), but it still failed to disassemble, throwing BadDataError (https://github.com/angr/pypcode/blob/da0cff97026092759fda72113f83807f230b13b1/pypcode/sleigh/slghsymbol.cc#L2293-L2294).
Environment
I used angr 9.2.80.dev0, archinfo 9.2.80.dev0, and pypcode 3.0.3.dev0 for all the experiments.
Modifications: I used a modified version of Ghidra 11.1.1 to disassemble the above MIPS shellcode/binary (example 4), and here is the result (I also modified the pypcode MIPS processor accordingly):
You may receive a different error when trying example 2. angr has some minor bugs when using UberEnginePcode, which I fixed as follows:
Additional context
I am not sure if it was a good idea to put all this information in one issue 😕. Also, I'm willing to work on this issue; I just don't know what the actual problem is yet. I can provide additional information about the binary via email.