angr / angr

A powerful and user-friendly binary analysis platform!
http://angr.io
BSD 2-Clause "Simplified" License
7.36k stars 1.06k forks source link

wrong jumpkind for arm #307

Closed 5lipper closed 5 years ago

5lipper commented 7 years ago

Sometimes simuvex can not detect the correct jumpkind for arm binary. @ltfish @zardus

A small case:

In [23]: b = p.factory.block(0x300D14, insn_bytes='20509de524109de568809fe51ce09de5a7ffffea'.decode('hex'))

In [24]: b.vex.jumpkind
Out[24]: 'Ijk_Call'

In [25]: b.pp()
0x300d14:   ldr r5, [sp, #0x20]
0x300d18:   ldr r1, [sp, #0x24]
0x300d1c:   ldr r8, [pc, #0x68]
0x300d20:   ldr lr, [sp, #0x1c]
0x300d24:   b   #0x300bc8

The jumpkind should be Ijk_Boring.

A large case:

In [26]: b = p.factory.block(addr=0x232dc8, insn_bytes='0000c1e5080d97e50400d0e5781b92e50010d1e5000021e0781b92e50000c1e5080d97e50500d0e57c1b92e50010d1e500
    ...: 0021e07c1b92e50000c1e5080d97e50600d0e5801b92e50010d1e5000021e0801b92e50000c1e5080d97e50700d0e5841b92e50010d1e5000021e0841b92e50000c1e5080d97e5080
    ...: 0d0e5881b92e50010d1e5000021e0881b92e50000c1e5080d97e50900d0e58c1b92e50010d1e5000021e08c1b92e50000c1e5080d97e50a00d0e5901b92e50010d1e5000021e0901b
    ...: 92e50000c1e5080d97e50b00d0e5941b92e50010d1e5000021e0941b92e50000c1e5080d97e50c00d0e5981b92e50010d1e5000021e0981b92e50000c1e5080d97e50d00d0e59c1b9
    ...: 2e50010d1e5000021e09c1b92e50000c1e5080d97e50e00d0e5a01b92e50010d1e5000021e0a01b92e50000c1e5080d97e50f00d0e5a41b92e50010d1e5000021e0a41b92e50000c1
    ...: e5fc0c97e5100080e2ac0b82e5040d97e5100080e2b00b82e5000d97e5103080e2d0378ee5800b9fe5806b9fe5281b92e503ea8de2010053e1'.decode('hex'))

In [27]: b.vex.jumpkind
Out[27]: 'Ijk_Call'

In [28]: b.pp()
0x232dc8:   strb    r0, [r1]
0x232dcc:   ldr r0, [r7, #0xd08]
0x232dd0:   ldrb    r0, [r0, #4]
0x232dd4:   ldr r1, [r2, #0xb78]
0x232dd8:   ldrb    r1, [r1]
0x232ddc:   eor r0, r1, r0
0x232de0:   ldr r1, [r2, #0xb78]
0x232de4:   strb    r0, [r1]
0x232de8:   ldr r0, [r7, #0xd08]
0x232dec:   ldrb    r0, [r0, #5]
0x232df0:   ldr r1, [r2, #0xb7c]
0x232df4:   ldrb    r1, [r1]
0x232df8:   eor r0, r1, r0
0x232dfc:   ldr r1, [r2, #0xb7c]
0x232e00:   strb    r0, [r1]
0x232e04:   ldr r0, [r7, #0xd08]
0x232e08:   ldrb    r0, [r0, #6]
0x232e0c:   ldr r1, [r2, #0xb80]
0x232e10:   ldrb    r1, [r1]
0x232e14:   eor r0, r1, r0
0x232e18:   ldr r1, [r2, #0xb80]
0x232e1c:   strb    r0, [r1]
0x232e20:   ldr r0, [r7, #0xd08]
0x232e24:   ldrb    r0, [r0, #7]
0x232e28:   ldr r1, [r2, #0xb84]
0x232e2c:   ldrb    r1, [r1]
0x232e30:   eor r0, r1, r0
0x232e34:   ldr r1, [r2, #0xb84]
0x232e38:   strb    r0, [r1]
0x232e3c:   ldr r0, [r7, #0xd08]
0x232e40:   ldrb    r0, [r0, #8]
0x232e44:   ldr r1, [r2, #0xb88]
0x232e48:   ldrb    r1, [r1]
0x232e4c:   eor r0, r1, r0
0x232e50:   ldr r1, [r2, #0xb88]
0x232e54:   strb    r0, [r1]
0x232e58:   ldr r0, [r7, #0xd08]
0x232e5c:   ldrb    r0, [r0, #9]
0x232e60:   ldr r1, [r2, #0xb8c]
0x232e64:   ldrb    r1, [r1]
0x232e68:   eor r0, r1, r0
0x232e6c:   ldr r1, [r2, #0xb8c]
0x232e70:   strb    r0, [r1]
0x232e74:   ldr r0, [r7, #0xd08]
0x232e78:   ldrb    r0, [r0, #0xa]
0x232e7c:   ldr r1, [r2, #0xb90]
0x232e80:   ldrb    r1, [r1]
0x232e84:   eor r0, r1, r0
0x232e88:   ldr r1, [r2, #0xb90]
0x232e8c:   strb    r0, [r1]
0x232e90:   ldr r0, [r7, #0xd08]
0x232e94:   ldrb    r0, [r0, #0xb]
0x232e98:   ldr r1, [r2, #0xb94]
0x232e9c:   ldrb    r1, [r1]
0x232ea0:   eor r0, r1, r0
0x232ea4:   ldr r1, [r2, #0xb94]
0x232ea8:   strb    r0, [r1]
0x232eac:   ldr r0, [r7, #0xd08]
0x232eb0:   ldrb    r0, [r0, #0xc]
0x232eb4:   ldr r1, [r2, #0xb98]
0x232eb8:   ldrb    r1, [r1]
0x232ebc:   eor r0, r1, r0
0x232ec0:   ldr r1, [r2, #0xb98]
0x232ec4:   strb    r0, [r1]
0x232ec8:   ldr r0, [r7, #0xd08]
0x232ecc:   ldrb    r0, [r0, #0xd]
0x232ed0:   ldr r1, [r2, #0xb9c]
0x232ed4:   ldrb    r1, [r1]
0x232ed8:   eor r0, r1, r0
0x232edc:   ldr r1, [r2, #0xb9c]
0x232ee0:   strb    r0, [r1]
0x232ee4:   ldr r0, [r7, #0xd08]
0x232ee8:   ldrb    r0, [r0, #0xe]
0x232eec:   ldr r1, [r2, #0xba0]
0x232ef0:   ldrb    r1, [r1]
0x232ef4:   eor r0, r1, r0
0x232ef8:   ldr r1, [r2, #0xba0]
0x232efc:   strb    r0, [r1]
0x232f00:   ldr r0, [r7, #0xd08]
0x232f04:   ldrb    r0, [r0, #0xf]
0x232f08:   ldr r1, [r2, #0xba4]
0x232f0c:   ldrb    r1, [r1]
0x232f10:   eor r0, r1, r0
0x232f14:   ldr r1, [r2, #0xba4]
0x232f18:   strb    r0, [r1]
0x232f1c:   ldr r0, [r7, #0xcfc]
0x232f20:   add r0, r0, #0x10
0x232f24:   str r0, [r2, #0xbac]
0x232f28:   ldr r0, [r7, #0xd04]
0x232f2c:   add r0, r0, #0x10
0x232f30:   str r0, [r2, #0xbb0]
0x232f34:   ldr r0, [r7, #0xd00]
0x232f38:   add r3, r0, #0x10
0x232f3c:   str r3, [lr, #0x7d0]
0x232f40:   ldr r0, [pc, #0xb80]
0x232f44:   ldr r6, [pc, #0xb80]
0x232f48:   ldr r1, [r2, #0xb28]
0x232f4c:   add lr, sp, #0x3000
0x232f50:   cmp r3, r1

This is a oversized basic block, it's not ended with any branching instruction. The correct jumpkind should be Ijk_Boring.

ltfish commented 7 years ago

Reproduced. Should be a bug (or a feature?) in libVEX.

grant-h commented 7 years ago

Looks like by writing to lr, the jumpkind is being set to call. Here's the small case without the lr write:

In [5]: proj2.factory.block(0x300D14, insn_bytes='20509de524109de568809fe5a7ffffea'.decode('hex')).vex.pp()
IRSB {
   t0:Ity_I32 t1:Ity_I32 t2:Ity_I32 t3:Ity_I32 t4:Ity_I32 t5:Ity_I32 t6:Ity_I32 t7:Ity_I32 t8:Ity_I32 t9:Ity_I32 t10:Ity_I32 t11:Ity_I32 t12:Ity_I32 t13:Ity_I32

   00 | ------ IMark(0x300d14, 4, 0) ------
   01 | t10 = GET:I32(r13)
   02 | t9 = Add32(t10,0x00000020)
   03 | t2 = LDle:I32(t9)
   04 | PUT(r5) = t2
   05 | PUT(ip) = 0x00300d18
   06 | ------ IMark(0x300d18, 4, 0) ------
   07 | t11 = Add32(t10,0x00000024)
   08 | t5 = LDle:I32(t11)
   09 | PUT(r1) = t5
   10 | PUT(ip) = 0x00300d1c
   11 | ------ IMark(0x300d1c, 4, 0) ------
   12 | t8 = LDle:I32(0x00300d8c)
   13 | PUT(r8) = t8
   14 | ------ IMark(0x300d20, 4, 0) ------
   NEXT: PUT(pc) = 0x00300bc4; Ijk_Boring
}

In [6]: proj2.factory.block(0x300D14, insn_bytes='20509de524109de568809fe5a7ffffea'.decode('hex')).pp()
0x300d14:       ldr     r5, [sp, #0x20]
0x300d18:       ldr     r1, [sp, #0x24]
0x300d1c:       ldr     r8, [pc, #0x68]
0x300d20:       b       #0x300bc4

But the plot thickens... When adding traceflags to the same statement, VEX prints report that the jump kind is boring not call:

In [9]: proj2.factory.block(0x300D14, insn_bytes='20509de524109de568809fe51ce09de5a7ffffea'.decode('hex'), traceflags=255).vex.pp()
<snip>
IRSB {
   t0:I32   t1:I32   t2:I32   t3:I32   t4:I32   t5:I32   t6:I32   t7:I32
   t8:I32   t9:I32   t10:I32   t11:I32   t12:I32   t13:I32   t14:I32   t15:I32
   t16:I32   t17:I32   t18:I32

   ------ IMark(0x300d14, 4, 0) ------
   t13 = GET:I32(60)
   t12 = Add32(t13,0x20:I32)
   t2 = LDle:I32(t12)
   PUT(28) = t2
   PUT(68) = 0x300d18:I32
   ------ IMark(0x300d18, 4, 0) ------
   t14 = Add32(t13,0x24:I32)
   t5 = LDle:I32(t14)
   PUT(12) = t5
   PUT(68) = 0x300d1c:I32
   ------ IMark(0x300d1c, 4, 0) ------
   t8 = LDle:I32(0x300d8c:I32)
   PUT(40) = t8
   PUT(68) = 0x300d20:I32
   ------ IMark(0x300d20, 4, 0) ------
   t16 = Add32(t13,0x1c:I32)
   t11 = LDle:I32(t16)
   PUT(64) = t11
   ------ IMark(0x300d24, 4, 0) ------
   PUT(68) = 0x300bc8:I32; exit-Boring
}

IRSB {
   t0:Ity_I32 t1:Ity_I32 t2:Ity_I32 t3:Ity_I32 t4:Ity_I32 t5:Ity_I32 t6:Ity_I32 t7:Ity_I32 t8:Ity_I32 t9:Ity_I32 t10:Ity_I32 t11:Ity_I32 t12:Ity_I32 t13:Ity_I32 t14:Ity_I32 t15:Ity_I32 t16:Ity_I32 t17:Ity_I32 t18:Ity_I32

   00 | ------ IMark(0x300d14, 4, 0) ------
   01 | t13 = GET:I32(r13)
   02 | t12 = Add32(t13,0x00000020)
   03 | t2 = LDle:I32(t12)
   04 | PUT(r5) = t2
   05 | PUT(ip) = 0x00300d18
   06 | ------ IMark(0x300d18, 4, 0) ------
   07 | t14 = Add32(t13,0x00000024)
   08 | t5 = LDle:I32(t14)
   09 | PUT(r1) = t5
   10 | PUT(ip) = 0x00300d1c
   11 | ------ IMark(0x300d1c, 4, 0) ------
   12 | t8 = LDle:I32(0x00300d8c)
   13 | PUT(r8) = t8
   14 | PUT(ip) = 0x00300d20
   15 | ------ IMark(0x300d20, 4, 0) ------
   16 | t16 = Add32(t13,0x0000001c)
   17 | t11 = LDle:I32(t16)
   18 | PUT(lr) = t11
   19 | ------ IMark(0x300d24, 4, 0) ------
   NEXT: PUT(pc) = 0x00300bc8; Ijk_Call
}

The top block is from VEX and the bottom from Angr. Notice the JK mismatch. Upon further digging, I've tracked the bug to Angr in the old angr/lifter.py ARM post processor: https://github.com/angr/angr/blob/93ba3fac303fdf676b40ce1b7ad4a71b290e5fdb/angr/lifter.py#L274-L299 After the SimEngine merge, I don't believe these post processors exists any more, so the bug may be gone! I haven't confirmed as my checkout is from August....

rhelmot commented 7 years ago

The engines merge refactored the lifter stuff and moved the postprocessor to pyvex/lift/fixes.py, lol

This code was ostensably written to fix some bug we found in libvex where an actual call wasn't being marked as a call and everything went to hell. If only we had that testcase lying around so we could write a better postprocessor...

subwire commented 5 years ago

This issue was already fixed