angr / vex

A patched version of VEX to work with PyVEX.
GNU General Public License v2.0

API Calls representation in VEX #52

Open nav60 opened 1 year ago

nav60 commented 1 year ago

Question

Hi, I need some guidance on how VEX IR represents API calls in ELF files for MIPS/ARM binaries. I have seen the VEX IR for various assembly instructions, but I am unable to understand how function calls are represented in VEX.

rhelmot commented 1 year ago

VEX models function calls at the assembly level - there is no distinction between a call and a jump that happens to set a return address. However, an attempt is made to identify calls and mark their jumpkinds as Ijk_Call.

nav60 commented 1 year ago

Can you guide me on this, if possible? I have seen "Ijk_Call" against a "nop" instruction in MIPS assembly.

nav60 commented 1 year ago

Sir, I have another query, if you can guide me on this. I am trying to link basic blocks extracted using IDA Pro to IRSBs, but I am unable to relate them because an IRSB includes addresses that IDA Pro shows as independent blocks. For example, an IRSB consists of multiple IMark statements, and I cannot match them against the IDAPython output. I am working on MIPS/ARM binaries for now.

rhelmot commented 1 year ago

The correct honorific for me is "ma'am" :)

For your first problem, can you show me the basic block which has nop given an Ijk_Call? It's possible it is just a mips delay slot.

I am having trouble understanding your second problem. Can you post what data you're seeing and what conclusions you would like to be able to draw from them, and I can help you see how to draw them?

nav60 commented 1 year ago

I am sorry, Ma'am. Let me try to explain. I have a MIPS ELF binary, and I am trying to view its assembly alongside the corresponding IRSB blocks. In this case the address is "0x400594"; its assembly and VEX code are shown below. You can see " NEXT: PUT(pc) = 0x004005b8; Ijk_Call", which, as I understand it, corresponds to the "nop" at address 4005b4. I hope that explains the first part. However, I do not have an API call example for the MIPS platform at this time.

--------ADDR----- 0x400594
-----ASSEMBLY---------
_init:
400594 lui $gp, 0x5
400598 addiu $gp, $gp, -0x7044
40059c addu $gp, $gp, $t9
4005a0 addiu $sp, $sp, -0x20
4005a4 sw $gp, [$sp]
4005a8 sw $ra, [$sp]
4005ac sw $gp, [$sp]
4005b0 bal sub_4005b8
4005b4 nop
None
------vex----
IRSB {

   00 | ------ IMark(0x400594, 4, 0) ------
   01 | ------ IMark(0x400598, 4, 0) ------
   02 | ------ IMark(0x40059c, 4, 0) ------
   03 | t10 = GET:I32(t9)
   04 | t9 = Add32(0x00048fbc,t10)
   05 | PUT(gp) = t9
   06 | ------ IMark(0x4005a0, 4, 0) ------
   07 | t13 = GET:I32(sp)
   08 | t12 = Add32(t13,0xffffffe0)
   09 | PUT(sp) = t12
   10 | PUT(pc) = 0x004005a4
   11 | ------ IMark(0x4005a4, 4, 0) ------
   12 | t14 = Add32(t12,0x00000010)
   13 | STle(t14) = t9
   14 | PUT(pc) = 0x004005a8
   15 | ------ IMark(0x4005a8, 4, 0) ------
   16 | t17 = Add32(t12,0x0000001c)
   17 | t19 = GET:I32(ra)
   18 | STle(t17) = t19
   19 | PUT(pc) = 0x004005ac
   20 | ------ IMark(0x4005ac, 4, 0) ------
   21 | t20 = Add32(t12,0x00000018)
   22 | STle(t20) = t9
   23 | ------ IMark(0x4005b0, 4, 0) ------
   24 | PUT(ra) = 0x004005b8
   25 | PUT(pc) = 0x004005b4
   26 | ------ IMark(0x4005b4, 4, 0) ------
   NEXT: PUT(pc) = 0x004005b8; Ijk_Call
}

rhelmot commented 1 year ago

MIPS has a concept called "branch delay slots", which means that most control flow instructions will not immediately cause the end of a basic block and a control flow transfer, but rather, the control flow will take effect after one additional instruction has been executed. You are seeing nop with Ijk_Call, but it's more accurate to say you're seeing bal, then nop, then Ijk_Call. The bal is a branch-and-link instruction, or a call.
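To make the "bal, then nop, then Ijk_Call" reading concrete, here is a minimal sketch that parses the tail of the IRSB text posted earlier in this thread. It works only on the printed dump and does not call PyVEX; the function name `summarize_irsb` and both regular expressions are illustrative assumptions, not part of any library.

```python
import re

# Tail of the IRSB dump posted earlier in this thread (copied as text).
IRSB_DUMP = """\
23 | ------ IMark(0x4005b0, 4, 0) ------
24 | PUT(ra) = 0x004005b8
25 | PUT(pc) = 0x004005b4
26 | ------ IMark(0x4005b4, 4, 0) ------
NEXT: PUT(pc) = 0x004005b8; Ijk_Call
"""

# Hypothetical patterns for the printed form of IMark and NEXT lines.
IMARK_RE = re.compile(r"IMark\((0x[0-9a-fA-F]+),\s*\d+,\s*\d+\)")
NEXT_RE = re.compile(r"NEXT:\s*PUT\(\w+\)\s*=\s*(0x[0-9a-fA-F]+);\s*(Ijk_\w+)")


def summarize_irsb(dump):
    """Return (instruction addresses, exit target, jumpkind) from an IRSB dump."""
    imarks = [int(m.group(1), 16) for m in IMARK_RE.finditer(dump)]
    nxt = NEXT_RE.search(dump)
    if nxt is None:
        raise ValueError("no NEXT line found in dump")
    return imarks, int(nxt.group(1), 16), nxt.group(2)


imarks, target, jumpkind = summarize_irsb(IRSB_DUMP)
# The jumpkind belongs to the block exit, which comes after the last IMark
# (the nop in the delay slot); the bal is the IMark just before it.
print([hex(a) for a in imarks], hex(target), jumpkind)
```

The jumpkind is a property of the whole block's exit rather than of any single instruction, so it only appears to sit "against" whichever instruction happens to be printed last.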

nav60 commented 1 year ago

Here is the explanation of the second part of my question. I have an IDAPython script that traverses each basic block (BB) of a function and lists the instructions in it, producing the output shown below. It is the same example I posted above for the VEX case. I want to see the VEX instructions for each basic block, with "0x4005b4" treated as the second block, as it is in this output. But the VEX output above includes it in the first IRSB, and I am not able to separate it. The actual goal is to map the IDAPython output to the VEX output for each basic block of a function. I have tried to explain it; sorry if this causes more confusion.

  1. Addr: 0x400594 Instruction Insts: li;addu;addiu;sw;sw;sw;bal

  2. BB Addr: 0x4005b4 Instruction Insts: nop

rhelmot commented 1 year ago

I believe this is explained by a difference between ida and angr's conception of whether branch delay slots belong as parts of the basic blocks they terminate. How are you generating the instruction -> block mapping in ida?
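One possible way to reconcile the two views, sketched under the assumption that the only difference is where the delay slot lands: fold each delay-slot instruction from the IDA-style listing into the block that ends with the branch. The function `fold_delay_slots`, the block layout (start address plus a list of `(address, mnemonic)` pairs), and the deliberately incomplete `DELAY_SLOT_BRANCHES` set are all hypothetical; a fixed 4-byte instruction size is also assumed.

```python
# Hypothetical, incomplete set of MIPS mnemonics that have a delay slot.
DELAY_SLOT_BRANCHES = {"b", "bal", "beq", "bne", "j", "jal", "jr", "jalr"}


def fold_delay_slots(blocks):
    """Move each delay-slot instruction into the block its branch terminates.

    blocks: iterable of (start_addr, [(addr, mnemonic), ...]).
    Returns a new list in the same shape; the input is not modified.
    """
    folded = []
    for _start, insns in sorted(blocks):
        insns = list(insns)  # copy so the caller's data stays untouched
        if folded and insns:
            prev_insns = folded[-1][1]
            last_addr, last_mnem = prev_insns[-1]
            # Assumes fixed 4-byte MIPS instructions.
            if last_mnem in DELAY_SLOT_BRANCHES and insns[0][0] == last_addr + 4:
                prev_insns.append(insns.pop(0))
        if insns:  # drop blocks that consisted only of a delay slot
            folded.append((insns[0][0], insns))
    return folded


# The two blocks from the IDAPython output, using the addresses and
# mnemonics of the assembly listing posted earlier in the thread.
ida_blocks = [
    (0x400594, [(0x400594, "lui"), (0x400598, "addiu"), (0x40059C, "addu"),
                (0x4005A0, "addiu"), (0x4005A4, "sw"), (0x4005A8, "sw"),
                (0x4005AC, "sw"), (0x4005B0, "bal")]),
    (0x4005B4, [(0x4005B4, "nop")]),
]
vex_like = fold_delay_slots(ida_blocks)
```

After folding, the block starting at 0x400594 covers the same instruction addresses as the IMark statements in the IRSB, so the two outputs can be matched by block start address.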

nav60 commented 1 year ago

Hmm. Well, I used IDAPython code found online to enumerate the BBs in each function of the binary and then check the instructions in each BB. Thanks for your explanation.