angr / angr

A powerful and user-friendly binary analysis platform!
http://angr.io

A question about angr Vex IR #4249

Open RuilingZ opened 9 months ago

RuilingZ commented 9 months ago

Question

I have a question regarding the IR code for the PowerPC architecture. In my experiment, I wanted to analyze the following assembly instruction:

10000508: bl 10000640 puts@plt

I used angr to lift the assembly instruction to IR, and the lifted IR code contains only one statement, as follows:

"------IMark(0x10000508,4,0)------",
"PUT(offset=1172)=0x1000050c"

This transformation appears very strange. The original assembly instruction is a function call, yet the semantics of the call do not seem to be accurately reflected in the IR code. Specifically, the IR does not include the actual call to the function “puts”. The only information present is the storage of the return address “0x1000050c” (the address of the instruction immediately following the call).

Is there a step that we missed? Could you kindly give us some advice on how to accurately convert PowerPC binary code into IR?

rhelmot commented 9 months ago

VEX lifts whole blocks, not individual instructions. If you look at the rest of the block, you'll see that it includes a "default exit" directive, which is where control flow should transfer after the statements terminate normally. This encodes the call target, so the block encodes the two effects of a call statement, the link register and the program counter.
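As a rough illustration of the point above, the sketch below separates the statement list from the two block-level fields that carry the control-flow effect, `irsb.next` (the default exit expression) and `irsb.jumpkind`. The helper names `lift_block` and `summarize_exit` are hypothetical, and the stand-in object only mimics the three IRSB attributes used here; its values are taken from the question for illustration, not from a real lift.

```python
from types import SimpleNamespace


def lift_block(binary_path, addr):
    """Lift the basic block starting at `addr` and return its pyvex IRSB.

    angr is imported lazily so the helper below can be tried without it.
    """
    import angr  # assumes angr is installed

    proj = angr.Project(binary_path, auto_load_libs=False)
    return proj.factory.block(addr).vex


def summarize_exit(irsb):
    """Collect the parts of a block that are not in irsb.statements."""
    return {
        "num_statements": len(irsb.statements),
        "jumpkind": irsb.jumpkind,  # "Ijk_Call" when the block ends in a call
        "next": str(irsb.next),     # default exit expression (the jump target)
    }


# Stand-in with the same attribute names as an IRSB (illustrative values only).
fake_irsb = SimpleNamespace(
    statements=["PUT(offset=1172) = 0x1000050c"],
    next="0x10000640",
    jumpkind="Ijk_Call",
)
print(summarize_exit(fake_irsb))
```

With a real binary, something like `summarize_exit(lift_block(path, 0x10000508))` should report the call target in `next` even though the statement list only contains the link-register write.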

RuilingZ commented 9 months ago

Sorry to bother you again. I've encountered an issue where I'm unable to identify the function call within the IR generated from either ARM or MIPS. I have created a simple example function where a main function calls a foo function. The foo function performs no operations.

Steps to Reproduce

  1. Wrote a simple C program with a main and foo function.
  2. Compiled the program to binary on ARM/MIPS architecture.
  3. Generated the IR based on the binary file using angr.

I expected to be able to locate the function call to foo within the generated IR. However, the function call to foo is not identifiable or perhaps I am missing the correct representation of it in the IR.

Here's the source code, assembly code, and the generated IR code for reference:

Source Code (C)

#include<stdio.h>
void foo(){}
void main(){
    foo();
}

Assembly Code (ARM)

000005a4 <foo>:
 5a4:   b480        push    {r7}
 5a6:   af00        add r7, sp, #0
 5a8:   bf00        nop
 5aa:   46bd        mov sp, r7
 5ac:   f85d 7b04   ldr.w   r7, [sp], #4
 5b0:   4770        bx  lr

000005b2 <main>:
 5b2:   b580        push    {r7, lr}
 5b4:   af00        add r7, sp, #0
 5b6:   f7ff fff5   bl  5a4 <foo>
 5ba:   bf00        nop
 5bc:   bd80        pop {r7, pc}

IR Code (ARM)

"main": [
        [
            "------IMark(0x4005b2,2,1)------",
            "t0=GET:I32(offset=392)",
            "t1=Shr32(t0,0x08)",
            "PUT(offset=392)=t1",
            "t23=And32(t0,0x000000f0)",
            "t22=Xor32(t23,0x000000e0)",
            "t24=GET:I32(offset=72)",
            "t21=Or32(t24,t22)",
            "t25=GET:I32(offset=76)",
            "t26=GET:I32(offset=80)",
            "t27=GET:I32(offset=84)",
            "t28=armg_calculate_condition(t21,t25,t26,t27):Ity_I32",
            "t30=CmpNE32(t23,0x00000000)",
            "t29=ITE(t30,t28,0x00000001)",
            "t35=32to1(t29)",
            "t34=Not1(t35)",
            "if(t34){PUT(offset=68)=0x4005b5;Ijk_Boring}",
            "t37=GET:I32(offset=60)",
            "t36=Sub32(t37,0x00000008)",
            "PUT(offset=60)=t36",
            "t7=And32(t36,0xfffffffc)",
            "t39=GET:I32(offset=36)",
            "STle(t7)=t39",
            "t40=Add32(t7,0x00000004)",
            "t41=GET:I32(offset=64)",
            "STle(t40)=t41",
            "------IMark(0x4005b4,2,1)------",
            "t9=Shr32(t1,0x08)",
            "t44=And32(t1,0x000000f0)",
            "t43=Xor32(t44,0x000000e0)",
            "t42=Or32(t24,t43)",
            "t49=armg_calculate_condition(t42,t25,t26,t27):Ity_I32",
            "t51=CmpNE32(t44,0x00000000)",
            "t50=ITE(t51,t49,0x00000001)",
            "t59=CmpNE32(t50,0x00000000)",
            "t55=ITE(t59,t36,t39)",
            "PUT(offset=36)=t55",
            "PUT(offset=68)=0x004005b7",
            "------IMark(0x4005b6,4,1)------",
            "t15=Shr32(t9,0x08)",
            "t62=And32(t9,0x000000f0)",
            "t61=Xor32(t62,0x000000e0)",
            "t60=Or32(t24,t61)",
            "t67=armg_calculate_condition(t60,t25,t26,t27):Ity_I32",
            "t69=CmpNE32(t62,0x00000000)",
            "t68=ITE(t69,t67,0x00000001)",
            "PUT(offset=392)=t9",
            "t73=CmpNE32(t15,0x00000000)",
            "if(t73){PUT(offset=68)=0x4005b7;Ijk_NoDecode}",
            "PUT(offset=392)=t15",
            "t75=32to1(t68)",
            "t74=Not1(t75)",
            "if(t74){PUT(offset=68)=0x4005bb;Ijk_Boring}",
            "PUT(offset=64)=0x004005bb"
        ],

The specific instruction in question is the call to the foo function: 5b6: f7ff fff5 bl 5a4 <foo>. Based on the address, I identified the related IR block which begins with "------IMark(0x4005b6,4,1)------". I guess this block represents a call to the foo function, but I am unable to figure out how the function call is implemented within this block.

Another problem is that this IR block doesn't have the address of the foo function in it, which is supposed to be 0x103e8. I would greatly appreciate it if you could take a look at this issue.

Assembly Code (MIPS)

000007c0 <foo>:
 7c0:   27bdfff8    addiu   sp,sp,-8
 7c4:   afbe0004    sw  s8,4(sp)
 7c8:   03a0f025    move    s8,sp
 7cc:   00000000    nop
 7d0:   03c0e825    move    sp,s8
 7d4:   8fbe0004    lw  s8,4(sp)
 7d8:   27bd0008    addiu   sp,sp,8
 7dc:   03e00008    jr  ra
 7e0:   00000000    nop

000007e4 <main>:
 7e4:   3c1c0002    lui gp,0x2
 7e8:   279c817c    addiu   gp,gp,-32388
 7ec:   0399e021    addu    gp,gp,t9
 7f0:   27bdffe0    addiu   sp,sp,-32
 7f4:   afbf001c    sw  ra,28(sp)
 7f8:   afbe0018    sw  s8,24(sp)
 7fc:   03a0f025    move    s8,sp
 800:   afbc0010    sw  gp,16(sp)
 804:   8f828034    lw  v0,-32716(gp)
 808:   0040c825    move    t9,v0
 80c:   0411ffec    bal 7c0 <foo>
 810:   00000000    nop
 814:   8fdc0010    lw  gp,16(s8)
 818:   00000000    nop
 81c:   03c0e825    move    sp,s8
 820:   8fbf001c    lw  ra,28(sp)
 824:   8fbe0018    lw  s8,24(sp)
 828:   27bd0020    addiu   sp,sp,32
 82c:   03e00008    jr  ra
 830:   00000000    nop

IR Code (MIPS)

"main": [
        [
            "------IMark(0x4007e4,4,0)------",
            "------IMark(0x4007e8,4,0)------",
            "------IMark(0x4007ec,4,0)------",
            "t11=GET:I32(offset=108)",
            "t10=Add32(0x0001817c,t11)",
            "PUT(offset=120)=t10",
            "------IMark(0x4007f0,4,0)------",
            "t14=GET:I32(offset=124)",
            "t13=Add32(t14,0xffffffe0)",
            "PUT(offset=124)=t13",
            "PUT(offset=136)=0x004007f4",
            "------IMark(0x4007f4,4,0)------",
            "t15=Add32(t13,0x0000001c)",
            "t17=GET:I32(offset=132)",
            "STbe(t15)=t17",
            "PUT(offset=136)=0x004007f8",
            "------IMark(0x4007f8,4,0)------",
            "t18=Add32(t13,0x00000018)",
            "t20=GET:I32(offset=128)",
            "STbe(t18)=t20",
            "------IMark(0x4007fc,4,0)------",
            "PUT(offset=128)=t13",
            "PUT(offset=136)=0x00400800",
            "------IMark(0x400800,4,0)------",
            "t23=Add32(t13,0x00000010)",
            "STbe(t23)=t10",
            "PUT(offset=136)=0x00400804",
            "------IMark(0x400804,4,0)------",
            "t26=Add32(t10,0xffff8034)",
            "t28=LDbe:I32(t26)",
            "PUT(offset=16)=t28",
            "------IMark(0x400808,4,0)------",
            "PUT(offset=108)=t28",
            "------IMark(0x40080c,4,0)------",
            "PUT(offset=132)=0x00400814",
            "PUT(offset=136)=0x00400810",
            "------IMark(0x400810,4,0)------"
        ],

Similar to ARM, the specific instruction is: 80c: 0411ffec bal 7c0 <foo>. The related IR block is supposed to begin with "------IMark(0x40080c,4,0)------". But I can't figure out how the block below represents a function call.

            "------IMark(0x40080c,4,0)------",
            "PUT(offset=132)=0x00400814",
            "PUT(offset=136)=0x00400810",

Below is the Python code I used to generate IR from binary, for your reference.

import json

import angr


def save2json(data, save_path):
    # Helper not shown in the original post; assumed to dump the dict as JSON.
    with open(save_path, "w") as f:
        json.dump(data, f, indent=4)


def process_binary(load_path, save_path):
    function_names = []
    proj = angr.Project(load_path, load_options={'auto_load_libs': False})
    cfg = proj.analyses.CFGFast()
    this_file_IR = {}
    for func in cfg.kb.functions.values():
        this_func_IR = []
        # Skip functions whose name has already been processed.
        if func.name not in function_names:
            function_names.append(func.name)
            for block in func.blocks:
                this_block_IR = []
                if block.size > 0:
                    irsb = block.vex
                    # Collect the textual form of each statement in the block.
                    for stmt in irsb.statements:
                        this_block_IR.append(str(stmt).replace(" ", ""))
                this_func_IR.append(this_block_IR)
            this_file_IR[func.name] = this_func_IR
    save2json(this_file_IR, save_path)

I would be very grateful if you could help me identify which part of the IR corresponds to the function call. Many thanks!

rhelmot commented 9 months ago

Again, you can't just look at statements in order to get an accurate picture of what basic blocks do. You have to use the whole block object, or at least the default exit expression and jumpkind, in order to see the full effects. In both of these cases, the jump target of the call instruction is encoded as the default exit expression of the block.
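One possible way to locate call sites along these lines is to check each block's jumpkind and read the target from the default exit expression. This is only a sketch: `call_target` and `find_calls` are hypothetical helper names, and the stand-in objects below merely imitate the attributes used (`jumpkind`, `next`, and `next.con.value` for a constant target), with addresses borrowed from the MIPS listing for illustration.

```python
from types import SimpleNamespace


def call_target(irsb):
    """Return the constant call target of a block, or None.

    A block ends in a call when its jumpkind is "Ijk_Call". When the default
    exit is a constant, its value is reachable as irsb.next.con.value; for an
    indirect call irsb.next is a temporary and has no `con` attribute.
    """
    if irsb.jumpkind != "Ijk_Call":
        return None
    con = getattr(irsb.next, "con", None)
    return None if con is None else con.value


def find_calls(func):
    """List (block address, target) for every call block of an angr Function."""
    calls = []
    for block in func.blocks:
        if block.size == 0:
            continue
        irsb = block.vex
        if irsb.jumpkind == "Ijk_Call":
            calls.append((block.addr, call_target(irsb)))
    return calls


# Stand-ins that mimic the attributes used above (illustrative values only).
direct = SimpleNamespace(
    jumpkind="Ijk_Call",
    next=SimpleNamespace(con=SimpleNamespace(value=0x4007C0)),
)
indirect = SimpleNamespace(jumpkind="Ijk_Call", next=SimpleNamespace(tmp=5))
boring = SimpleNamespace(
    jumpkind="Ijk_Boring",
    next=SimpleNamespace(con=SimpleNamespace(value=0x400814)),
)
fake_func = SimpleNamespace(blocks=[
    SimpleNamespace(addr=0x4007E4, size=48, vex=direct),
    SimpleNamespace(addr=0x400814, size=32, vex=boring),
])

for addr, target in find_calls(fake_func):
    print(hex(addr), "calls", hex(target) if target is not None else "<indirect>")
```

With a real project, `find_calls(cfg.kb.functions["main"])` would be the analogous call, assuming a CFG has been built as in the script above.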

RuilingZ commented 9 months ago

Thank you for your clarification, @rhelmot.

Could you guide me on how to modify my Python code to retrieve and print out the whole block object (or the default exit expression and jumpkind)? Could you also recommend resources (like a link or manual) to learn the IR?

I appreciate your assistance.

rhelmot commented 9 months ago

https://docs.angr.io/en/latest/advanced-topics/ir.html

The whole block object is just block.vex.
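For the JSON dump in the script above, a minimal sketch of recording the default exit alongside the statements might look like this. The helper name `block_to_lines` and the `NEXT:` line format are assumptions made for illustration; `irsb.statements`, `irsb.next` and `irsb.jumpkind` are the IRSB attributes being read. The stand-in object reuses values from the MIPS listing and is not real lifter output.

```python
from types import SimpleNamespace


def block_to_lines(irsb):
    """Render a block as strings: every statement, then the default exit."""
    lines = [str(stmt).replace(" ", "") for stmt in irsb.statements]
    # The default exit is not part of irsb.statements, so append it explicitly.
    lines.append("NEXT:{};{}".format(irsb.next, irsb.jumpkind))
    return lines


# Stand-in with the same attribute names as an IRSB (illustrative values only).
fake_irsb = SimpleNamespace(
    statements=[
        "PUT(offset=132) = 0x00400814",
        "PUT(offset=136) = 0x00400810",
    ],
    next="0x004007c0",
    jumpkind="Ijk_Call",
)
for line in block_to_lines(fake_irsb):
    print(line)
```

In `process_binary`, the inner statement loop could then be replaced with `this_block_IR = block_to_lines(block.vex)`. For interactive inspection, `block.vex.pp()` pretty-prints the whole block, including the default exit and jumpkind.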