PyVEX provides Python bindings for the VEX IR.
Project repository: https://github.com/angr/pyvex
Documentation: https://api.angr.io/projects/pyvex/en/latest/
PyVEX can be pip-installed:
pip install pyvex
import pyvex
import archinfo
# translate an AMD64 basic block (of nops) at 0x400400 into VEX
irsb = pyvex.lift(b"\x90\x90\x90\x90\x90", 0x400400, archinfo.ArchAMD64())
# pretty-print the basic block
irsb.pp()
# this is the IR Expression of the jump target of the unconditional exit at the end of the basic block
print(irsb.next)
# this is the type of the unconditional exit (i.e., a call, ret, syscall, etc)
print(irsb.jumpkind)
# you can also pretty-print it
irsb.next.pp()
# iterate through each statement and print all the statements
for stmt in irsb.statements:
    stmt.pp()
# pretty-print the IR expression representing the data, and the *type* of that IR expression written by every store statement
import pyvex
for stmt in irsb.statements:
    if isinstance(stmt, pyvex.IRStmt.Store):
        print("Data:", end="")
        stmt.data.pp()
        print("")
        print("Type:", end="")
        print(stmt.data.result_type)
        print("")
# pretty-print the condition and jump target of every conditional exit from the basic block
for stmt in irsb.statements:
    if isinstance(stmt, pyvex.IRStmt.Exit):
        print("Condition:", end="")
        stmt.guard.pp()
        print("")
        print("Target:", end="")
        stmt.dst.pp()
        print("")
# these are the types of every temp in the IRSB
print(irsb.tyenv.types)
# here is one way to get the type of temp 0
print(irsb.tyenv.types[0])
Keep in mind that this is a syntactic representation of a basic block. That is, it'll tell you what the block means, but you don't have any context to say, for example, what actual data is written by a store instruction.
To deal with widely diverse architectures, it is useful to carry out analyses on an intermediate representation. An IR abstracts away several differences between architectures, allowing a single analysis to be run on all of them. Register handling is one example: VEX models the registers as a separate memory space with integer offsets (e.g., AMD64's `rax` is stored starting at address 16 in this memory space).

There are lots of choices for an IR. We use VEX, since the lifting of binary code into VEX is quite well supported. VEX is an architecture-agnostic, side-effects-free representation of a number of target machine languages. It abstracts machine code into a representation designed to make program analysis easier. This representation has five main classes of objects: expressions, operations, temporary variables, statements, and blocks. Temporary variables are numbered starting at `t0` and are strongly typed (e.g., "64-bit integer" or "32-bit float").

VEX IR is actually quite well documented in the `libvex_ir.h` file (https://github.com/angr/vex/blob/dev/pub/libvex_ir.h) in the VEX repository. For the lazy, we'll detail some parts of VEX that you'll likely interact with fairly frequently. To begin with, here are some IR Expressions:
IR Expression | Evaluated Value | VEX Output Example |
---|---|---|
Constant | A constant value. | 0x4:I32 |
Read Temp | The value stored in a VEX temporary variable. | RdTmp(t10) |
Get Register | The value stored in a register. | GET:I32(16) |
Load Memory | The value stored at a memory address, with the address specified by another IR Expression. | LDle:I32 / LDbe:I64 |
Operation | A result of a specified IR Operation, applied to specified IR Expression arguments. | Add32 |
If-Then-Else | If a given IR Expression evaluates to 0, return one IR Expression. Otherwise, return another. | ITE |
Helper Function | VEX uses C helper functions for certain operations, such as computing the conditional flags registers of certain architectures. These functions return IR Expressions. | function_name() |
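
The `GET:I32(16)` entry in the table reads from a register file that VEX treats as a flat, byte-addressed space. The sketch below is a toy pure-Python model of that idea, not PyVEX's API. The `RegisterFile` class, its 1024-byte size, and the little-endian byte order are assumptions made for illustration; only the offset 16 for `rax` comes from this document.

```python
class RegisterFile:
    """Toy model of a guest register file as a flat byte array (not PyVEX's API)."""

    def __init__(self, size=1024):
        # Assumed size; real guest states are architecture-specific.
        self.data = bytearray(size)

    def put(self, offset, value, nbytes):
        # Like a PUT statement: write nbytes at an integer offset (little-endian assumed).
        self.data[offset:offset + nbytes] = value.to_bytes(nbytes, "little")

    def get(self, offset, nbytes):
        # Like a GET expression: read nbytes starting at an integer offset.
        return int.from_bytes(self.data[offset:offset + nbytes], "little")


regs = RegisterFile()
RAX = 16  # offset this document gives for AMD64's rax

regs.put(RAX, 0x1122334455667788, 8)  # roughly PUT(16) = 0x1122334455667788:I64
full = regs.get(RAX, 8)               # roughly GET:I64(16)
low32 = regs.get(RAX, 4)              # roughly GET:I32(16): the low four bytes in this toy layout
print(hex(full), hex(low32))
```

Because registers are just offsets into one space, a narrower read at the same offset yields the low part of the wider register in this model.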
These expressions are then, in turn, used in IR Statements. Here are some common ones:
IR Statement | Meaning | VEX Output Example |
---|---|---|
Write Temp | Set a VEX temporary variable to the value of the given IR Expression. | WrTmp(t1) = (IR Expression) |
Put Register | Update a register with the value of the given IR Expression. | PUT(16) = (IR Expression) |
Store Memory | Update a location in memory, given as an IR Expression, with a value, also given as an IR Expression. | STle(0x1000) = (IR Expression) |
Exit | A conditional exit from a basic block, with the jump target specified by an IR Expression. The condition is specified by an IR Expression. | if (condition) goto (Boring) 0x4000A00:I32 |
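
To make the statement semantics concrete, here is a small pure-Python sketch that applies Write Temp, Put Register, Store Memory, and Exit statements to a toy machine state. The tuple encoding, the `run_block` function, the dictionary-based state, and the fall-through address `0x400405` are all hypothetical and unrelated to PyVEX's real classes.

```python
def run_block(statements, default_next, regs, mem):
    """Apply toy statements in order and return the address control flows to.

    Operands are either ints (constants) or strings naming temps such as "t1".
    """
    temps = {}

    def value(operand):
        # A string names a temp; anything else is already a constant.
        return temps[operand] if isinstance(operand, str) else operand

    for stmt in statements:
        kind = stmt[0]
        if kind == "WrTmp":      # ("WrTmp", tmp_name, operand)
            temps[stmt[1]] = value(stmt[2])
        elif kind == "Put":      # ("Put", register_offset, operand)
            regs[stmt[1]] = value(stmt[2])
        elif kind == "Store":    # ("Store", address_operand, data_operand)
            mem[value(stmt[1])] = value(stmt[2])
        elif kind == "Exit":     # ("Exit", guard_operand, target_address)
            if value(stmt[1]) != 0:
                return stmt[2]   # guard holds: leave the block early
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return default_next          # no exit taken: fall through to the default successor


block = [
    ("WrTmp", "t1", 0x2A),
    ("Put", 16, "t1"),
    ("Store", 0x1000, "t1"),
    ("WrTmp", "t2", 1),
    ("Exit", "t2", 0x4000A00),
]
regs, mem = {}, {}
target = run_block(block, default_next=0x400405, regs=regs, mem=mem)
print(hex(target), regs, mem)
```

An Exit only redirects control when its guard is nonzero; otherwise execution continues to the block's default successor, which is what `irsb.next` describes in a real IRSB.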
An example of an IR translation, on ARM, is produced below. In the example, the subtraction operation is translated into a single IR block comprising 5 IR Statements, each of which contains at least one IR Expression (although, in real life, an IR block would typically consist of more than one instruction). Register names are translated into numerical indices given to the GET Expression and PUT Statement.
The astute reader will observe that the actual subtraction is modeled by the first 4 IR Statements of the block, and the incrementing of the program counter to point to the next instruction (which, in this case, is located at `0x59FC8`) is modeled by the last statement.
The following ARM instruction:
subs R2, R2, #8
Becomes this VEX IR:
t0 = GET:I32(16)
t1 = 0x8:I32
t3 = Sub32(t0,t1)
PUT(16) = t3
PUT(68) = 0x59FC8:I32
Cool stuff!
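
As a sanity check, the five IR statements above can be hand-translated into plain Python and run on a toy register state. This is only a sketch: the dictionary keyed by register offset, the `run_subs_block` and `sub32` helpers, and the starting values (including the assumed previous program counter `0x59FC4`) are hypothetical and not part of PyVEX.

```python
MASK32 = 0xFFFFFFFF


def sub32(a, b):
    # Model Sub32 as subtraction that wraps around at 32 bits.
    return (a - b) & MASK32


def run_subs_block(regs):
    """Hand-translated version of the five IR statements above (toy model)."""
    t0 = regs[16]         # t0 = GET:I32(16)
    t1 = 0x8              # t1 = 0x8:I32
    t3 = sub32(t0, t1)    # t3 = Sub32(t0,t1)
    regs[16] = t3         # PUT(16) = t3
    regs[68] = 0x59FC8    # PUT(68) = 0x59FC8:I32
    return regs


# Offset 16 holds R2 and offset 68 holds the program counter in this example.
regs = run_subs_block({16: 0x20, 68: 0x59FC4})
print(hex(regs[16]), hex(regs[68]))

# Subtracting from a value smaller than 8 wraps around instead of going negative.
wrapped = run_subs_block({16: 0x4, 68: 0x59FC4})
print(hex(wrapped[16]))
```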
If you use PyVEX in an academic work, please cite the paper for which it was developed:
@article{shoshitaishvili2015firmalice,
title={Firmalice - Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware},
author={Shoshitaishvili, Yan and Wang, Ruoyu and Hauser, Christophe and Kruegel, Christopher and Vigna, Giovanni},
booktitle={NDSS},
year={2015}
}