EiNSTeiN- / decompiler

A decompiler with multiple backend support, written in Python. Works with IDA and Capstone.

decompiler

A multi-backend decompiler written in Python. It currently supports IDA and Capstone.

Usage with Capstone

Install Capstone's Python bindings as follows:

$ sudo pip install capstone

Then try out the decompiler:

from capstone import *
from decompiler import *
from host import dis
from output import c

# Create a Capstone object, which will be used as the disassembler
md = Cs(CS_ARCH_X86, CS_MODE_32)

# Define the bytes to disassemble (a bytes literal, which Capstone requires on Python 3)
code = b"\x55\x89\xe5\x83\xec\x28\xc7\x45\xf4\x00\x00\x00\x00\x8b\x45\xf4\x8b\x00\x83\xf8\x0e\x75\x0c\xc7\x04\x24\x30\x87\x04\x08\xe8\xd3\xfe\xff\xff\xb8\x00\x00\x00\x00\xc9\xc3"

# Create the capstone-specific backend; it will yield expressions that the decompiler is able to use.
disasm = dis.available_disassemblers['capstone'].create(md, code, 0x1000)

# Create the decompiler
dec = decompiler_t(disasm, 0x1000)

# Transform the function until it is decompiled
dec.step_until(step_decompiled)

# Tokenize the function and output it as a string
print(''.join([str(o) for o in c.tokenizer(dec.function).tokens]))

The snippet of code above should output:

func() {
   s0 = 0;
   if (*s0 == 14) {
      s2 = 134514480;
      3830();
   }
   return 0;
}

Much like Capstone itself, the capstone backend does not know which addresses hold strings and has no concept of named locations. This is why the call target 3830() and the constant 134514480 appear as raw numbers in the decompiled code above. You can supply this information to the disassembler backend for prettier output:

disasm.add_string(134514480, "string")
disasm.add_name(3830, "func_3830")
print(''.join([str(o) for o in c.tokenizer(dec.function).tokens]))

Now the decompiled output is:

func() {
   s0 = 0;
   if (*s0 == 14) {
      s2 = 'string';
      func_3830();
   }
   return 0;
}
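
To see where the two raw numbers in the first listing come from, they can be recomputed directly from the example bytes. The sketch below uses only Python's standard library and does not touch the decompiler or Capstone. The two instruction offsets are an assumption of this sketch: they were found by decoding the byte string by hand, not reported by any tool.

```python
import struct

# Same bytes and load address (0x1000) as in the example above.
code = (b"\x55\x89\xe5\x83\xec\x28\xc7\x45\xf4\x00\x00\x00\x00\x8b\x45\xf4"
        b"\x8b\x00\x83\xf8\x0e\x75\x0c\xc7\x04\x24\x30\x87\x04\x08\xe8\xd3"
        b"\xfe\xff\xff\xb8\x00\x00\x00\x00\xc9\xc3")
BASE = 0x1000

# Offsets found by hand-decoding the instruction stream (assumption of this sketch).
MOV_OFFSET = 23   # c7 04 24 imm32 -> mov dword ptr [esp], imm32
CALL_OFFSET = 30  # e8 rel32       -> call rel32

# The 32-bit immediate follows the 3-byte opcode/ModRM/SIB prefix of the mov.
string_addr = struct.unpack_from("<I", code, MOV_OFFSET + 3)[0]

# A near call encodes a signed displacement relative to the next instruction,
# which starts 5 bytes after the call opcode.
rel32 = struct.unpack_from("<i", code, CALL_OFFSET + 1)[0]
call_target = (BASE + CALL_OFFSET + 5 + rel32) & 0xFFFFFFFF

print(hex(string_addr), string_addr)  # 0x8048730 134514480
print(hex(call_target), call_target)  # 0xef6 3830
```

These are the same values passed to add_string and add_name above, so 134514480 is the address 0x8048730 and 3830 is the call destination 0xef6 relative to the chosen load address.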

Current status

The decompiler can currently handle small functions with fairly simple control flow. Larger functions may also decompile, but only by luck. It shows what can be done in a few thousand lines of Python.

Test binaries are provided in tests/.

How does it work?

This project is based on a paper by van Emmerik titled Static Single Assignment for Decompilation.
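
As a rough illustration of the static single assignment idea, here is a toy renaming pass for straight-line code. It is not taken from this project: the to_ssa function, the (target, expression) statement format and the variable names are all hypothetical, and a real SSA construction also has to handle branches and phi-functions, which this sketch ignores.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def to_ssa(statements):
    """Rename variables in straight-line code so each name is assigned once.

    `statements` is a list of (target, expression) string pairs, a
    hypothetical format used only for this sketch.
    """
    version = {}  # variable name -> number of definitions seen so far
    result = []
    for target, expr in statements:
        # Every use refers to the latest definition; names that were never
        # defined (e.g. function inputs) get version 0.
        renamed = IDENT.sub(
            lambda m: "%s_%d" % (m.group(0), version.get(m.group(0), 0)),
            expr,
        )
        # Every assignment creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        result.append(("%s_%d" % (target, version[target]), renamed))
    return result

program = [
    ("eax", "0"),
    ("eax", "eax + 1"),
    ("ebx", "eax * 2"),
    ("eax", "ebx"),
]
ssa = to_ssa(program)
for target, expr in ssa:
    print("%s = %s" % (target, expr))
```

Once every variable has exactly one definition, a use can be traced to its definition without ambiguity, which is what makes steps such as expression propagation tractable.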

Roadmap

This project could use some improvements in the following areas: