Colton1skees / Dna

LLVM based static binary analysis framework
GNU General Public License v3.0

Dna

Dna is a static binary analysis framework built on top of LLVM. Notably, it is written almost entirely in C#, including managed bindings for LLVM, Remill, and Souper.

Functionality

Dna implements an iterative control flow graph reconstruction algorithm inspired heavily by the SATURN paper. It repeatedly applies recursive descent, lifting (using Remill), and path solving until the complete control flow graph is recovered. For jump tables, we use a recursive algorithm based on Souper and Z3 to solve for the set of possible jump table targets. You can find the iterative exploration algorithm here, and the jump table solving algorithm here.
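To make the fixed-point idea concrete, here is a minimal Python sketch of the loop described above. It is not Dna's implementation: the toy instruction set, the `explore` function, and the `solve_indirect` callback are all hypothetical, and the callback merely stands in for the lifting and path solving steps (Remill, Souper, and Z3 in Dna). Each round runs recursive descent with the targets known so far, asks the solver about every indirect jump that was reached, and repeats until no new targets appear.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical toy instruction: (kind, operand). Kinds:
#   "op"   falls through to address + 1
#   "jmp"  direct jump to operand
#   "jcc"  conditional jump: operand or fall through
#   "ijmp" indirect jump whose targets must be solved
#   "ret"  ends the path
Insn = Tuple[str, Optional[int]]
Program = Dict[int, Insn]
# Stand-in for "lift, then solve": returns the target set, or None if unbounded.
Solver = Callable[[int], Optional[Set[int]]]


def explore(program: Program, entry: int, solve_indirect: Solver):
    """Alternate recursive descent and target solving until a fixed point."""
    resolved: Dict[int, Set[int]] = {}  # indirect jump address -> known targets
    while True:
        # Step 1: recursive descent using every target known so far.
        visited: Set[int] = set()
        edges: Set[Tuple[int, int]] = set()
        pending = []  # indirect jumps reached in this round
        stack = [entry]
        while stack:
            addr = stack.pop()
            if addr in visited:
                continue
            if addr not in program:
                raise ValueError(f"unliftable instruction at {addr}")
            visited.add(addr)
            kind, operand = program[addr]
            if kind == "op":
                succs = [addr + 1]
            elif kind == "jmp":
                succs = [operand]
            elif kind == "jcc":
                succs = [operand, addr + 1]
            elif kind == "ijmp":
                succs = sorted(resolved.get(addr, set()))
                pending.append(addr)
            else:  # "ret"
                succs = []
            for succ in succs:
                edges.add((addr, succ))
                stack.append(succ)

        # Step 2: solve each reached indirect jump and record new targets.
        changed = False
        for addr in pending:
            targets = solve_indirect(addr)
            if targets is None:
                raise ValueError(f"unbounded jump table at {addr}")
            known = resolved.get(addr, set())
            if not targets <= known:
                resolved[addr] = known | targets
                changed = True

        # Step 3: stop once a full round discovers nothing new.
        if not changed:
            return visited, edges


# Toy program: the indirect jump at 21 is only reachable through the one at 2,
# so recovering the full graph takes more than one round.
PROGRAM: Program = {
    0: ("op", None), 1: ("jcc", 5), 2: ("ijmp", None), 5: ("ret", None),
    10: ("jmp", 5), 20: ("op", None), 21: ("ijmp", None), 30: ("ret", None),
}
TABLES = {2: {10, 20}, 21: {30}}  # hypothetical solver answers

if __name__ == "__main__":
    blocks, cfg_edges = explore(PROGRAM, 0, TABLES.get)
    print(sorted(blocks))
    print(sorted(cfg_edges))
```

Rebuilding the visited set on every round is wasteful but keeps the sketch short. A real implementation would presumably only re-explore from newly discovered targets.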

Once a control flow graph has been fully explored, it can be recompiled to x86 and reinserted into the binary using the algorithms from here and here. Though the compiled code is not pretty by any means, it should run so long as the recovered control flow graph is correct. That said, Dna is still a research prototype: bugs and edge cases are expected. Control flow graph exploration may fail on, for example, unbounded jump tables or unliftable instructions.

Some other notable features:

Some caveats:

Dependencies

Note that Dna is currently based on LLVM 17.

Building

Dna will not build out of the box. Custom patches to Remill and Souper were needed to get it building on Windows. If you would like to work on Dna, open an issue or email me at colton1skees@gmail.com. At some point I may publish proper build steps, but I make no guarantees.