JonathanSalwan / Triton

Triton is a dynamic binary analysis library. Build your own program analysis tools, automate your reverse engineering, perform software verification or just emulate code.
https://triton-library.github.io
Apache License 2.0

Any idea how to simplify a whole basic block via LLVM? #1338

Closed IamHuskar closed 2 days ago

IamHuskar commented 2 weeks ago

Here is the code:

struct op trace[] = {
  {0x2000005, "\x48\x89\x6C\x24\xF8",         5}, /* mov  [rsp-8], rbp */
  {0x200000A, "\x48\x8D\x64\x24\xF8",         5}, /* lea  rsp, [rsp-8] */
  {0x200000F, "\x48\x8D\x2D\x52\x2A\xE0\xFF", 7}, /* lea  rbp, [rip - 0x1fd5ae]  (rbp = 0x1e02a68) */
  {0x2000016, "\x48\x87\x2C\x24",             4}, /* xchg rbp, [rsp] */
  {0x200001A, "\x48\x8D\x64\x24\x08",         5}, /* lea  rsp, [rsp+8] */
  {0x200001F, "\xFF\x64\x24\xF8",             4}, /* jmp  qword ptr [rsp-8] */
  {0x000000,  nullptr,                        0}  /* terminator */
};

The whole basic block is equivalent to a single jump instruction (jmp 0x1e02a68). Context::simplify(const triton::arch::BasicBlock& block) can't achieve this, since it only performs dead store elimination. Should I symbolize rsp and rbp with symbolizeRegister and [rsp] and [rsp-8] with symbolizeMemory, get the AST node of each SymbolicVariable, call simplify on each node, and then merge all the AST nodes? Is there a way to simplify the whole basic block?
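To see why the block collapses to a single jump, here is a hand-rolled model of the six instructions in plain Python. It does not use Triton; the names `run_block`, `rsp0` and `rbp0` are made up for illustration. `rsp` is tracked as an offset from its unknown initial value and `rbp` starts as an opaque symbol.

```python
# Hand-rolled model of the six instructions; this is NOT Triton.
# rsp is an offset from its unknown initial value (rsp0),
# rbp starts as the opaque symbol "rbp0", memory is keyed by rsp offset.

def run_block():
    rsp = 0                         # offset relative to rsp0
    rbp = "rbp0"                    # unknown initial rbp
    mem = {}                        # {offset from rsp0: value}

    mem[rsp - 8] = rbp              # mov  [rsp-8], rbp
    rsp = rsp - 8                   # lea  rsp, [rsp-8]
    rbp = (0x200000F + 7) - 0x1FD5AE  # lea  rbp, [rip - 0x1fd5ae]; rip is the next instruction
    rbp, mem[rsp] = mem[rsp], rbp   # xchg rbp, [rsp]
    rsp = rsp + 8                   # lea  rsp, [rsp+8]
    target = mem[rsp - 8]           # jmp  qword ptr [rsp-8]
    return target, rsp, rbp

target, rsp, rbp = run_block()
print(hex(target), rsp, rbp)
```

In this model the jump target is a constant while rsp and rbp end up with their initial values, so the only other effect of the block is the write to [rsp-8].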

IamHuskar commented 2 weeks ago

I am a beginner with Triton's symbolic execution engine. @JonathanSalwan

IamHuskar commented 2 weeks ago

I found some examples:

https://github.com/sh4m2hwz/VMPSimplifierUltra/blob/main/VMPSimplifierUltra.py#L59

https://github.com/sh4m2hwz/devirt_vmprotect3/blob/master/devirt.py#L94

I don't know whether this is the correct way to optimize basic blocks with Triton.

JonathanSalwan commented 2 days ago

First example using constant folding:

#!/usr/bin/env python
## -*- coding: utf-8 -*-
##
## $ python test.py
## 0x2000005: mov qword ptr [rsp - 8], rbp
## 0x200000a: lea rsp, [rsp - 8]
## 0x200000f: lea rbp, [rip - 0x1fd5ae]
## 0x2000016: xchg qword ptr [rsp], rbp
## 0x200001a: lea rsp, [rsp + 8]
## 0x200001f: jmp qword ptr [rsp - 8]
## (_ bv31468136 64)

from triton import *

ctx = TritonContext(ARCH.X86_64)
ctx.setMode(MODE.CONSTANT_FOLDING, True)
ctx.setMode(MODE.ALIGNED_MEMORY, True)

# Symbolize all registers
for r in ctx.getParentRegisters():
    ctx.symbolizeRegister(r, r.getName())

block = BasicBlock([
  Instruction(b"\x48\x89\x6C\x24\xF8"),
  Instruction(b"\x48\x8D\x64\x24\xF8"),
  Instruction(b"\x48\x8D\x2D\x52\x2A\xE0\xFF"),
  Instruction(b"\x48\x87\x2C\x24"),
  Instruction(b"\x48\x8D\x64\x24\x08"),
  Instruction(b"\xFF\x64\x24\xF8"),
])

# Process the block (constant folding will be applied)
ctx.processing(block, 0x2000005)
print(block)

# Get back the rip expression
rip = ctx.getRegisterAst(ctx.registers.rip)

# Print the rip expression (constant folding has been applied)
ast = ctx.getAstContext()
print(ast.unroll(rip))
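For readers new to the term, constant folding can be sketched on a toy expression tree. This is only an illustration of the idea, not Triton's implementation: the tuple representation and the `fold` helper are hypothetical. Subtrees made only of constants are evaluated, while anything that touches a symbolic variable is kept.

```python
# Toy expression tree: ints are constants, strs are symbolic variables,
# tuples are (operator, lhs, rhs). Hypothetical representation, not Triton's AST.
MASK64 = (1 << 64) - 1

OPS = {
    "bvadd": lambda a, b: (a + b) & MASK64,
    "bvsub": lambda a, b: (a - b) & MASK64,
}

def fold(node):
    """Fold constant subtrees bottom-up; leave symbolic parts untouched."""
    if not isinstance(node, tuple):
        return node
    op, lhs, rhs = node
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return OPS[op](lhs, rhs)
    return (op, lhs, rhs)

# rip after the lea is 0x200000F + 7; the lea target is rip - 0x1fd5ae.
lea_target = ("bvsub", ("bvadd", 0x200000F, 7), 0x1FD5AE)
folded = fold(lea_target)

# A tree with a symbolic leaf only folds its constant part.
symbolic = fold(("bvadd", "rsp", ("bvsub", 8, 8)))

print(hex(folded))
print(symbolic)
```

The first tree folds to a single constant because it has no symbolic leaves; the second keeps the `rsp` node, and only its constant operand is reduced.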

Second example using LLVM or solver simplification:

#!/usr/bin/env python
## -*- coding: utf-8 -*-
##
## 0x2000005: mov qword ptr [rsp - 8], rbp
## 0x200000a: lea rsp, [rsp - 8]
## 0x200000f: lea rbp, [rip - 0x1fd5ae]
## 0x2000016: xchg qword ptr [rsp], rbp
## 0x200001a: lea rsp, [rsp + 8]
## 0x200001f: jmp qword ptr [rsp - 8]
## (_ bv31468136 64)
## (_ bv31468136 64)

from triton import *

ctx = TritonContext(ARCH.X86_64)

# Symbolize all registers
for r in ctx.getParentRegisters():
    ctx.symbolizeRegister(r, r.getName())

block = BasicBlock([
  Instruction(b"\x48\x89\x6C\x24\xF8"),
  Instruction(b"\x48\x8D\x64\x24\xF8"),
  Instruction(b"\x48\x8D\x2D\x52\x2A\xE0\xFF"),
  Instruction(b"\x48\x87\x2C\x24"),
  Instruction(b"\x48\x8D\x64\x24\x08"),
  Instruction(b"\xFF\x64\x24\xF8"),
])

# Process the block (constant folding is not enabled here; rip is simplified afterwards)
ctx.processing(block, 0x2000005)
print(block)

# Get back the rip expression
ast = ctx.getAstContext()
rip = ctx.getRegisterAst(ctx.registers.rip)

# Solution 1: Simplify the rip expression using llvm
print(ast.unroll(ctx.simplify(rip, llvm=True)))

# Solution 2: Simplify the rip expression using the SMT solver (basically equivalent to our constant folding pass)
print(ast.unroll(ctx.simplify(rip, solver=True)))
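The printed result `(_ bv31468136 64)` is an SMT-LIB bit-vector literal with the value in decimal. A small helper (hypothetical, plain Python, independent of Triton) can turn it back into the hexadecimal address used in the disassembly:

```python
import re

def parse_bv_literal(text):
    """Parse an SMT-LIB literal such as '(_ bv31468136 64)' into (value, width)."""
    match = re.fullmatch(r"\(_ bv(\d+) (\d+)\)", text.strip())
    if match is None:
        raise ValueError(f"not a bit-vector literal: {text!r}")
    return int(match.group(1)), int(match.group(2))

value, width = parse_bv_literal("(_ bv31468136 64)")
print(hex(value), width)
```

The value 31468136 is 0x1e02a68, which matches the rbp value noted in the original trace.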

I hope it can help.

IamHuskar commented 2 days ago

Thank you very much for your sample code. I solved this problem two days ago but forgot to close the issue. Is there a Discord group where we can exchange ideas on how to use the Triton library? I have found Triton very effective for removing code obfuscation, but I often run into different obfuscation techniques and have no one to discuss them with, so I spend a lot of time learning how to deal with obfuscated code on my own. It would be nice to have a chat group where we can get together and discuss anything about Triton.

JonathanSalwan commented 2 days ago

There is no Discord dedicated to Triton. However, I hang out on doar-e, where there is a good technical community talking about everything related to binary exploitation and reverse engineering.