JonathanSalwan / Triton

Triton is a dynamic binary analysis library. Build your own program analysis tools, automate your reverse engineering, perform software verification or just emulate code.
https://triton-library.github.io
Apache License 2.0
3.4k stars 524 forks source link

backward slicing from memory address #1235

Closed antonislouca closed 1 year ago

antonislouca commented 1 year ago

Hello,

i am facing the following problem, and I wanted to know if Triton has such functionality, to use for a backward slicing analysis. I am providing a small sample of disassembled code below. Assume: var i32 x @ rsp+0x20 var i32 y @ rsp+0x24

1: 0x000084f0 4881eca80000. sub rsp, 0xa8
2: 0x000084f7 c74424200500. mov dword [x], 5
3: 0x000084ff c74424240600. mov dword [y], 6
4: 0x00008507 837c242005 cmp dword [x], 5
5: 0x0000850c 7d16 jge 0x8524 6: 0x0000850e 488d7c2424 lea rdi, [y]
7: 0x00008513 e868ffffff call sym core::fmt::ArgumentV1::new_display::h58e293e31f8f9b08 8: 0x00008518 4889442410 mov qword [var_10h], rax 9: 0x0000851d 4889542418 mov qword [var_18h], rdx 10: 0x00008522 eb5a jmp 0x857e

Instead of slicing an expression from a register. i want to create a symbolic expression for the y variable (rsp+0x24) so I can slice backwards and locate the instructions that affect the load of variable y in rdi. Manually that would be lines 5-2. Can triton be used to create such an expression, and solve such problem?

TinyNiko commented 1 year ago

Maybe getMemoryAst can work. eg: mem = ctx.getMemoryAst(MemoryAccess(y_addr, 4))

antonislouca commented 1 year ago

I think my main problem is that I don't have proper propagation of the expressions. I followed the example given for backward slicing, but it does not seem to create the correct expression, that when solved will produce the correct slice. Is there any way to modify how expressions propagate? And add expressions? I have made that memory access that I want (rsp+0x20) into a symbolic memory but I don't know how to make Triton create an expression out of it.

JonathanSalwan commented 1 year ago
2: 0x000084f7 c74424200500. mov dword [x], 5
3: 0x000084ff c74424240600. mov dword [y], 6
4: 0x00008507 837c242005 cmp dword [x], 5
5: 0x0000850c 7d16 jge 0x8524

When executing the line 5, can you just get the flag registers (sf, of), or maybe just rip and slice from them ?

antonislouca commented 1 year ago

Slicing the rip register from the command in line 5 yields the following result. [slicing] 0x8776: cmp dword ptr [rsp], 5 [slicing] 0x8776: cmp dword ptr [rsp], 5 [slicing] 0x8776: cmp dword ptr [rsp], 5 [slicing] 0x877a: jge 0x87c1

Where [rsp] is the location of x. Shouldn't this slice return the commands in lines 1 and 2 also? 1: sub rsp, 0xa8 2: mov dword [x], 5 Am I missing something in this logic? Should I combine the slicing of all these registers?

JonathanSalwan commented 1 year ago

Mmmh, it should also slice the line 2 but not the line 1 as we do not keep symbolic memory indexing.

from triton import *

ctx = TritonContext(ARCH.X86_64)

ctx.setConcreteRegisterValue(ctx.registers.rsp, 0x7fffffff)
ctx.setConcreteRegisterValue(ctx.registers.rdi, 0x1fffffff)

code = [
    b"\x48\x81\xec\xa8\x00\x00\x00", # sub rsp, 0xa8
    b"\xc7\x04\x24\x05\x00\x00\x00", # mov dword ptr [rsp], 5
    b"\xc7\x07\x06\x00\x00\x00", # mov dword ptr [rdi], 6
    b"\x83\x3c\x24\x05", # cmp dword ptr [rsp], 5
    b"\x0f\x8d\x1e\x85\x00\x00", # jge 0x8524
]

for op in code:
    i = Instruction(op)
    ctx.processing(i)
    for e in i.getSymbolicExpressions():
        e.setComment(i.getDisassembly())

rip = ctx.getSymbolicRegister(ctx.registers.rip)
for expr in sorted(ctx.sliceExpressions(rip).items()):
    print(expr)
$ python ./test.py
(8, (define-fun ref!8 () (_ BitVec 8) ((_ extract 31 24) (_ bv5 32))) ; mov dword ptr [rsp], 5)
(9, (define-fun ref!9 () (_ BitVec 8) ((_ extract 23 16) (_ bv5 32))) ; mov dword ptr [rsp], 5)
(10, (define-fun ref!10 () (_ BitVec 8) ((_ extract 15 8) (_ bv5 32))) ; mov dword ptr [rsp], 5)
(11, (define-fun ref!11 () (_ BitVec 8) ((_ extract 7 0) (_ bv5 32))) ; mov dword ptr [rsp], 5)
(20, (define-fun ref!20 () (_ BitVec 32) (bvsub (concat ref!8 ref!9 ref!10 ref!11) (_ bv5 32))) ; cmp dword ptr [rsp], 5)
(23, (define-fun ref!23 () (_ BitVec 1) ((_ extract 31 31) (bvand (bvxor (concat ref!8 ref!9 ref!10 ref!11) (_ bv5 32)) (bvxor (concat ref!8 ref!9 ref!10 ref!11) ref!20)))) ; cmp dword ptr [rsp], 5)
(25, (define-fun ref!25 () (_ BitVec 1) ((_ extract 31 31) ref!20)) ; cmp dword ptr [rsp], 5)
(28, (define-fun ref!28 () (_ BitVec 64) (ite (= ref!25 ref!23) (_ bv34108 64) (_ bv30 64))) ; jge 0x853c)
antonislouca commented 1 year ago

Thank you. I seem to get the correct results now after I removed the line: ctx.setMode(MODE.ALIGNED_MEMORY, True). Can you also please explain why you concretized the rdi value?

JonathanSalwan commented 1 year ago

If you are talking about:

ctx.setConcreteRegisterValue(ctx.registers.rsp, 0x7fffffff)
ctx.setConcreteRegisterValue(ctx.registers.rdi, 0x1fffffff)

It's just to provide an initial context. As you have two memory accesses [rsp] and [rdi], it's just to be sure that they do not point on the same memory address.

antonislouca commented 1 year ago

Ok thank you for your help, I may need some more assistance with this but I will open a different thread.