JonathanSalwan / Triton

Triton is a dynamic binary analysis library. Build your own program analysis tools, automate your reverse engineering, perform software verification or just emulate code.
https://triton-library.github.io
Apache License 2.0

windows 16bit and 8 bit registers not simplified #1255

Open NaC-L opened 1 year ago

NaC-L commented 1 year ago

code:

from triton import *

ctx = TritonContext(ARCH.X86_64)
block = BasicBlock([
    Instruction(b"\x66\xBE\x00\x00"),                    # mov si, 0
    Instruction(b"\x66\xBE\x01\x00"),                    # mov si, 1
])

print('[Original basic block] ----------------------------------------------- ')
ctx.disassembly(block, 0x140004149)
print(block)
print('[End of original basic block] ---------------------------------------- ')

print()

print('[Simplified basic block] --------------------------------------------- ')
sblock = ctx.simplify(block)
ctx.disassembly(sblock, 0x140004149)
print(sblock)
print('[End of simplified basic block] -------------------------------------- ')

output:

[Original basic block] ----------------------------------------------- 
0x140004149: mov si, 0
0x14000414d: mov si, 1
[End of original basic block] ---------------------------------------- 

[Simplified basic block] --------------------------------------------- 
0x140004149: mov si, 0
0x14000414d: mov si, 1
[End of simplified basic block] -------------------------------------- 

expected output:

[Original basic block] ----------------------------------------------- 
0x140004149: mov si, 0
0x14000414d: mov si, 1
[End of original basic block] ---------------------------------------- 

[Simplified basic block] --------------------------------------------- 
0x14000414d: mov si, 1
[End of simplified basic block] -------------------------------------- 

NaC-L commented 1 year ago

I traced the issue back to

https://github.com/JonathanSalwan/Triton/blob/e2faa65f07891d2635619864343d82ac289a7361/src/libtriton/engines/symbolic/symbolicEngine.cpp#L496

        auto worklist = triton::ast::childrenExtraction(expr->getAst(), true /* unroll */, false /* revert */);

        for (auto&& n : worklist) {
          if (n->getType() == triton::ast::REFERENCE_NODE) {
            auto expr  = reinterpret_cast<triton::ast::ReferenceNode*>(n.get())->getSymbolicExpression();
            auto eid   = expr->getId();

            exprs[eid] = expr;
          }
        }

I tried commenting out those lines to see whether it helped, but that was a big no-no: it would turn

add rax,1
add rax,1

to

add rax,1

instead of leaving it unchanged:

add rax,1
add rax,1

I also tried setting unroll to false with triton::ast::childrenExtraction(expr->getAst(), false /* unroll */, false /* revert */);

That only kept the last two instructions writing each register:

input:

add rax,1
add rax,1
add rax,1
add rax,1
mov si, 1
mov si, 2
mov si, 3

to

add rax,1
add rax,1
mov si, 2
mov si, 3

So it appears that the problem is in https://github.com/JonathanSalwan/Triton/blob/e2faa65f07891d2635619864343d82ac289a7361/src/libtriton/ast/ast.cpp#L3635

JonathanSalwan commented 1 year ago

Yeah, the problem is that registers like si, ah, and al are sub-registers: writing to them does not clear the upper bits of the parent register. So basically we can't easily determine whether the previous assignment can be removed. For example:

mov esi, 0x11220000
mov si, 0x3344
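
To make the arithmetic of that example concrete, here is a minimal sketch in plain Python (no Triton involved). The helper name write_low_bits is hypothetical and only models how a sub-register write merges into the parent register:

```python
def write_low_bits(old, value, width):
    """Model a sub-register write: replace only the low `width` bits of `old`."""
    mask = (1 << width) - 1
    return (old & ~mask) | (value & mask)

# mov esi, 0x11220000  (a 32-bit write zero-extends into rsi)
rsi = 0x11220000
# mov si, 0x3344       (a 16-bit write keeps the upper bits of rsi)
rsi = write_low_bits(rsi, 0x3344, 16)

print(hex(rsi))  # 0x11223344
```

The final value depends on both instructions, so here the first write cannot be dropped even though the second one targets the same register.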

We have an optimization that fixes your issue, AST_OPTIMIZATIONS; however, it breaks other unit tests.
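
One way to picture why the two mov si instructions from the original report both survive is a toy model of reference-following liveness. This is only a sketch under stated assumptions: the tuple node names ("init", "ref", "high48", "concat", "const") and all function names are hypothetical, it is not Triton's AST or its simplification pass, and the optimize flag only loosely stands in for the kind of rewrite an AST-level optimization could perform.

```python
def write_si(exprs, prev, imm, optimize=False):
    """Append the expression for `mov si, imm` and return its id."""
    # New rsi = bits 63..16 of the previous rsi, concatenated with the immediate.
    high = ("high48", prev)
    if optimize and prev[0] == "ref" and exprs[prev[1]][0] == "concat":
        # high48(concat(h, l)) == h, so the reference to the previous
        # expression can be replaced by that expression's own high part.
        high = exprs[prev[1]][1]
    exprs.append(("concat", high, ("const", imm)))
    return len(exprs) - 1

def referenced_ids(node):
    """Collect the ids of all expressions referenced inside `node`."""
    if node[0] == "ref":
        return {node[1]}
    ids = set()
    for child in node[1:]:
        if isinstance(child, tuple):
            ids |= referenced_ids(child)
    return ids

def live_expressions(exprs, last_id):
    """Keep the last expression and everything reachable through references."""
    live, todo = set(), [last_id]
    while todo:
        eid = todo.pop()
        if eid not in live:
            live.add(eid)
            todo.extend(referenced_ids(exprs[eid]))
    return sorted(live)

def simulate(optimize):
    """Run `mov si, 0; mov si, 1` and return the ids of the kept expressions."""
    exprs, prev = [], ("init", "rsi")
    for imm in (0, 1):
        eid = write_si(exprs, prev, imm, optimize)
        prev = ("ref", eid)
    return live_expressions(exprs, eid)

print(simulate(optimize=False))  # [0, 1]
print(simulate(optimize=True))   # [1]
```

In this model the second write references the first one only to recover the untouched upper bits, which is enough to keep the first instruction alive. Once that reference is rewritten to the original upper bits, the first write becomes dead, matching the expected output in the report.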