JonathanSalwan / Triton

Triton is a dynamic binary analysis library. Build your own program analysis tools, automate your reverse engineering, perform software verification or just emulate code.
https://triton-library.github.io
Apache License 2.0
3.4k stars 525 forks source link

BasicBlock simplification bug #1174

Closed Hykni closed 1 year ago

Hykni commented 1 year ago
    ctx.setArchitecture(ARCH_X86_64);

    BasicBlock block = BasicBlock({ 
        Instruction("\x48\x8B\x14\x2C", 4),
        Instruction("\x48\x81\xEE\x08\x00\x00\x00", 8),
        Instruction("\x48\x89\x16", 3),
        Instruction("\x41\x51", 2),
        Instruction("\x8b\x03", 2),
    });

    ctx.disassembly(block);

    std::cout << block << std::endl;
    std::cout << "----------" << std::endl;

    block = ctx.simplify(block);
    ctx.disassembly(block);

    std::cout << block << std::endl;
    std::cout << "----------" << std::endl;

Produces output

0x0: mov rdx, qword ptr [rsp + rbp]
0x4: sub rsi, 8
0xb: mov qword ptr [rsi], rdx
0xe: push r9
0x10: mov eax, dword ptr [rbx]
----------
0x0: mov rdx, qword ptr [rsp + rbp]
0x4: sub rsi, 8
0xb: push r9
0xd: mov eax, dword ptr [rbx]
----------

The simplification engine is incorrectly removing mov qword ptr [rsi], rdx here.

Hykni commented 1 year ago

It thinks the push r9 overwrites the [rsi] store because it concretizes them to the same address. Seems to me in order to properly fix this you'd have to add symbolic addresses for the memory references map -- not so straightforward? For now I just made sure to keep all instructions that r/w memory.

JonathanSalwan commented 1 year ago

Yes, I think you are right, this is because they probably both write at the address 0.

Seems to me in order to properly fix this you'd have to add symbolic addresses for the memory references map -- not so straightforward?

Yes it's not so easy, we already have operators for that (select and store). But I think that if we use symbolic array we lost a part of this optimization as symbolic array have to keep all refs about select and store accesses.

JonathanSalwan commented 1 year ago

For now I just made sure to keep all instructions that r/w memory.

I've push a patch that allow you to do that directly with the API. If keepmem is true, we do not simplify memory accesses (true by default).

triton::arch::BasicBlock simplify(const triton::arch::BasicBlock& block, bool padding=false, bool keepmem=true) const;

I will probably push another patch that will use the current concrete state when simplify memory access so that we can avoid collision between memory address (like in this case where rsi and rsp that hold the same address).

JonathanSalwan commented 1 year ago

With the previous patch, the simplification uses the concrete state already defined. In your case, you have to define a concrete state for rsp and rsi, otherwise they will share the same initial value: 0. For example, if rsp and rsi share the same value, the store access is simplified. Then, if they do not share the same address, all instructions are kept.

JonathanSalwan commented 1 year ago

Feel free to close this thread if those both patches (09c8906c64455c6f4d82471a1dbec9efca5d96b5, 8ded15d0995de8c7a1ff2186051a76dfed002388) solve your issue.

Hykni commented 1 year ago

Thanks for the patches. I think there's still some issues with this solution, e.g.

mov rsi, 0
mov qword ptr [rsi], rdx
pop rsi

would be reduced to

mov qword ptr [rsi], rdx
pop rsi

because it doesn't consider the dependency on the memoryaccess expressions.

This is the badly hacked code I threw together for this that seems to work for me:

for (auto& insn : in.getInstructions()) {
    std::set<std::pair<triton::arch::MemoryAccess, triton::ast::SharedAbstractNode>> access;
    if (insn.isMemoryWrite()) {
        access = insn.getStoreAccess();
    }
    if (insn.isMemoryRead()) {
        access.insert(insn.getLoadAccess().begin(), insn.getLoadAccess().end());
    }
    for (auto& x : access) {
        auto worklist = triton::ast::childrenExtraction(x.second, true, false);
        for (auto&& n : worklist) {
            if (n->getType() == triton::ast::REFERENCE_NODE) {
                auto expr = reinterpret_cast<triton::ast::ReferenceNode*>(n.get())->getSymbolicExpression();
                auto eid = expr->getId();
                lifetime[eid] = expr;
            }
        }
        if (x.first.getLeaAst()) {
            worklist = triton::ast::childrenExtraction(x.first.getLeaAst(), true, false);
            for (auto&& n : worklist) {
                if (n->getType() == triton::ast::REFERENCE_NODE) {
                    auto expr = reinterpret_cast<triton::ast::ReferenceNode*>(n.get())->getSymbolicExpression();
                    auto eid = expr->getId();
                    lifetime[eid] = expr;
                }
            }
        }
    }
}
JonathanSalwan commented 1 year ago

Thx mate. Are we good now? I've also removed the keepmem option as we now keep instructions that build the effective addresses of load and store accesses.

Hykni commented 1 year ago

Looks ok to me