Closed m4drat closed 5 months ago
There is no synchronization between the symbolic memory state and the concrete when MEMORY_ARRAY
is enabled. As soon as MEMORY_ARRAY
is enabled, the memory state starts from zero. So, every previous setConcreteMemoryAreaValue
are ignored. That's why it works if you set the concrete memory after enabling MEMORY_ARRAY
.
For the RAM consumption, it is known issue when using MEMORY_ARRAY
. Can you try to only symbolize your data instead of the whole code?
I was looking at the examples, shouldn't the call to setConcreteMemoryAreaValue
do what I need (set the contents for a loaded segment without making it symbolic)?
maybe something like this?
from triton import *
# code / data you don't want to reason symbolically on.
code = [
b"\x0c\x00\x9f\xe5", # 0x801A48: ldr r0, [pc, #0xc]
b"\x00\x00\x9a\xe7", # 0x801A4c: ldr r0, [sl, r0]
b"\x00\x00\xa0\xe1", # 0x801A50: mov r0, r0
]
# code / data you really want to reason symbolically on.
data = [
b"\x41\x41\x41\x41", # 0x801A54: just data
b"\x42\x42\x42\x42", # 0x801A58: just data
b"\x43\x43\x43\x43", # 0x801A5C: just data
]
ctx = TritonContext(ARCH.ARM32)
# Code is concrete
ctx.setConcreteMemoryAreaValue(0x801A48, b"".join(code))
# Data is symbolic
ctx.setMode(MODE.MEMORY_ARRAY, True)
ctx.setConcreteMemoryAreaValue(0x801A54, b"".join(data))
# Set concrete register values
ctx.setConcreteRegisterValue(ctx.registers.pc, 0x801A48)
for op in code:
inst = Instruction(op)
ctx.processing(inst)
print(inst)
print('r0 = ', hex(ctx.getConcreteRegisterValue(ctx.registers.r0)))
for e in inst.getSymbolicExpressions():
print(e)
print()
Hm, but what if I don't want to have symbolic representation for any of the data I'm defining on this step? E.g., while I'm loading the whole ELF I simply want to load all the sections (.text, .data, ...), after that I want to be able to enable the ctx.setMode(MODE.MEMORY_ARRAY, True)
so I could enable symbolic memory modelling for the user-provided input.
Right now what would happen is enabling ctx.setMode(MODE.MEMORY_ARRAY, True)
after setConcreteMemoryAreaValue
makes the instruction ldr r0, [pc, #0xc]
to read zero-value, while (as far as I can tell) it should be 0x43434343
.
Here is the full example. Judging by what ctx.getConcreteMemoryAreaValue(CODE_BASE, 24)
returns, all the data is here, but executing the instruction still returns 0.
from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE
def main():
CODE_BASE = 0x801A48
code = [
b"\x0c\x00\x9f\xe5", # 0x801A48: ldr r0, [pc, #0xc]
b"\x00\x00\x9a\xe7", # 0x801A4c: ldr r0, [sl, r0]
b"\x00\x00\xa0\xe1", # 0x801A50: mov r0, r0
b"\x41\x41\x41\x41", # 0x801A54: just data
b"\x42\x42\x42\x42", # 0x801A58: just data
b"\x43\x43\x43\x43", # 0x801A5C: just data
]
ctx = TritonContext(ARCH.ARM32)
# Define the code segment
ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
ctx.setMode(MODE.MEMORY_ARRAY, True)
ctx.concretizeAllMemory()
ctx.concretizeAllRegister()
print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))
# Setup the registers
ctx.setConcreteRegisterValue(ctx.registers.pc, CODE_BASE)
ctx.setThumb(False)
pc = CODE_BASE
while pc <= 0x801A50:
op = ctx.getConcreteMemoryAreaValue(pc, 4)
ins = Instruction(pc, op)
if not ins.getDisassembly():
ctx.disassembly(ins)
opcode = ins.getOpcode()
mnemonic = ins.getType()
ret = ctx.buildSemantics(ins)
if ret == EXCEPTION.FAULT_UD:
print(f"[-] Instruction is not supported: {ins}")
break
print(f"[+] {ins}")
print(f"[+] r0 = {ctx.getConcreteRegisterValue(ctx.registers.r0):#010x}")
print(f"[+] pc = {ctx.getConcreteRegisterValue(ctx.registers.pc):#010x}")
print("")
pc = ctx.getConcreteRegisterValue(ctx.registers.pc)
print("[+] Instruction emulation done.")
if __name__ == "__main__":
main()
Yes I understand your problem but the point is that the symbolic memory model of Triton has two different paradigms that can't communicate between each other. The SMT logic behind is either BV
or ABV
. When MEMORY_ARRAY
is off, BV
logic is used and it relies on the concrete state. But as soon as ABV
is used, there is no more concrete state for memory, ABV has its own memory model (which is symbolic).
When setting ABV
(e.g ctx.setMode(MODE.MEMORY_ARRAY, True)
), a fresh symbolic memory is created and then each store are saved as symbolic expression into the memory model of ABV
. Example below.
memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state
...
So basically what happens (see [0, 1, 2, 3]):
from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE
def main():
CODE_BASE = 0x801A48
code = [
b"\x0c\x00\x9f\xe5", # 0x801A48: ldr r0, [pc, #0xc]
b"\x00\x00\x9a\xe7", # 0x801A4c: ldr r0, [sl, r0]
b"\x00\x00\xa0\xe1", # 0x801A50: mov r0, r0
b"\x41\x41\x41\x41", # 0x801A54: just data
b"\x42\x42\x42\x42", # 0x801A58: just data
b"\x43\x43\x43\x43", # 0x801A5C: just data
]
ctx = TritonContext(ARCH.ARM32)
# Define the code segment
ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
# [0] Before this point you were in BV logic, so the concrete state are stored as expected.
####################################################
ctx.setMode(MODE.MEMORY_ARRAY, True)
# [1] Now the memory model has changed, we switched to ABV, so we create a fresh memory.
# [2] It returns the correct answer because you requested the concrete state but not the symbolic one.
# From here, you have a desynchronization between your symbolic and your concrete memory state.
print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))
[...]
# [3] From this point, every interpreted `load / store` uses the ABV memory model which is fresh (so 0).
while pc <= 0x801A50:
op = ctx.getConcreteMemoryAreaValue(pc, 4)
[...]
In ABV, to keep the synchro between the concrete and the symbolic state you should enable MEMORY_ARRAY
before any setConcrete*
.
Oh, makes sense now! Thanks a lot for the clarification!
Regarding the RAM consumption. This is because in ABV we have to keep track of every store access. All of them return a new symbolic memory state (with a SSA form). This is how store
nodes in SMT work but in our case, to be able to provide such state, we have to keep everything in memory which is quite explosive. We still didn't find an optimization for this case. That's why I don't recommend to use MEMORY_ARRAY on a whole part of the memory :)
memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state
Hi! I was experimenting with memory array mode and encountered some behaviour that I couldn't find explanation for.
So, I was trying to load some ELF sections with
MEMORY_ARRAY
mode enabled, however, I couldn't do it. First, I tried enabling the memory array mode, and then setting the segment data usingsetConcreteMemoryAreaValue
, however it resulted in excessive RAM usage and was eventually killed.Then I've tried to first set the concrete memory, and only after enable
MEMORY_ARRAY
mode. Which worked, the executable was no longer killed. But the executed code worked incorrectly.After executing the first instruction, it should've loaded the
0x43434343
(running this example withoutMEMORY_ARRAY
works correctly) constant, but instead ther0
contained 0.So my question is. Am I the one who's wrong trying to make it work the second way? :) Is there a way to concretize memory before enabling the
MEMORY_ARRAY
(so as to avoid excessive RAM consumption)?