A new symbolic memory model based on the QF_ABV SMT logic

JonathanSalwan commented 1 year ago

Last edit: Oct 05 2022.

Description

The choices are a matter of tradeoff and in order to scale the dynamic symbolic execution on million instructions we made the choice (7 years ago) to only rely on the QF_BV logic. Which means that our symbolic expressions only contain bitvector operators and no array. Thus, our constraints sent to the SMT solver are easier to solve. However, in some cases it could be great to reason on symbolic pointers. So here we are and this contribution provides 3 new modes:

MEMORY_ARRAY: Enables the symbolic pointers reasoning (QF_ABV logic). When this mode is not enabled, which is the case by default, the QF_BV memory model is applied. So, this contribution will not impact your analysis tools if you do not use this mode.
SYMBOLIZE_LOAD: Keeps symbolic expressions on load indexes (concretize them otherwise).
SYMBOLIZE_STORE: Keeps symbolic expressions on store indexes (concretize them otherwise).

Important note

This mode looks to work but still experimental. I can already tell you that it will complexify a lot constraints and increase the RAM consumption a lot. I'm still working on this mode and I will try to add optimizations so that we can free memory expressions and thus keep a descent RAM consumption. However, even with those cons, this mode has the merit of existing :slightly_smiling_face:

Short example

In this example we store the constant 0xdead to a fixed memory address 0x1032. Then, we symbolize the rsi register which is used as an index by the mov rcx, [rsi] instruction after a xor computation. Then, we constraint rcx to be equal to 0xdead.

from triton import *

ctx = TritonContext(ARCH.X86_64)
ctx.setMode(MODE.MEMORY_ARRAY, True)     # Enable the array logic
ctx.setMode(MODE.SYMBOLIZE_LOAD, True)   # Symbolize load indexing
ctx.setMode(MODE.SYMBOLIZE_STORE, True)  # Symbolize store indexing

ctx.symbolizeRegister(ctx.registers.rsi, 's_rsi')

code = [
    b"\x48\xc7\xc0\x00\x10\x00\x00", # mov rax, 0x1000
    b"\x48\xc7\xc3\x32\x00\x00\x00", # mov rbx, 0x32
    b"\xc7\x04\x18\xad\xde\x00\x00", # mov dword ptr [rax + rbx], 0xdead
    b"\x48\x81\xf6\xef\xbe\x00\x00", # xor rsi, 0xbeef
    b"\x48\x8b\x0e", # mov rcx, [rsi]
    b"\x48\x81\xf9\xad\xde\x00\x00", # cmp rcx, 0xdead
]

for op in code:
    i = Instruction(op)
    ctx.processing(i)
    print(i)
    for se in i.getSymbolicExpressions():
        print(se)
    print()

zf = ctx.getRegisterAst(ctx.registers.zf)
m = ctx.getModel(zf == 1)
print(m)

output

$ python array.py
0x0: mov rax, 0x1000
(define-fun ref!1 () (_ BitVec 64) (_ bv4096 64)) ; MOV operation
(define-fun ref!2 () (_ BitVec 64) (_ bv7 64)) ; Program Counter

0x7: mov rbx, 0x32
(define-fun ref!3 () (_ BitVec 64) (_ bv50 64)) ; MOV operation
(define-fun ref!4 () (_ BitVec 64) (_ bv14 64)) ; Program Counter

0xe: mov dword ptr [rax + rbx], 0xdead
(declare-fun ref!5 () (Array (_ BitVec 64) (_ BitVec 8)))
(define-fun ref!6 () (Array (_ BitVec 64) (_ BitVec 8)) (store ref!5 (bvadd (bvadd (bvadd ref!1 (bvmul ref!3 (_ bv1 64))) (_ bv0 64)) (_ bv3 64)) ((_ extract 31 24) (_ bv57005 32)))) ; Byte reference - MOV operation
(define-fun ref!7 () (Array (_ BitVec 64) (_ BitVec 8)) (store ref!6 (bvadd (bvadd (bvadd ref!1 (bvmul ref!3 (_ bv1 64))) (_ bv0 64)) (_ bv2 64)) ((_ extract 23 16) (_ bv57005 32)))) ; Byte reference - MOV operation
(define-fun ref!8 () (Array (_ BitVec 64) (_ BitVec 8)) (store ref!7 (bvadd (bvadd (bvadd ref!1 (bvmul ref!3 (_ bv1 64))) (_ bv0 64)) (_ bv1 64)) ((_ extract 15 8) (_ bv57005 32)))) ; Byte reference - MOV operation
(define-fun ref!9 () (Array (_ BitVec 64) (_ BitVec 8)) (store ref!8 (bvadd (bvadd (bvadd ref!1 (bvmul ref!3 (_ bv1 64))) (_ bv0 64)) (_ bv0 64)) ((_ extract 7 0) (_ bv57005 32)))) ; Byte reference - MOV operation
(define-fun ref!10 () (_ BitVec 32) (_ bv57005 32)) ; Original memory access - MOV operation
(define-fun ref!11 () (_ BitVec 64) (_ bv21 64)) ; Program Counter

0x15: xor rsi, 0xbeef
(define-fun ref!12 () (_ BitVec 64) (bvxor ref!0 (_ bv48879 64))) ; XOR operation
(define-fun ref!13 () (_ BitVec 1) (_ bv0 1)) ; Clears carry flag
(define-fun ref!14 () (_ BitVec 1) (_ bv0 1)) ; Clears overflow flag
(define-fun ref!15 () (_ BitVec 1) (bvxor (bvxor (bvxor (bvxor (bvxor (bvxor (bvxor (bvxor (_ bv1 1) ((_ extract 0 0) ref!12)) ((_ extract 1 1) ref!12)) ((_ extract 2 2) ref!12)) ((_ extract 3 3) ref!12)) ((_ extract 4 4) ref!12)) ((_ extract 5 5) ref!12)) ((_ extract 6 6) ref!12)) ((_ extract 7 7) ref!12))) ; Parity flag
(define-fun ref!16 () (_ BitVec 1) ((_ extract 63 63) ref!12)) ; Sign flag
(define-fun ref!17 () (_ BitVec 1) (ite (= ref!12 (_ bv0 64)) (_ bv1 1) (_ bv0 1))) ; Zero flag
(define-fun ref!18 () (_ BitVec 64) (_ bv28 64)) ; Program Counter

0x1c: mov rcx, qword ptr [rsi]
(define-fun ref!19 () (_ BitVec 64) (concat (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv7 64))) (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv6 64))) (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv5 64))) (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv4 64))) (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv3 64))) (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv2 64))) (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv1 64))) (select ref!9 (bvadd (bvadd (bvadd ref!12 (bvmul (_ bv0 64) (_ bv1 64))) (_ bv0 64)) (_ bv0 64))))) ; MOV operation
(define-fun ref!20 () (_ BitVec 64) (_ bv31 64)) ; Program Counter

0x1f: cmp rcx, 0xdead
(define-fun ref!21 () (_ BitVec 64) (bvsub ref!19 (_ bv57005 64))) ; CMP operation
(define-fun ref!22 () (_ BitVec 1) (ite (= (_ bv16 64) (bvand (_ bv16 64) (bvxor ref!21 (bvxor ref!19 (_ bv57005 64))))) (_ bv1 1) (_ bv0 1))) ; Adjust flag
(define-fun ref!23 () (_ BitVec 1) ((_ extract 63 63) (bvxor (bvxor ref!19 (bvxor (_ bv57005 64) ref!21)) (bvand (bvxor ref!19 ref!21) (bvxor ref!19 (_ bv57005 64)))))) ; Carry flag
(define-fun ref!24 () (_ BitVec 1) ((_ extract 63 63) (bvand (bvxor ref!19 (_ bv57005 64)) (bvxor ref!19 ref!21)))) ; Overflow flag
(define-fun ref!25 () (_ BitVec 1) (bvxor (bvxor (bvxor (bvxor (bvxor (bvxor (bvxor (bvxor (_ bv1 1) ((_ extract 0 0) ref!21)) ((_ extract 1 1) ref!21)) ((_ extract 2 2) ref!21)) ((_ extract 3 3) ref!21)) ((_ extract 4 4) ref!21)) ((_ extract 5 5) ref!21)) ((_ extract 6 6) ref!21)) ((_ extract 7 7) ref!21))) ; Parity flag
(define-fun ref!26 () (_ BitVec 1) ((_ extract 63 63) ref!21)) ; Sign flag
(define-fun ref!27 () (_ BitVec 1) (ite (= ref!21 (_ bv0 64)) (_ bv1 1) (_ bv0 1))) ; Zero flag
(define-fun ref!28 () (_ BitVec 64) (_ bv38 64)) ; Program Counter

{0: s_rsi:64 = 0xaedd}

JonathanSalwan commented 1 year ago

Still in progress but it smells good. We are now able to solve our first crackme using the ABV logic which is, I remind, a new memory model. The only difference between the old solve.py and the new one is that we just enable the MEMORY_ARRAY mode. I've also added other modes like CONSTANT_FOLDING and AST_OPTIMIZATIONS to stress the new memory model and everything looks ok.

--- src/examples/python/ctf-writeups/defcamp-2015-r100/solve.py 2022-09-27 19:04:26.043125028 +0200
+++ src/examples/python/ctf-writeups/defcamp-2015-r100/solve-with-abv-logic.py  2022-09-27 19:03:01.033130502 +0200
@@ -122,8 +88,11 @@
     ctx = TritonContext(ARCH.X86_64)

     # Define symbolic optimizations
-    ctx.setMode(MODE.ALIGNED_MEMORY, True)
-    ctx.setMode(MODE.ONLY_ON_SYMBOLIZED, True)
+    ctx.setMode(MODE.CONSTANT_FOLDING, True)
+    ctx.setMode(MODE.AST_OPTIMIZATIONS, True)
+    ctx.setMode(MODE.MEMORY_ARRAY, True)
+    ctx.setMode(MODE.SYMBOLIZE_LOAD, True)
+    ctx.setMode(MODE.SYMBOLIZE_STORE, True)

     # Load the binary
     loadBinary(os.path.join(os.path.dirname(__file__), 'r100.bin'))

JonathanSalwan commented 1 year ago

Done, it works. We now we have to think about an optimization to deal with the memory state to avoid RAM consumption.

JonathanSalwan / Triton