Required steps to integrate maybe-alias analysis into the symbolic execution engine

Heya guys!

I was discussing together with @fvrmatteo the following case: By default, Miasm's memory model assumes that symbolic memory addresses do not aliase. As a result, the symbolic execution engine never takes the jump since @64[RAX] can never be 0x2.

MOV        QWORD PTR [RAX], 0x1
MOV        QWORD PTR [RBX], 0x2
MOV        RCX, QWORD PTR [RAX]
CMP        RCX, 0x2
JZ         loc_20

To handle cases like that, I wrote a maybe-alias analysis which tells me possible code locations that might aliase. Then, I can handle it manually by mapping those to the same values, like

sb = SymbolicExecutionEngine(lifter)
sb.symbols[ExprMem(ExprId("RAX", 64), 64)] = ExprInt("alias", 64)
sb.symbols[ExprMem(ExprId("RBX", 64), 64)] = ExprInt("alias", 64)

In other words, I can handle both cases individually: By default, they will never aliase. If I map them to the same value, we handle the aliasing case.

However, I would like to build sth such that the symbolic execution stops at a conditional branch at the end of the basic block. It might look similar to the following:

IRDst = @64[maybe_aliase(RAX, RBX)] == 0x2 ? ... : ...

I'm aware that Miasm doesn't support this right now. However, I am currently trying to figure out would be required/the best way to implement it on my own. I think I would have to introduce a maybe_aliase operator, perform some analysis beforehand and propagate it somehow during symbolic execution. Would you guys agree or do you even think there might be an easier way to realize that?

Best,

mrphrazer

Hi @mrphrazer & @fvrmatteo ! Glad to hear from you guys!

Ok, let me rephrase, to be sure I understand correctly what you proposed, and maybe add some precision.

Let's add a custom operator, say MayAlias. This operator will take n arguments, which are expressions of base address of may alisaing memory.
in this operator, let say that the order is a bit important. To be precise, we will distinguish the first argument from others: this is a detail which will be useful in the following. The first argument will give a hint on the current manipulated memory, the others are a bunch of aliases.

So if memory based on RAX aliases with memory based on RBX and RCX let say that we will initialize the engine a bit like you proposed:

sb = SymbolicExecutionEngine(lifter)
sb.symbols[ExprId("RAX", 64)] = ExprOp("may_alias", ExprId("RAX", 64), ExprId("RBX", 64), ExprId("RCX", 64))
sb.symbols[ExprId("RBX", 64)] = ExprOp("may_alias", ExprId("RBX", 64), ExprId("RAX", 64), ExprId("RCX", 64))
sb.symbols[ExprId("RCX", 64)] = ExprOp("may_alias", ExprId("RCX", 64), ExprId("RAX", 64), ExprId("RBX", 64))

Note the arguments order in the initialization.

let's take an example simplier than yours first

MOV        QWORD PTR [RAX], 0x1
MOV        RCX, QWORD PTR [RAX]
CMP        RCX, 0x2
JZ         loc_20

During the evaluation of the first line MOV QWORD PTR [RAX], 0x1, we will first evaluate RAX to evaluate the memory pointer. The actual engine will return ExprOp("MayAlias", RAX, RBX, ECX) (say MayAlias(RAX, RBX, RCX) to simplify). So it will evaluate the final expression:

ExprMem(ExprOp(RAX, RBX, RCX), 64) = 1

If we use the current memory assignment hook (eval_assignblk) in the Symbolic engine to customize the "assignment", we could do the nexts steps:

extract the first operand (RAX) and the remainings (RBX/RBX). This is why the arguments order may be important here.
first, the operator ExprOp inform us that the "original" memory is pointed on RAX, so we can do a "real" assignment like:
```
ExprMem(RAX, 64) = 1
```
This will in return link ExprMem(RAX, 64) to 1 in the "classic" symbolic engine.
Next, we can add effects on the others aliasing memories, @rbx @and rcx. So we may also:
ethier invalidate every memories based on rbx and rcx in the current state
or maybe modify them to someting like: ExprOp("maybe_erased", 1, Oldvalue(XXX))

Next, we will evaluate the mov. So the code will evaluate @64[RAX]. In the case, we will first evalueate RAX which will give us MayAlias(RAX, RBX, RCX). So ExprMem(MayAlias(RAX, RBX, RCX)). In this case, we may use the eval_exprmem hook in the symbolic engine to simplify to ExprMem(RAX) and then to use the classical enginel to recover the 1 previously set. So what is interesting here is that even if we support aliasing, we may resolve 'simple' cases.

So rcx is 1, so the CMP is not taken.

Ok, now let's apply this to your example:

MOV        QWORD PTR [RAX], 0x1
MOV        QWORD PTR [RBX], 0x2
MOV        RCX, QWORD PTR [RAX]
CMP        RCX, 0x2
JZ         loc_20

The first mov will have the same side effects as before:

ExprMem(RAX) = 1 in the symbolic engine
erase of all RBX/RCX based memory (or modified)

The next line, MOV QWORD PTR [RBX], 0x2 will do the same behavior:

ExprMem(RBX) = 2 in the symbolic engine
erase (or modification) of all RAX/RCX based memory (so the @RAX is modified)

now, when evaluating the line MOV RCX, QWORD PTR [RAX] we will have:

evaluation of the @64[RAX]
so first evaluation of RAX which will give MayAlias(RAX, RBX, RCX)
then evaluation of ExprMem(MayAlias(RAX, RBX, RCX)) which will give using the previous custom rule, ExprMem(RAX) which will result in erased value, or modified value Modified(1, 2), depending on what we decided previously
the CMP will use this as operator, and then this will be feed in the JZ.

Are those steps seems legit to you guys? Did I understand correctly your solution, or have I a drift from what you proposed? Are you ok with this, or have you remarks or counter example which blows up this?

It seems that this "poc" might be done using the current engine, correct?

cea-sec / miasm

Required steps to integrate maybe-alias analysis into the symbolic execution engine #1401