cea-sec / miasm

Reverse engineering framework in Python
https://miasm.re/
GNU General Public License v2.0
3.44k stars 470 forks source link

Required steps to integrate maybe-alias analysis into the symbolic execution engine #1401

Open mrphrazer opened 2 years ago

mrphrazer commented 2 years ago

Heya guys!

I was discussing together with @fvrmatteo the following case: By default, Miasm's memory model assumes that symbolic memory addresses do not aliase. As a result, the symbolic execution engine never takes the jump since @64[RAX] can never be 0x2.

MOV        QWORD PTR [RAX], 0x1
MOV        QWORD PTR [RBX], 0x2
MOV        RCX, QWORD PTR [RAX]
CMP        RCX, 0x2
JZ         loc_20

To handle cases like that, I wrote a maybe-alias analysis which tells me possible code locations that might aliase. Then, I can handle it manually by mapping those to the same values, like

sb = SymbolicExecutionEngine(lifter)
sb.symbols[ExprMem(ExprId("RAX", 64), 64)] = ExprInt("alias", 64)
sb.symbols[ExprMem(ExprId("RBX", 64), 64)] = ExprInt("alias", 64)

In other words, I can handle both cases individually: By default, they will never aliase. If I map them to the same value, we handle the aliasing case.

However, I would like to build sth such that the symbolic execution stops at a conditional branch at the end of the basic block. It might look similar to the following:

IRDst = @64[maybe_aliase(RAX, RBX)] == 0x2 ? ... : ...

I'm aware that Miasm doesn't support this right now. However, I am currently trying to figure out would be required/the best way to implement it on my own. I think I would have to introduce a maybe_aliase operator, perform some analysis beforehand and propagate it somehow during symbolic execution. Would you guys agree or do you even think there might be an easier way to realize that?

Best,

mrphrazer

serpilliere commented 2 years ago

Hi @mrphrazer & @fvrmatteo ! Glad to hear from you guys!

Ok, let me rephrase, to be sure I understand correctly what you proposed, and maybe add some precision.

During the evaluation of the first line MOV QWORD PTR [RAX], 0x1, we will first evaluate RAX to evaluate the memory pointer. The actual engine will return ExprOp("MayAlias", RAX, RBX, ECX) (say MayAlias(RAX, RBX, RCX) to simplify). So it will evaluate the final expression:

ExprMem(ExprOp(RAX, RBX, RCX), 64) = 1

If we use the current memory assignment hook (eval_assignblk) in the Symbolic engine to customize the "assignment", we could do the nexts steps:

Next, we will evaluate the mov. So the code will evaluate @64[RAX]. In the case, we will first evalueate RAX which will give us MayAlias(RAX, RBX, RCX). So ExprMem(MayAlias(RAX, RBX, RCX)). In this case, we may use the eval_exprmem hook in the symbolic engine to simplify to ExprMem(RAX) and then to use the classical enginel to recover the 1 previously set. So what is interesting here is that even if we support aliasing, we may resolve 'simple' cases.

So rcx is 1, so the CMP is not taken.

Ok, now let's apply this to your example:

MOV        QWORD PTR [RAX], 0x1
MOV        QWORD PTR [RBX], 0x2
MOV        RCX, QWORD PTR [RAX]
CMP        RCX, 0x2
JZ         loc_20

The first mov will have the same side effects as before:

The next line, MOV QWORD PTR [RBX], 0x2 will do the same behavior:

now, when evaluating the line MOV RCX, QWORD PTR [RAX] we will have:

Are those steps seems legit to you guys? Did I understand correctly your solution, or have I a drift from what you proposed? Are you ok with this, or have you remarks or counter example which blows up this?

It seems that this "poc" might be done using the current engine, correct?