cea-sec / miasm

Reverse engineering framework in Python
https://miasm.re/
GNU General Public License v2.0

Issue with overlapping memory in the symbolic execution #439

Closed. Lukas-Dresel closed this issue 6 years ago.

Lukas-Dresel commented 8 years ago

In a sample I am currently working on, I was experimenting with different IR optimizations and came across some very weird behavior that revolves around the overlapping-memory calculations in the symbexec.py script.

As an example, below is one of the basic blocks that resulted after I tried something similar to the bbl_simplifier pass, but on IR blocks.

When running this through the symbolic execution, it raises assertions because somewhere it detects a 224-bit overlapping memory access that is assigned a 32-bit value, which obviously results in an exception.

I'm pretty sure it occurs in this AssignBlock:

    @32[(ESP+0xFFFFFFF0)] = EDX
    @32[(ESP+0xFFFFFFE4)] = EBX
    @32[(ESP+0xFFFFFFE8)] = EAX
    @32[(ESP+0xFFFFFFFC)] = EBP
    @32[(ESP+0xFFFFFFE0)] = 0x81B8D2
    @32[(ESP+0xFFFFFFF4)] = ESI
    @32[(ESP+0xFFFFFFF8)] = EDI
    EIP = 0x81B8D5
    ESP = (ESP+0xFFFFFFE0)
    @32[(ESP+0xFFFFFFEC)] = ECX

I think it detects an overlap between @32[(ESP+0xFFFFFFE0)] = 0x81B8D2 and @32[(ESP)].

Then, in symbexec.py's get_mem_overlapping, it calculates a pointer diff that comes out to -4; in line 362 this is cast to a signed integer and compared against the argument size. However, that is a greater-than comparison with a negative number.

Then, in substract_mems, it calculates negative ptr_diffs and only checks that the upper bound is not larger than the lower bound, but in the end we get a memory_size of 224 bits, which cannot be right.
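
To illustrate the suspicion with a plain-Python toy (this is not miasm's code, just the interval logic): an overlap test has to handle a negative pointer diff explicitly, otherwise two adjacent stores can look like one huge overlapping access.

# toy only, not miasm code: overlap of two byte ranges
def overlaps(addr_a, size_a, addr_b, size_b):
    """True if [addr_a, addr_a+size_a) and [addr_b, addr_b+size_b) intersect."""
    diff = addr_b - addr_a           # may be negative
    if diff >= 0:
        return diff < size_a         # b starts inside a
    return -diff < size_b            # a starts inside b

# Two 4-byte stores 4 bytes apart (ptr diff == -4) must not count as
# overlapping; a bare `diff > size` style check mishandles this case.
print overlaps(0x20, 4, 0x1c, 4)     # False: adjacent stores
print overlaps(0x20, 4, 0x1e, 4)     # True: genuinely overlapping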

Any ideas how this could be fixed? Or is my IR invalid in some way? Thanks for the help.

Full original IR Block:

loc_000000000081B878:0x0081b878
    EIP = 0x81B896
    EDX = @32[(ESP+0x14)]
    ESI = @32[(ESP+0x4)]
    EBP = @32[(ESP+0x8)]
    ECX = @32[(ESP+0x18)]
    EAX = @32[(ESP+0x1C)]
    EBX = @32[(ESP+0x10)]
    ESP = (ESP+0x20)
    EDI = @32[ESP]

    EIP = 0x81B89C

    EIP = 0x81B8A6
    @32[(ESP+0xFFFFFFFC)] = EAX
    @32[(ESP+0xFFFFFFF4)] = 0x81B8A3
    @32[(ESP+0xFFFFFFF8)] = EBX
    ESP = (ESP+0xFFFFFFF4)

    af = 0x0
    EIP = 0x81B8AE
    pf = 0x1
    zf = 0x1
    of = 0x0
    EAX = {@16[ESP],0,16, EAX[16:32],16,32}
    cf = 0x0
    EBX = {@16[(ESP+0x4)],0,16, EBX[16:32],16,32}
    ESP = (ESP+0x4)
    nf = 0x0

    EIP = 0x81B8B4
    ESP = (ESP+0x4)
    EAX = @32[ESP]

    EIP = 0x81B8BB

    af = 0x0
    EIP = 0x81B8C1
    pf = 0x1
    zf = 0x1
    of = 0x0
    EAX = @32[ESP]
    cf = 0x0
    EBX = EAX
    ESP = (ESP+0x4)
    nf = 0x0

    EIP = 0x81B8C6

    @32[(ESP+0xFFFFFFF0)] = EDX
    @32[(ESP+0xFFFFFFE4)] = EBX
    @32[(ESP+0xFFFFFFE8)] = EAX
    @32[(ESP+0xFFFFFFFC)] = EBP
    @32[(ESP+0xFFFFFFE0)] = 0x81B8D2
    @32[(ESP+0xFFFFFFF4)] = ESI
    @32[(ESP+0xFFFFFFF8)] = EDI
    EIP = 0x81B8D5
    ESP = (ESP+0xFFFFFFE0)
    @32[(ESP+0xFFFFFFEC)] = ECX

    EIP = 0x81B8DB

    af = 0x0
    EIP = 0x81B8E4
    pf = 0x1
    zf = 0x1
    of = 0x0
    EAX = @32[(ESP+0x8)]
    cf = 0x0
    EBX = {@16[(ESP+0x4)],0,16, EBX[16:32],16,32}
    ESP = (ESP+0x8)
    nf = 0x0

    EIP = 0x81B8EA

    af = 0x0
    EIP = 0x81B8EF
    pf = 0x1
    zf = 0x1
    of = 0x0
    nf = 0x0
    cf = 0x0
    EBX = @32[ESP]
    ESP = (ESP+0x4)

    @32[(ESP+0xFFFFFFF0)] = 0x81B8F8
    @32[(ESP+0xFFFFFFF4)] = EAX
    EIP = 0x81B8FB
    @32[(ESP+0xFFFFFFFC)] = EAX
    EAX = EBX
    @32[(ESP+0xFFFFFFF8)] = EBX
    EBX = EAX
    ESP = (ESP+0xFFFFFFF0)

    af = 0x0
    EIP = 0x81B903
    pf = 0x1
    zf = 0x1
    of = 0x0
    EAX = {@16[ESP],0,16, EAX[16:32],16,32}
    cf = 0x0
    EBX = {@16[(ESP+0x4)],0,16, EBX[16:32],16,32}
    ESP = (ESP+0x4)
    nf = 0x0

    EIP = 0x81B909
    ESP = (ESP+0x4)
    EAX = @32[ESP]

    EIP = 0x81B910

    af = 0x0
    EIP = 0x81B916
    pf = 0x1
    zf = 0x1
    of = 0x0
    EAX = @32[ESP]
    cf = 0x0
    EBX = EAX
    ESP = (ESP+0x4)
    nf = 0x0

    EIP = 0x81B91C

    af = 0x0
    EIP = 0x81B922
    pf = 0x1
    ESI = EDI
    zf = 0x1
    of = 0x0
    nf = 0x0
    cf = 0x0
    EDI = ESI

    ESI = EDI
    EIP = 0x81B92A
    EDI = ESI

    EIP = 0x81B92F

    EIP = 0x81B935

    af = 0x0
    EIP = 0x81B93A
    pf = 0x1
    zf = 0x1
    of = 0x0
    EAX = EBX
    cf = 0x0
    EBX = EAX
    nf = 0x0

    EIP = 0x81B943
    EBX = EAX
    EAX = EBX

    EIP = 0x81B949

    EIP = 0x81B958
    ESI = 0x200000
    @32[(ESP+0xFFFFFFFC)] = EAX
    @32[(ESP+0xFFFFFFF4)] = 0x81B955
    @32[(ESP+0xFFFFFFF8)] = EBX
    ESP = (ESP+0xFFFFFFF4)

    EIP = 0x81B95E

    af = 0x0
    EIP = 0x81B967
    pf = 0x1
    zf = 0x1
    of = 0x0
    EAX = @32[(ESP+0x8)]
    cf = 0x0
    EBX = {@16[(ESP+0x4)],0,16, EBX[16:32],16,32}
    ESP = (ESP+0x8)
    nf = 0x0

    EIP = 0x81B96D

    af = 0x0
    EIP = 0x81B972
    pf = 0x1
    zf = 0x1
    of = 0x0
    nf = 0x0
    cf = 0x0
    EBX = @32[ESP]
    ESP = (ESP+0x4)

    EIP = (zf?(0x81B982,0x81B975))
    EBX = EAX
    EAX = EBX
    IRDst = (zf?(0x81B982,0x81B975))
serpilliere commented 8 years ago

Hi,

I am afraid I didn't fully understand the question. Some questions though:

Lukas-Dresel commented 8 years ago

Hey thanks for the quick response!

The original block without the optimization worked without a problem, so it must be something that happened after the merging of the basic blocks. (Or at least, symbolically executing the unmerged blocks independently from each other worked.)

About the minimal example, I'm not quite sure; I can try minimizing things away until it no longer happens. I'm using two simplifications: one that emulates a block and replaces it with the symbol assignments the symbolic execution returns, and one that merges the basic blocks.

The problem occurs when I first merge the basic blocks and then try to replace the merged block with the symbolic execution result.

Lukas-Dresel commented 8 years ago

The merger, heavily borrowed from bbl_simplifier:

# assumed imports (miasm2 module layout of the time)
from miasm2.core.graph import DiGraphSimplifier, MatchGraphJoker
from miasm2.expression.expression import ExprAff, ExprId, ExprInt
from miasm2.ir.ir import AssignBlock

_parent = MatchGraphJoker(restrict_in=False)
_son = MatchGraphJoker(restrict_out=False)
_expgraph = _parent >> _son

def _merge_ir_blocks(dg, graph):
    """Graph simplification merging ir_bloc with one and only one son with this
    son if this son has one and only one parent"""

    # Blocks to ignore, because they have been removed from the graph
    to_ignore = set()

    for match in _expgraph.match(graph):

        # Get matching blocks
        start_label, succ_label = match[_parent], match[_son]
        start_bloc = graph._blocks[start_label]
        succ_bloc = graph._blocks[succ_label]

        # Ignore already deleted blocks
        if (start_bloc in to_ignore or
            succ_bloc in to_ignore):
            continue

        irdst = ExprId('IRDst')
        for assign_block in start_bloc.irs:
            if irdst in assign_block:
                del assign_block[irdst]

        start_bloc.irs += succ_bloc.irs
        start_bloc.lines += succ_bloc.lines
        start_bloc.invalidate_dst()

        succ_bloc.irs = [AssignBlock([ExprAff(irdst, ExprInt(0, size=32))])]
        succ_bloc.lines = []
        succ_bloc.invalidate_dst()

        # Merge block
        for nextb in graph.successors_iter(succ_label):
            graph.add_edge(start_label, nextb)

        graph.del_node(succ_label)
        to_ignore.add(succ_label)

ir_bbl_simplifier = DiGraphSimplifier()
ir_bbl_simplifier.enable_passes([_merge_ir_blocks])
Lukas-Dresel commented 8 years ago

The symbexec simplifier (also mostly borrowed from somewhere else, I don't remember where):

# assumed imports (miasm2 module layout of the time)
from miasm2.expression.expression import ExprAff, ExprId
from miasm2.ir.ir import AssignBlock
from miasm2.ir.symbexec import symbexec

# ir_arch and ir_bloc are assumed to be available from the calling context
symbols_init = {}
for r in ir_arch.arch.regs.all_regs_ids:
    x = ExprId(r.name, r.size)
    x.is_term = True
    symbols_init[r] = x
sb = symbexec(ir_arch, dict(symbols_init))
sb.emulbloc(ir_bloc)

eqs = []
irdst = ExprId('IRDst')
eip = ExprId('EIP')
for n_w in sb.symbols:
    v = sb.symbols[n_w]
    if n_w in symbols_init and symbols_init[n_w] == v:
        continue
    if n_w == irdst or n_w == eip:
        # We want the IRDst to be at the end of the block for easy viewing
        continue
    eqs.append(ExprAff(n_w, v))

# Finally, add the IRDst and EIP assignments
eqs.append(ExprAff(eip, sb.symbols[eip]))
eqs.append(ExprAff(irdst, sb.symbols[irdst]))

ir_bloc.irs = [AssignBlock(eqs)]
ir_bloc.lines = [None]
ir_bloc.invalidate_dst()
Lukas-Dresel commented 8 years ago

The invalidate_dst() call is a modification to miasm I made that simply clears out the dst and linenb_dst fields so that they have to be recomputed now that the destination has changed.
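
Roughly, based only on that description (the field names are the ones mentioned above, not necessarily miasm2's real internals), the helper looks like:

# sketch of the invalidate_dst() helper described above
def invalidate_dst(self):
    """Clear the cached destination info so it is recomputed from the new irs."""
    self.dst = None
    self.linenb_dst = None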

The merge simplifier is very, very hacky because of the way the IR graph is handled, but that is a separate issue.

commial commented 8 years ago

I did not check on an example for now, just a few remarks:

As @serpilliere said, if you have a specific example of two blocks which cause the failure, it could help.

Lukas-Dresel commented 8 years ago

That is very interesting: the same happens with all_regs_ids_init when doing it like this (because you need a dictionary), but it produces a different overlapping-memory error.

symbols_init = {}
for i, r in enumerate(ir_arch.arch.regs.all_regs_ids):
    symbols_init[r] = ir_arch.arch.regs.all_regs_ids[i]
Lukas-Dresel commented 8 years ago

The error can be reproduced with this script (edited down to a minimal sample):

from miasm2.analysis.machine import Machine
from miasm2.core.asmbloc import asm_symbol_pool
from miasm2.expression.expression import ExprAff, ExprMem, ExprOp, ExprCond, ExprCompose, ExprSlice
from miasm2.expression.expression import ExprId
from miasm2.expression.expression import ExprInt
from miasm2.ir.ir import irbloc, AssignBlock
from miasm2.ir.symbexec import symbexec, symbols

irs = []

irs.append(AssignBlock([
    ExprAff(ExprId('ESP', size=32), ExprOp('+', ExprId('ESP', size=32), ExprInt(0x20, 32))),
]))
irs.append(AssignBlock([
    ExprAff(ExprMem(ExprOp('+', ExprId('ESP', size=32), ExprInt(0xfffffffc, 32)), size=32), ExprId('EBP', size=32)),
    ExprAff(ExprMem(ExprOp('+', ExprId('ESP', size=32), ExprInt(0xffffffe0, 32)), size=32), ExprInt(0x81b8d2, 32)),
]))

symbol_pool = asm_symbol_pool()
irb = irbloc(symbol_pool.getby_name_create('label01'), irs)

machine = Machine("x86_32")

"""
def test_make_symbols_init():
    all_regs_ids_init = [ExprId("%s" % x.name, x.size) for x in machine.mn.regs.all_regs_ids]

    regs_init = {}
    for i, r in enumerate(machine.mn.regs.all_regs_ids):
        all_regs_ids_init[i].is_term = True
        regs_init[r] = all_regs_ids_init[i]
    return regs_init
"""

ir_arch = machine.ira(symbols())

symbols_init = {}
for i, r in enumerate(ir_arch.arch.regs.all_regs_ids):
    symbols_init[r] = ir_arch.arch.regs.all_regs_ids[i]

#symbols_init = ir_arch.arch.regs.regs_init
#symbols_init = test_make_symbols_init()

symb = symbexec(ir_arch, symbols_init)
symb.emulbloc(irb)

# Modified elements
print 'Modified registers:'
symb.dump_id()
print 'Modified memory:'
symb.dump_mem()
Lukas-Dresel commented 8 years ago

The memory symbol dump shows the following incorrect memory access:

Modified memory:
@224[(ESP+0x24)] EBP[0:32]
@32[ESP] 0x81B8D2
Lukas-Dresel commented 8 years ago

When using ir_arch.arch.regs.regs_init we get

Modified memory:
@32[(ESP_init+0x1C)] EBP_init
@32[ESP_init] 0x81B8D2

However, the ESP_init symbols that appear in that case seem invalid because they are different symbols from the normal ones: the affectations assign to ESP, but the blocks afterwards read ESP_init again, which is incorrect. It basically ignores all previous AssignBlocks.

commial commented 8 years ago

You may try to run the symbolic execution using ir_arch.arch.regs.regs_init and, at the end, replace the init elements with their original names (using .replace_expr with the inverse dict of ir_arch.arch.regs.regs_init). What happens here is that initializing a symbolic execution engine with the same expression as both a key and its value can lead to weird and unexpected behavior. This is basically what you got when you initialized it the first way (imagine the following situation: EAX = EAX at init. If the code contains EBX = EAX, the engine will say "hey, I know a value for EAX, and it is EAX", and then again and again).
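
A minimal sketch of that suggestion, reusing the miasm2-era names from the reproduction script above (exact attributes may differ):

# run with the *_init symbols, then map ESP_init & co. back to ESP at the end
regs_init = ir_arch.arch.regs.regs_init
init_to_reg = dict((v, k) for k, v in regs_init.items())   # inverse dict

symb = symbexec(ir_arch, dict(regs_init))
symb.emulbloc(irb)

for dst in symb.symbols:
    value = symb.symbols[dst]
    # replace_expr substitutes every *_init symbol with the plain register
    print dst.replace_expr(init_to_reg), '=', value.replace_expr(init_to_reg)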

serpilliere commented 8 years ago

Here are some explanations, including how and why it's built this way, finishing with an open question which may lead to a modification of Miasm's core symbolic execution engine.

Here is a recap. Each AssignBlock can be seen as a transfer function (in the mathematical sense): it links output variables to input variables. So an AssignBlock can be represented as: (a', b', c', ...) = f(a, b, c, ...)

When you emulate an irs (so multiple AssignBlocks), it will do something like:

(a', b', c', ...) = f(g(h(a, b, c, ...)))

with f, g, h three AssignBlocks. For example, let's say: f(x) = x + 1, g(x) = x + 2, h(x) = x + 3.

In math, f(g(h(x))) = (((x+3)+2)+1), so x+6.

The symbolic execution in Miasm works this way: a dictionary links "variables" to their corresponding values. If we have, for example, EAX=1, an entry in this dictionary has EAX as its key and 1 as its corresponding value. When we want to evaluate a variable, we look it up in this dictionary. If it's present, it becomes the associated value; if not, it remains the same. This is the main difference with math. The fact is that in math, if you want to apply f(x)=x+1 to y, you evaluate f(y) = (y) + 1, so y+1. In math, you are done and you don't ask yourself whether y can be evaluated to something else. More explanations on this in the next comment.

To do some symbolic computation, we assign to the register EAX the value EAX_init. So, back to the previous f, g, h. Let's say we have the AssignBlocks:

f:
EAX = EAX+1

g:
EAX = EAX+2

h:
EAX = EAX+3

Those AssignBlocks are the definitions of the equations. Our starting dictionary will be {EAX:EAX_init}. Now, let's apply the function h to our state. The equation is EAX = EAX+3. We have to evaluate EAX (and 3, which is obviously 3). This gives EAX_init. Note that if you try to evaluate EAX_init, the lookup fails: it's not a key of the dictionary, so it simply stays EAX_init. The resulting evaluation of EAX+3 on our current state {EAX:EAX_init} gives EAX = EAX_init+3.

As the block is over, we update our current state (dictionary) and it becomes {EAX:EAX_init+3}.

Now, if we apply the g function, it does the same thing, except that EAX resolves to EAX_init+3. So in the equation EAX = EAX+2 it evaluates EAX+2 as (EAX_init+3) + 2, which simplifies to EAX_init+5.

The last function gives the final state {EAX:EAX_init+6}. Note that this is a state and not an equation. So if you want to get back the equivalent equation, which links EAX to EAX, you have to recall the initial information, namely that EAX_init was the initial value of EAX; this state therefore gives the equation EAX = EAX + 6. See?
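
As a plain-Python toy (not miasm code, just the bookkeeping described above), the successive state updates look like this:

# toy model of the state dictionary: apply h, then g, then f
state = {'EAX': 'EAX_init'}                  # starting state

def evaluate(var, state):
    # a variable resolves to its entry in the state, or stays as-is
    return state.get(var, var)

for delta in (3, 2, 1):                      # h: +3, then g: +2, then f: +1
    state = {'EAX': '(%s+%d)' % (evaluate('EAX', state), delta)}

print state                                  # {'EAX': '(((EAX_init+3)+2)+1)'}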

By the way, what happens if you do not use the _init trick? First attempt: you start with the dictionary {EAX:EAX}. This is the case @commial described. When applying the equation EAX = EAX+1 the engine tries to resolve EAX, which gives EAX. It then wonders whether EAX can be resolved, the dictionary answers yes, it resolves to EAX, and so on... This makes an endless loop. Nearly the same behavior is observed if we start with an empty dictionary {}: the first lookup of EAX doesn't resolve, so EAX is used. After the addition is done, the state becomes {EAX:EAX+1}, and the loop starts at the second lookup.

This is why, some time ago, we used the attribute 'is_term'. In the original state, when we wrote {EAX:EAX}, the first EAX did not have is_term set, but its corresponding value had is_term=True. Some time (years?) later, we no longer use is_term, because of some implementation details. It's not fully removed from Miasm yet, but it's on the way. It can be seen as a regression, but it's not (from our point of view). In fact, when you use the is_term 'trick', the state is in reality a hidden {EAX(!is_term):EAX(is_term)}. So we decided to make this explicit by using a different symbol.

serpilliere commented 8 years ago

The fact that the algorithm keeps looking in the dictionary to resolve variables is the main difference from the math point of view (and from lambda calculus). Why?

When you are doing symbolic execution, you may want to add some information in order to get a nicer result. Let's take an example. Here is a function:

push ebp
mov  ebp, esp
mov  eax, dword ptr [ebp+8]
...

This translates to something like:

AssignBlock 0:
ESP = ESP-4
[ESP-4] = EBP

AssignBlock 1:
EBP = ESP

AssignBlock 2:
EAX = [EBP+8]

If we do a symbolic execution on it (using the init), the resulting value of EAX will be [ESP_init+4], which is the first argument of the function (x86, 32 bit, stdcall). It's convenient to be able to add this information before the execution of this function so that the final equation is expressed in this form. So the starting state will be (for example) {[ESP_init+4]: arg0, [ESP_init+8]: arg1}.

This information will be used during the evaluation of the last AssignBlock in the following order:

1. Evaluation of [EBP+8] => first, evaluate EBP and 8
2. Evaluation of EBP => gives ESP_init-4
3. Back to the evaluation of [EBP+8] which gives [ESP_init+4]
4. Evaluation of [ESP_init+4] which gives arg0.

So we obtain EAX = arg0. Quite convenient, no?
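
A rough sketch of such a starting state, using the miasm2-era API from the reproduction script above (arg0/arg1 are made-up symbols; exact names may differ):

# seed the symbolic state with @32[ESP_init+4] -> arg0, @32[ESP_init+8] -> arg1
regs = ir_arch.arch.regs
esp_init = regs.regs_init[regs.ESP]

state = dict(regs.regs_init)
state[ExprMem(esp_init + ExprInt(4, 32), 32)] = ExprId('arg0', 32)
state[ExprMem(esp_init + ExprInt(8, 32), 32)] = ExprId('arg1', 32)

symb = symbexec(ir_arch, state)
# after emulating the prologue above, EAX should evaluate to arg0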

Now, imagine we are in pure lambda calculus. How do you 'inject' this information? One solution that comes to mind is to let the calculus operate, which leads to EAX = [ESP+4], and then parse the expression tree to match and replace the targets. Another is to inject the information not into the state, but into the simplification engine.

Your questions and remarks have raised this problem and its possible solutions, and we have started a discussion on modifying the core symbolic execution engine to operate in full lambda calculus.

The advantages: this may remove the need for any 'reg_init' or initial state (the "state" result will 'directly' be the transfer equation, over the same variable space). The code will be simpler.

The drawbacks: some code modifications will be needed on the "client side" in order to replace features offered by the current mechanism, for example the evaluation of aliases, which would be replaced by code in the simplification engine.

This feature will be discussed with @commial and some other Miasm folks. We will keep you posted.

Anyway, thank you for your remarks and questions: this may lead to an interesting improvement!

Lukas-Dresel commented 8 years ago

Hey, thank you very much for the detailed answer! I'm just getting into miasm so I'm still trying to wrap my head around a lot of stuff.

Can you elaborate on how you would inject the argument information directly into the simplification engine? I'm thinking about the program I'm currently reverse engineering, which does a bunch of bogus calls and then removes the return address and similar stuff; in that case, having argument replacement hardcoded in the simplification engine would make things very confusing.

Replacing the patterns in the syntax tree seems to me to be the easier approach, and it also gives the user more control to turn it on and off or replace it with something completely different if needed. Couldn't the expression simplifiers already be used for this?

serpilliere commented 8 years ago

The added simplification could be to replace something like @32[ESP+X] with argY.
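
A rough sketch of such a pass, assuming the miasm2-era ExpressionSimplifier API (names and attributes may have changed since; arg0/arg1 are made-up symbols):

from miasm2.expression.expression import ExprId, ExprInt, ExprMem, ExprOp
from miasm2.expression.simplifications import ExpressionSimplifier

ESP = ExprId('ESP', 32)
ARGS = {ExprInt(4, 32): ExprId('arg0', 32), ExprInt(8, 32): ExprId('arg1', 32)}

def simp_stack_args(e_s, expr):
    """Rewrite @32[ESP+4] -> arg0 and @32[ESP+8] -> arg1, leave the rest alone."""
    ptr = expr.arg   # ExprMem pointer (named .arg in miasm2)
    if (isinstance(ptr, ExprOp) and ptr.op == '+' and len(ptr.args) == 2 and
            ptr.args[0] == ESP and ptr.args[1] in ARGS):
        return ARGS[ptr.args[1]]
    return expr

arg_simp = ExpressionSimplifier()
arg_simp.enable_passes({ExprMem: [simp_stack_args]})   # plus the usual passes in practice

print arg_simp(ExprMem(ESP + ExprInt(4, 32), 32))      # arg0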

If you are working on an obfuscated binary, I must give you a warning about the IRA you have used: there is a difference between machine.ir() and machine.ira(). For example, in the first version, the translation of a CALL gives the side effects of the instruction:

loc_0000000000000000:0x00000000
        EIP = loc_0000000011223344:0x11223344
        ESP = {(ESP[0:32]+0xFFFFFFFC),0,32}
        @32[(ESP[0:32]+0xFFFFFFFC)] = loc_0000000000000005:0x00000005
        IRDst = loc_0000000011223344:0x11223344

We clearly see the stacking of the return address and the jump to the target function. The second one is used to "model" the subfunction; it gives something like:

lbl_gen_00000000:None
        ESP = call_func_stack(loc_0000000011223344:0x11223344, ESP)
        EAX = call_func_ret(loc_0000000011223344:0x11223344, ESP)

        IRDst = loc_0000000000000005:0x00000005

In this case, the whole subfunction at 0x11223344 has been modeled with only two side effects: the stack-pointer update (call_func_stack) and the return value in EAX (call_func_ret).

Note that the modeling here is very poor and generic (using the strange and unknown operator call_func_ret). The main goal of this feature is to give you a chance to inject a correct model of the function. For example:

lbl_gen_00000000:None
        ESP = ESP + 0x4
        EAX = ExprOp("malloc_0", @32[ESP])

        IRDst = loc_0000000000000005:0x00000005

which may be used to do some static code analysis :smile: