cea-sec / miasm

Reverse engineering framework in Python
https://miasm.re/
GNU General Public License v2.0

Simplification passes for ExprAssign #1004

Open mrphrazer opened 5 years ago

mrphrazer commented 5 years ago

Hi!

I'm preparing a PR. For this, I need to apply simplification rules to ExprAssign that perform different transformations on src and dst.

ira_cfg = ira.new_ircfg_from_asmcfg(asm_cfg)
ira_cfg.simplify(expr_simp_high_to_explicit)
expr_simp = ExpressionSimplifier()
expr_simp.enable_passes({ExprAssign: [my_simp]})
ira_cfg.simplify(expr_simp)

Since simplify from AssignBlock applies the same operation to src and dst:

for dst, src in viewitems(self):
    if dst == src:
        continue
    new_src = simplifier(src)
    new_dst = simplifier(dst)

I patched it as follows:

for dst, src in viewitems(self):
    if my_simp_flag:
        e = self.dst2ExprAssign(dst)
        rewritten = simplifier(e)
        new_src = rewritten.src
        new_dst = rewritten.dst
    else:
        if dst == src:
            continue
        new_src = simplifier(src)
        new_dst = simplifier(dst)

Obviously, this is not how it should be done. What do you think would be a good way to apply this?

commial commented 5 years ago

Hello,

Hmm, for now I would rather explicitly call the simplifier in your script instead of modifying AssignBlock.simplify in Miasm. We have this kind of code in several places in Miasm, for example here: https://github.com/cea-sec/miasm/blob/master/miasm/analysis/outofssa.py#L383

mrphrazer commented 5 years ago

Hi!

I could do this, but in that case I think I would also have to modify or recreate all Assign/IR blocks in ira_cfg manually if I want to perform further analysis?

I am currently looking for a clean solution since I'm planning to introduce a new PR that requires performing simplifications of ExprAssign on the graph level in a first step before applying SSA.

serpilliere commented 5 years ago

OK. Just some remarks: for us, ExprAssign is an odd case in Miasm, as it is more a statement than a right/left value. But since it belongs to the Expr class for now, maybe we can consider that the expression simplifier can deal with it. The patch for this may be small: the ExprAssign case has to be added to the cases handled by the expression simplifier.

But it may trigger some new behavior:

@32[EAX] = @32[EAX] + 1

and let's say we have concluded in a previous analysis that @32[EAX] can be replaced by 0x1337BEEF. Here we clearly want replace_expr on the ExprAssign to give:

@32[EAX] = 0x1337BEEF + 1

and not:

0x1337BEEF = 0x1337BEEF + 1

So the conclusion maybe that we may have to

For me, we have to take this problem into account, and maybe the second solution is the good one. Today, we use replace_expr and try to twist its behavior to match our goal, but the real solution should be to have explicit and clear APIs for this. It will also make clearer what a right/left value is in Miasm, which seems a good point to me :smile:
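To make the left/right distinction concrete, here is a minimal sketch. It uses toy tuple-based expressions, not Miasm's real classes, and the helper names `replace`, `replace_both_sides` and `replace_right_values` are hypothetical. It only illustrates why a replacement applied blindly to both sides of an assignment gives the unwanted result from the example above.

```python
# Toy stand-ins for Miasm expressions: nested tuples, NOT the real Miasm classes.
EAX = ("id", "EAX")
MEM_EAX = ("mem", 32, EAX)        # models @32[EAX]
CONST = ("int", 0x1337BEEF)
ONE = ("int", 1)


def replace(expr, mapping):
    """Recursively substitute every sub-expression that is a key of `mapping`."""
    if expr in mapping:
        return mapping[expr]
    # Only tuple elements are sub-expressions; tags and sizes are left alone.
    return tuple(replace(e, mapping) if isinstance(e, tuple) else e for e in expr)


def replace_both_sides(dst, src, mapping):
    """Naive variant: the destination is rewritten like any other expression."""
    return replace(dst, mapping), replace(src, mapping)


def replace_right_values(dst, src, mapping):
    """Hypothetical variant: only the source (right value) is rewritten."""
    return dst, replace(src, mapping)


# @32[EAX] = @32[EAX] + 1, knowing that @32[EAX] can be replaced by 0x1337BEEF
dst, src = MEM_EAX, ("op", "+", MEM_EAX, ONE)
mapping = {MEM_EAX: CONST}

naive = replace_both_sides(dst, src, mapping)         # 0x1337BEEF = 0x1337BEEF + 1
right_only = replace_right_values(dst, src, mapping)  # @32[EAX] = 0x1337BEEF + 1
```

A real implementation would probably still need to rewrite the pointer inside a memory destination; the sketch leaves the destination completely untouched to keep the point simple.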

What do you think about this?

mrphrazer commented 5 years ago

Hi!

I think both approaches have advantages and disadvantages. In the short term, introducing replace_right_values and replace_left_values is certainly far more feasible (perhaps simplify_lhs and simplify_rhs are better names?). In the long term, however, it is not the ideal solution in terms of clean code and unnecessary computation.

Let's take the following, for instance:

ira_cfg.simplify_lhs(expr_simp_lhs)
ira_cfg.simplify_rhs(expr_simp_rhs)

Let's assume simplify_lhs and simplify_rhs look as follows:

    def simplify_lhs(self, simplifier):
        """
        Return a new AssignBlock with expression simplified
        @simplifier: ExpressionSimplifier instance
        """
        new_assignblk = {}
        for dst, src in viewitems(self):
            new_dst = simplifier(dst)
            new_assignblk[new_dst] = src
        return AssignBlock(irs=new_assignblk, instr=self.instr)

In this case, we iterate over all IR instructions and generate all AssignBlocks twice. If the expression simplifier were able to handle an ExprAssign (where we can define custom passes for the left and the right side), this would not be the case. However, far more code would have to be changed.
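As a rough illustration of the single-pass alternative, the sketch below rebuilds a block once while applying a different callable to each side. It is not Miasm code: the assignment block is modelled as a plain dict of strings, and `simplify_sides`, `drop_add_zero` and `keep` are hypothetical names chosen for this example.

```python
def simplify_sides(assignments, simplify_dst, simplify_src):
    """Build the new assignment mapping in a single pass over the block."""
    new_assignments = {}
    for dst, src in assignments.items():
        new_assignments[simplify_dst(dst)] = simplify_src(src)
    return new_assignments


def drop_add_zero(expr):
    """Toy right-hand-side rule: remove a redundant ' + 0'."""
    return expr.replace(" + 0", "")


def keep(expr):
    """Toy left-hand-side rule: leave the destination unchanged."""
    return expr


# Strings stand in for expressions; a dict stands in for an AssignBlock.
block = {"EAX": "EBX + 0", "@32[ECX + 0]": "EDX + 0"}
result = simplify_sides(block, keep, drop_add_zero)
```

Here the destination `@32[ECX + 0]` is deliberately left as written while both sources are simplified, and the block is only rebuilt once.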

serpilliere commented 5 years ago

Hi @mrphrazer ,

In fact, I was not talking about simplification rules, but about replace_expr. I agree with you about the double creation of assignment blocks. But maybe we can have something like:

replace_expr(left_tokens_replacement, right_tokens_replacement)

In this function we could handle left and right values simultaneously, so the basic block is created only once.

But I am curious about one thing: do you have examples of reduction rules that may be applied to the right side of an expression but should not be applied to the left one? (OK, let's say replacement is a separate case.)

mrphrazer commented 5 years ago

Hi! Perhaps it is better to submit the PR first and discuss the details afterwards. Otherwise it might (or will) not make any sense to you.

Give me a few days, then I will take up the discussion here again.

mrphrazer commented 5 years ago

This was quicker than intended:

See PR #1021 for more details.

The simplification pass for ExprAssign is required to rewrite memory expressions as follows:

# ebx = @32[eax]
ebx = mem_read(M, eax, 32)

# @32[eax] = ebx
M = mem_write(M, eax, ebx, 32)

Do you have any suggestions on how we could implement this in a clean manner?
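For readers following along, the rewrite described above can be sketched on toy expressions. This is not the code from PR #1021 and it does not use Miasm's classes: expressions are nested tuples, `M` is a hypothetical symbol for the whole memory state, and `rewrite_reads` and `rewrite_assignment` are illustrative names. The sketch shows why the pass has to see the assignment as a whole: a memory access becomes mem_read when it is a source, but turns the entire assignment into an update of `M` when it is the destination.

```python
M = ("id", "M")  # hypothetical symbol standing for the whole memory state


def rewrite_reads(expr):
    """Replace every memory access inside a right value by mem_read(M, ptr, size)."""
    if not isinstance(expr, tuple):
        return expr  # tags, names and sizes are kept as they are
    if expr[0] == "mem":
        _, size, ptr = expr
        return ("call", "mem_read", M, rewrite_reads(ptr), size)
    return tuple(rewrite_reads(e) for e in expr)


def rewrite_assignment(dst, src):
    """Rewrite one assignment; a memory destination becomes M = mem_write(...)."""
    new_src = rewrite_reads(src)
    if dst[0] == "mem":
        _, size, ptr = dst
        return M, ("call", "mem_write", M, rewrite_reads(ptr), new_src, size)
    return dst, new_src


EAX, EBX = ("id", "eax"), ("id", "ebx")

# ebx = @32[eax]  becomes  ebx = mem_read(M, eax, 32)
load = rewrite_assignment(EBX, ("mem", 32, EAX))

# @32[eax] = ebx  becomes  M = mem_write(M, eax, ebx, 32)
store = rewrite_assignment(("mem", 32, EAX), EBX)
```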