crytic / ethersplay

EVM dissassembler
GNU Affero General Public License v3.0
834 stars 116 forks source link

Refactor #27

Closed joshwatson closed 6 years ago

joshwatson commented 6 years ago

Refactored Ethersplay to use Binary Ninja's IL for control flow recovery, as well as function identification and invalid jump detection. This also fixes several other issues, such as being able to open multiple files at once, and being able to save and reopen bndb files.

There are still a few hiccups, notably that the dataflow is limited to 8 bytes right now, though some of it might work for 32 bytes if we bump it up. Additionally, there are a few cases where control flow recovery is still incomplete.

montyly commented 6 years ago

Great work!

Before merging to master, I have some concerns about the use of the binja dataflow to recover the CFG.

On simple example like:

contract C{

    function f0() internal{

    }

    function f1() internal{
        f0();
    }

    function f2() internal{
        f0();
        f1();
    }

    function (){
        f2();
    }
}

Binja is not able to recover the proper CFG image

While the RewriteStackValue gives: image

My understanding is that the dataflow from binja does not handle well the merge of values for BB with different stack size, which is a common pattern in EVM (mostly due to the fact that the BB are heavily shared). Is this something that can be handled in binja, or is it an internal limitation?

Moreover, if I understand correctly the way the validities of JUMP destinations are checked, an analysis is performed once the CFG is recovered and the invalid edges are removed. This works well with contracts compiled with solc, but it may generate incorrect CFG with contracts from another compiler or written in assembly. The problem is that, once an incorrect edge is added (JUMP -> non-JUMPDEST), the value from the basic block source is propagated to the basic block destination. As a result of this propagation, incorrect edges may be added and so the CFG could be wrong. Is there a way to avoid it? Or did I misunderstood the check on JUMP destinations?

These two problems may lead to either incomplete or incorrect results. If we can fix them, we can remove the stack value analysis from ethersplay. If we can't fix them, it may be better to keep the stack value from RewriteStackValue and not use the binja dataflow to recover the CFG.

The dataflow is still very powerful for the creation of other analyses.

joshwatson commented 6 years ago

It is still incomplete, but as we were discussing at the meeting yesterday, it’s working a lot better than master currently is, which is why I was asked to merge it.

Once edges are removed, the function is reanalyzed, which should deal with incorrect value propagation. As for missing branches, new branches can be added via stack analysis. We can’t remove branch lifting without messing up all of the lifting dataflow. It would be better to use them together rather than one over the other.

joshwatson commented 6 years ago

Once this ticket for improving dataflow for stack based languages like EVM is dealt with, the stack analysis won't be necessary any longer, I think. The image below demonstrates that dataflow can handle multiple jump targets when dataflow is actually working (as it does for x86):

image

Until then, I propose that this be merged to master and then merge the new stack variable analysis.