Formal verification of custom Solidity assertions

leonardoalt commented 7 years ago

Hi there,

I've extended Oyente a little bit in my fork (https://github.com/leonardoalt/oyente) to try and verify any Solidity assertion. An assertion is compiled by Solc as a simple jump to INVALID in case the condition fails. So I use the SMT branch queries to decide whether a certain symbolic path may reach an INVALID (an assertion failure), and if that's the case ask z3 for the model. This has worked pretty well for the examples I've tried, but of course if you introduce unbounded loops it's gonna explode.

The basic part is identifying assertions in the ASM code, since Solc uses INVALID for other cases as well, for example if the called function doesn't exist (this is always contained in Block 0), or if CALLVALUE is zero. So far I've been considering as assertions blocks that start with INVALID except the cases above. Question 1: Is anyone aware of a better way to detect assertions in the basic blocks?

If INVALID is reachable, the model is retrieved from z3 and printed. I also tried some tricks to automatically figure out in which Solidity function the assertion that failed is, by checking which blocks represent the start of each function (by using the signatures and hashes of the functions), building a reverse CFG and walking backwards from the block that contains the assertion. Of course this only guarantees what was the top-level function that led to the assertion, but its correctness is not guaranteed in case of nested function calls. Question 2: Does anyone have an idea of how I can map this information back to the Solidity source code in a better way? Ideally I'd like to map back symbolic to Solidity variables as well, but I know it's even harder.

Notice that I am trying to do all that without changing the language or the compiler, which would make it easier but less usable.

Cheers, Leo

pirapira commented 7 years ago

To Question 1:

If the Solidity compiler uses INVALID for undesired user input, that's a bug in the compiler. Please file an issue.

Recently, a similar issue was fixed.

To Question 2:

solc/solc --asm-json foo.sol associates each instruction with a range in the srouce code.

 {
                                        "begin" : 39,
                                        "end" : 43,
                                        "name" : "PUSH",
                                        "value" : "0"
                                },

means the PUSH comes from the srouce code characters 39--43 (counted from zero). I'm not aware of any kind of variable->declration look up.

loiluu commented 7 years ago

@leonardoalt awesome idea. We thought about implementing that feature previously but haven't got chance to address it yet. Would love to see your PR soon. Regarding your questions, I hope you are happy with @pirapira's answer. Otherwise, please let me know.

leonardoalt commented 7 years ago

@pirapira Thanks for the answers!

I'll definitely try the solution for Question 2.

I'm a little confused by Question 1's answer though. If not INVALID, what is the correct bytecode for an assertion? Maybe I am also misunderstanding the meaning of an assertion in Solidity. For me an assertion represents a property that holds at that point concerning anything, where by "anything" I mean mostly correctness of a piece of code, not only undesired user input. I interpret that this view is supported by this short discussion: https://github.com/ethereum/solidity/issues/2118

@loiluu Cool! I guess I might have to change it anyway if INVALID is not the way to go with assertions. If you're interested, we can discuss some ideas to improve the tool on this direction.

pirapira commented 7 years ago

@leonardoalt A correct bytecode for a require is REVERT opcode, which will be introduced in the Metropolis hardfork. Even before the Metropolis hardfork, we can use the opcode for REVERT because that throws anyway.

Your interpretation is right. When an assertion fails, there is a bug in the Solidity code, the Solidity compiler, or the EVM implementation.

leonardoalt commented 7 years ago

@pirapira Oh I see. Cool, gonna use that one then. Thanks again!

loiluu commented 7 years ago

@leonardoalt regarding unbounded loop, we do allow you to specify the bound for the loop. Try the --looplimit option.

loiluu commented 7 years ago

I am happy to discuss more regarding this direction. Lets do it here so other people can join?

joeykrug commented 7 years ago

Would this work on bytecode too where assertions fail?

MicahZoltu commented 7 years ago

To be clear, assert(<condition>) will do INVALID while require(<condition>) will do REVERT. My understanding is that the intended use case is that asserts should never be hit and require may be hit during normal execution. For a tool like this, that means that what you want to look for is asserts, not requires.

loiluu commented 7 years ago

@joeykrug yes, Oyente basically works on bytecode.

leonardoalt commented 7 years ago

@MicahZoltu Exactly what I thought, thanks.

@joeykrug The idea is to verify the bytecode, but in case there is a bug, to try and map it back to the Solidity code (in case the source was written in Solidity).

loiluu commented 7 years ago

@leonardoalt please let us know how we can help you on this issue!

leonardoalt commented 7 years ago

@loiluu Thanks! I haven't been working on it since I opened the thread, but I'll implement the solution to Question 2 this week. Ideally this will take care of pointing to the function that contains the broken assertion. Any ideas on how to map values from the model back to Solidity variables? The first way that I thought would be to separate storage and local variables, could be a start.

joeykrug commented 7 years ago

Just wanted to comment and say this would be an amazing feature ^^

leonardoalt commented 7 years ago

Cool, I see the PR regarding this topic was merged :)

Continuing the discussion here: as I mentioned in the PR, there are cases that lead to false positives (false reports on assertion failures). One of those cases that I am aware of is the use of ENUM. It uses INVALID a lot, so you might see false positives if you run Oyente with the option -a on a code with ENUMs.

The combination of this new code and the recent REVERT implementation lead to a nice feature: one can use require as a formal assumption on the code, which will be used as a constraint to verify the assert.

Take the following code, for example:

contract T {
    function f(uint x) {
        require(x < 2000);
        assert(x < 3000);
    }
}

In this case, after the require, x < 2000 is true for the rest of the function, therefore it is also < 3000. If you change the require line to require(x < 4000) for example, the assertion is not true anymore, and Oyente will give you a model (even though in a not so nice format :p) which is a value for x that satisfies the require but breaks the assert.

leonardoalt commented 7 years ago

The hacks in my code to identify the use of INVALID that does not come from an assert might not be necessary anymore since those were probably bugs and have already been fixed in the newer versions of solc. I'll check that this week.

leonardoalt commented 7 years ago

I'm closing this issue since assertion checks are working well and assertion fails have been nicely mapped back to the Solidity assert code with the recent work on Source Map. I haven't seen any false positives, but if I do I'll create a new issue.

luongnt95 commented 7 years ago

Good work. Thanks @leonardoalt

enzymefinance / oyente

Formal verification of custom Solidity assertions #127