In this project we develop the Dafny-EVM, a functional specification of the Ethereum Virtual Machine in Dafny.
This type of specification has several advantages:
Developing this specification in Dafny allows us to apply formal reasoning to Smart Contracts at the EVM Bytecode level. For example, one can prove that certain key properties are maintained by the contract. We choose Dafny over other verification systems (e.g. Coq or Isabelle/HOL) because it is relatively accessible to someone without significant prior experience.
Our functional specification is executable, meaning that we can run bytecode using it and compare their output with existing clients (e.g. Geth). In particular, we are interested in comparing against the Ethereum Common Reference Tests and have made some progress towards this.
Dafny supports automated software verification by leveraging the power of state-of-the-art automated theorem provers (e.g. with SMT solvers like Z3). This means Dafny can prove a program is correct with respect to its specification. To do this, Dafny requires the developer to provide annotations in the form of preconditions and postconditions where appropriate, and/or loop invariants as necessary.
Our semantics is written as a state transformer of type State -> State
.
As a simple example, consider the following specification given for
the semantics of the ADD
bytecode:
/**
* Unsigned integer addition with modulo arithmetic.
* @param st A state.
* @returns The state after executing an `ADD` or an `Error` state.
*/
function method Add(st: State): (st': State)
requires st.IsExecuting()
ensures st'.OK? || st' == INVALID(STACK_UNDERFLOW)
ensures st'.OK? <==> st.Operands() >= 2
ensures st'.OK? ==> st'.Operands() == st.Operands() - 1
{
if st.Operands() >= 2
then
var lhs := st.Peek(0) as int;
var rhs := st.Peek(1) as int;
var res := (lhs + rhs) % TWO_256;
st.Pop().Pop().Push(res as u256).Next()
else
State.INVALID(STACK_UNDERFLOW)
}
This tells us that ADD
requires two operands on the stack to be performed,
otherwise, the exceptional state INVALID(STACK_UNDERFLOW)
state is reached.
When more than two operands are on the stack,
addition employs modulo arithmetic (hence, overflows wrap around)
and the final result (of the addition modulo) is pushed onto the stack after the operands
are popped, and then the program counter is advanced by 1.
The postcondition ensures st'.OK? <==> st.Operands() >= 2
specifies a strong guarantee on the code in the body of
function: for any input state st
, Add
returns an OK
state (non-failure) if and only if
the stack in the input state st
has at least two elements (Operands()
).
Note that this postcondition is checked by the Dafny verification engine at compile-time not at runtime.
To use our code base you may follow these steps:
Our EVM is written in Dafny. As a result we can instrument bytecode with some reasoning features. Some examples are given in the verification examples document.
Dafny has to be translated to a target language to be run and to access functionality not included natively in Dafny. We have currently 2 target languages: Go and Java.
The Java target uses the gradle
build system. To build the code, you need the following components:
Java 11 (or greater)
Dafny 3.10 (or greater).
Gradle 7 (or greater)
With these installed, you can build the code using the following command:
> gradle build
This will verify the codebase using Dafny along with some examples,
generate a Java implementation of the EVM
, and run two test suites
against it in Java.
As the main purpose of our EVM is to reason about bytecode, we may want to have some guarantees that the proofs we develop are also valid on other EVM implementations: if the same code is run on another implementation then the guarantees (e.g. no stack under/overflow) that we obtain using our automated reasoning and our EVM are still valid. This requires to prove that the other implementation produces exactly the same computations as our EVM on all inputs and for all programs. It is not practical to formally prove this kind of equivalence.
However we can compare the results of the execution of some bytecode on different implementations.
If for a large number of tests two implementations give the same results (sequences of states), we have some confidence
that the two implementations are equivalent.
If our EVM yields the same results as, say the Geth's evm
tools, then we can be confident that our proofs on the bytecode should be valid on the Geth EVM too.
The test cases used for the Dafny EVM are stored in the tests/
directory. These are generated from the Ethereum Consensus
Tests using Geth's evm
tool.
Each test is a json
file similar in structure to that used by the
Ethereum Consensus Tests, except that they include full trace data
(i.e. the state of the EVM after every execution step).
To regenerate the trace tests, you need to ensure the fixtures
submodule is updated appropriately. If you originally employed git clone --recursive
when cloning the repository, then you don't need to
do anything. Otherwise, you can do this:
git submodule update --init
Using gradle
one can now regenerate all the trace tests as follows:
> gradle testgen
This can take several minutes to complete, and requires that Geth's
evm
tool is installed and visible on PATH
(we currently recommend
version 1.10.25
or later). Furthermore, the test generation process
is governed by the files tests/includes.txt
and
tests/excludes.txt
. The former determines which of the reference
tests should be included, whilst the latter identifies specific cases
to exclude. Finally, the trace generation process is managed by the
EvmTools framework.
See the CONTRIBUTORS file for more information on contributing to this repository. By default contributors accept the terms of the license. We also endeavour to follow the conventions of the Dafny style guide.
The architecture of our Dafny-EVM: ARCHITECTURE.md
Some useful links: