Improve testing of EVM correctness

Problem

Halmos is essentially a light EVM interpreter (without gas), and because we're in the business of looking for corner cases of EVM bytecode, it is actually mission critical for our modeling to be as correct and complete as possible. However, we're not doing a ton to ensure that we're actually conforming to the spec.

Solution

The most complete thing we could do is make sure that we have good coverage of the EVM state transition tests, which is something github.com/ethereum/execution-specs/ does (as well as KEVM from what I've heard)

However this is going to be difficult in practice because:

the state tests are big and hairy
we don't implement gas and are not likely to, so we would need to focus on the subset of official tests that don't require that (if any)

So on the spectrum from very-lightweight to fully-compliant testing, we are currently too close to very-lightweight and would like to move meaningfully towards fully-compliant, while remaining pragmatic.

Alternatives

Maybe we can get away with unit testing for specific corner cases at the instruction level, e.g.:

dividing by 0 returns 0
reading past calldatasize returns 0s
reading past codesize returns 0s
a failed ecrecover returns 0 bytes (via @0xPhaze tweet)
...

Extra context

This is actually a massive task, but marked good first issue because there is a well defined spec out there and it is possible to make incremental progress while keeping halmos basically a black box.

a16z / halmos