erigontech / silkworm

C++ implementation of the Ethereum protocol
Apache License 2.0

Preprocessing of live transaction to optimistically calculate read and write sets #193

Open AlexeyAkhunov opened 3 years ago

AlexeyAkhunov commented 3 years ago

This is a continuation of https://github.com/torquem-ch/silkworm/issues/192

Modify the EVM engine (this might need a custom version of evmone) to change the value stack and semantics of some opcodes.

On the stack, apart from the usual 256-bit values, we can also store an "unknown" value, which can be represented by just a bit flag.

Change the semantics of all opcodes that read or write anything to the state. For example, for SLOAD, which reads from contract storage, if the location parameter is "unknown" rather than a concrete value, the whole execution aborts. However, if the location is a concrete value, the opcode does not actually read anything, but pushes "unknown" onto the stack instead of the read value. Similarly, SSTORE, which writes to contract storage, aborts the execution if the location is "unknown", but if the location is a concrete value, it does nothing (no-op).

The same applies to BALANCE, EXTCODEHASH, and all other opcodes that need access to the state.
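To make the modified semantics concrete, here is a minimal sketch of the abstract stack with SLOAD and SSTORE. It is not evmone code: `Word`, `AbstractValue`, `Preprocessor` and `AnalysisAborted` are hypothetical names, a 64-bit integer stands in for the 256-bit word, and the propagation of "unknown" through ADD is my assumption about how arithmetic opcodes would behave.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: a real EVM word is 256 bits wide.
using Word = std::uint64_t;

// A stack slot is either a concrete word or "unknown".
// std::nullopt plays the role of the "unknown" bit flag.
using AbstractValue = std::optional<Word>;

// Thrown when the pre-processing has to give up on a transaction.
struct AnalysisAborted : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Preprocessor {
    std::vector<AbstractValue> stack;
    std::set<Word> read_set;   // storage locations that would be read
    std::set<Word> write_set;  // storage locations that would be written

    void push(AbstractValue v) { stack.push_back(v); }

    AbstractValue pop() {
        if (stack.empty()) throw AnalysisAborted("stack underflow");
        AbstractValue v = stack.back();
        stack.pop_back();
        return v;
    }

    // SLOAD: never touches the state. A concrete location is recorded in
    // the read set and "unknown" is pushed in place of the stored value.
    void sload() {
        AbstractValue location = pop();
        if (!location) throw AnalysisAborted("SLOAD with unknown location");
        read_set.insert(*location);
        push(std::nullopt);
    }

    // SSTORE: pops the location, then the value. A concrete location is
    // recorded in the write set; nothing is written to the state.
    void sstore() {
        AbstractValue location = pop();
        pop();  // the value to store is irrelevant for the analysis
        if (!location) throw AnalysisAborted("SSTORE with unknown location");
        write_set.insert(*location);
    }

    // ADD (assumption): the result is unknown if either operand is unknown.
    void add() {
        AbstractValue a = pop();
        AbstractValue b = pop();
        if (a && b) {
            push(*a + *b);  // unsigned arithmetic wraps, like EVM ADD
        } else {
            push(std::nullopt);
        }
    }
};
```

With this sketch, an SLOAD at a concrete location followed by another SLOAD that uses the loaded value as its location aborts the analysis, which is exactly the case (a storage location derived from state) that cannot be pre-processed.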

In essence, this version of the EVM does not access the state. But it is able to look at a transaction and either compute its "read set" and "write set", or fail. The idea is to first look at historical transactions and see how many of them could have been pre-processed this way. If many can, then the next step is to try to take advantage of such pre-processing: if you know the "read sets" and "write sets" of a group of transactions, you can try to run some of them in parallel.
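As an illustration of how known read and write sets could be used, here is a rough sketch of one possible grouping strategy. The issue does not prescribe any scheduling algorithm, so this greedy batching is an assumption, and `Location`, `AccessSets`, `conflicts` and `schedule` are hypothetical names. Two transactions are treated as conflicting if one writes a location that the other reads or writes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for an (address, storage key) pair.
using Location = std::uint64_t;

struct AccessSets {
    std::set<Location> reads;
    std::set<Location> writes;
};

// True if the two sets share at least one element.
bool intersects(const std::set<Location>& a, const std::set<Location>& b) {
    return std::any_of(a.begin(), a.end(),
                       [&](Location x) { return b.count(x) != 0; });
}

// Read/read overlap is harmless; any overlap involving a write is a conflict.
bool conflicts(const AccessSets& a, const AccessSets& b) {
    return intersects(a.writes, b.writes) ||
           intersects(a.writes, b.reads) ||
           intersects(a.reads, b.writes);
}

// Greedy batching that preserves the block order of conflicting
// transactions: each transaction goes into the first batch that comes
// after every batch holding an earlier transaction it conflicts with.
// Batches run one after another; transactions inside a batch are
// mutually conflict-free and could run in parallel.
std::vector<std::vector<std::size_t>> schedule(
    const std::vector<AccessSets>& txs) {
    std::vector<std::vector<std::size_t>> batches;
    for (std::size_t i = 0; i < txs.size(); ++i) {
        std::size_t target = 0;
        for (std::size_t b = 0; b < batches.size(); ++b) {
            for (std::size_t j : batches[b]) {
                if (conflicts(txs[i], txs[j])) target = b + 1;
            }
        }
        assert(target <= batches.size());
        if (target == batches.size()) batches.emplace_back();
        batches[target].push_back(i);
    }
    return batches;
}
```

For example, if transaction 0 writes location 1, transaction 1 reads location 1, and transaction 2 writes location 2, then transactions 0 and 2 land in the first batch and transaction 1 in the second. Transactions whose pre-processing failed would have to be handled separately, for instance by running them sequentially.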

If we find something interesting there, we might create some block composition strategies (for mining) that produce more parallelizable blocks.

AlexeyAkhunov commented 3 years ago

Update: Marco found another complication. Many contracts compiled by Solidity use assert and require, which lead to REVERT or an execution abort depending on some condition involving state. If we simply fail execution for such code, it is likely that coverage will be very low. Therefore, we would like to see if we can (for REVERT and for aborts due to the INVALID instruction and other cases, but not for JUMPI) track both branches for the purposes of "readSet"/"writeSet" analysis.