erigontech / silkworm

C++ implementation of the Ethereum protocol
Apache License 2.0
277 stars 63 forks source link

Transaction analytics by transaction replay #78

Open AlexeyAkhunov opened 3 years ago

AlexeyAkhunov commented 3 years ago

This in an introductory project designed to find some very useful information and perhaps optimisation opportunities for Silkworm execution, at the same time requiring understanding of the code and some aspect of Turbo-Geth/Silkworm data model. Carrying out this task will require the database obtained by fully syncing a Turbo-Geth node, which should be compatible with the silkworm code. This is because the task requires replaying all transactions from the mainnet history. Of course, for most debugging and testing, only a small subset of transaction should be replayed.

Goals

Here are 3 high-level goals:

  1. Find number of transactions that fail and consume the entire allotted gas. These are "Out of Gas" errors, hitting unrecognised instruction, making invalid jump, or explicitly calling INVALID instruction. These transactions do not make any changes in Ethereum state, except for deducting the ETH (gasprice * tx gaslimit) from the sender's balance of the transaction, and adding the same amount to the miner's balance.
  2. Think about data structure (for example, compressed bitmap) that would allow us to mark the transactions found in 1. so that we can use this data structure as one of the inputs to replay
  3. Using the data structure from 2., see if there is a benefit in skipping the execution of these transactions entirely, and how much benefits that could bring.
  4. Also using data structure from 2., see if we can simplify the configuration of EVM, by retrofitting some opcodes that did not exist from the beginning, and "pretend" that they existed from the very beginning of Ethereum in 2015. For example, bit shifting opcodes, if used prior to Byzantium hard fork where they were introduced, would produce failure of the type described in 1., and if we skip these transactions, we can simply pretend that the bit shifting opcodes were always around. Find all such opcodes and see how this can simplify the EVM configuration.

Tools

Apart from having the synced database, you will need the code that is capable of re-executing any historical transactions at will. Such code exists in Silkworm, in the silkworm/cmd/check_changes.cpp file. This command line utility can be used as a starting point, it accept parameters that specify the database, as well as range of historical block numbers for which to re-play transactions.

The way this program works is similar to how transaction replay is done in the "live" mode. The difference here is that instead of reading state items (accounts with their balances and nonces, contract storage items) from the PLAIN-CST2 table, this historical replay uses the combination of change set tables PLAIN-ACS and PLAIN-SCS, and history index tables hAT and hST to read historical state. Unlike the "live" execution, it does not modify the database and is totally read-only, so it is very good for repetitive experiments and learning. If you want to dig deeper how it works, you can refer to the file silkworm/silkworm/db/buffer.cpp, which defines the type Buffer. An instance of this type is passed into the execute function, and this buffer is used to access historical state. Going even deeper, look at the file silkworm/silkworm/db/access_layer.cpp, specifically at functions read_account and read_storage

gcolvin commented 3 years ago
  1. Find number of transactions that fail and consume the entire allotted gas.

Do we have an estimate of how many of these there are?

  1. ... see if there is a benefit in skipping the execution of these transactions entirely, and how much benefits that could bring

Which depends in part on how many there are.

  1. ... if we skip these transactions, we can simply pretend that the bit shifting opcodes were always around.

Do you mean that for any transaction in the set found in step 1 those failing by hitting an invalid opcode will (like all the others) be skipped, so the clients don't need to check for the block number to know how to handle it? If they hit that instruction it must be valid.

This sounds like a very promising line of investigation.

AlexeyAkhunov commented 3 years ago
  1. Find number of transactions that fail and consume the entire allotted gas.

Do we have an estimate of how many of these there are?

Not at the moment, but I think it is not very high. It would be much higher before Byzantium (block 4'730'000), where REVERT opcode was introduced. But on the upside, if it is not very high, the structure marking these transaction might be not very large :)

  1. ... if we skip these transactions, we can simply pretend that the bit shifting opcodes were always around.

Do you mean that for any transaction in the set found in step 1 those failing by hitting an invalid opcode will (like all the others) be skipped, so the clients don't need to check for the block number to know how to handle it? If they hit that instruction it must be valid.

This sounds like a very promising line of investigation.

Precisely, the idea is not to check the block number to handle it, and pretend it always existed.

gcolvin commented 3 years ago

(I'll note that static analysis can identify some range of contracts that are certain to fail and refuse to deploy them.)

AlexeyAkhunov commented 3 years ago

(I'll note that static analysis can identify some range of contracts that are certain to fail and refuse to deploy them.)

Yes. When our static analysis code is integrated and matured, we might be able to use it for it. Since it computes Control Flow Graph, it can check whether the control flow inevitably terminates in failure

gcolvin commented 3 years ago

So if we can skip failing transactions then

... any transaction failing by hitting an invalid opcode will be skipped, so the clients don't need to check for the block number. If they hit that instruction it must be valid.

That means simpler clients, even if doesn't save much execution time.

tjayrush commented 3 years ago

As part of my previous work, we did exactly the calc you're describing in part 1. I can give you a pretty accurate count of both pre-Byzantium and post-Byzantium. Our code first checks to see if gas == gasUsed and then, if so, pulls all traces (using Parity) and checks for any errors. We do this only for pre-Byzantium because we can just check the status code afterwards, but I can extend it to include post-Byzantium.

I'll post numbers here shortly.

Also, I'm very interested in the code you mention above. I've been experimenting with silkworm and lmdb, so maybe I'll try to match the numbers produced by Parity as a learning experience.

tjayrush commented 3 years ago

I started a branch to work on this here: https://github.com/torquem-ch/silkworm/tree/078-tx-analysis.

I used TrueBlocks as a double check of the results. This pseudo-code:

for every block
    countBlocks++
    for every transaction in block
        countTxs++
        if (gas == receipt.gasUsed && traces contain error)
            countErrors++

produces these results:

from to txs errors
0 50,000 1,871 10
50,000 100,000 25,099 637
100,000 150,000 36,365 75
150,000 200,000 58,458 113
200,000 250,000 58,219 71
250,000 300,000 61,372 107
300,000 350,000 63,483 75

This will give me a baseline to try to duplicate.

tjayrush commented 3 years ago

Preliminary data on counting errors: https://docs.google.com/spreadsheets/d/13omV5sfpKa6dI-2nB7PgbHEqtKvymCpdcZifv1TlDlU/edit?usp=sharing.

I'll update with more results when available.

yperbasis commented 3 years ago

@AlexeyAkhunov @tjayrush Is it still an issue or can we close it?