perf(`anvil`): enhance block mining performance in Anvil node for high throughput and efficiency

mshakeg commented 5 months ago

Component

Anvil

Describe the feature you would like

I propose a performance enhancement for the Anvil node, specifically targeting the efficiency of block mining. In my tests, Anvil processes transactions quickly, but throughput drops sharply when blocks are mined frequently, which appears to be due to the time spent mining each block. This feature request asks for optimizations to Anvil's block mining that reduce its execution time, increasing overall transactions per second (TPS) and making the node more suitable for applications that need both high transaction throughput and frequent block mining.

Additional context

Anvil version: 0.2.0 (2cf84d9 2024-02-07T00:15:49.622159000Z)

To illustrate the current performance characteristics and provide a basis for this request, I ran a test using a Uniswap V3 transaction replay script. The results point to significant potential gains in block mining. For instance, increasing nullSwapsPerBlock from 1 to 2000 raised the average TPS by a factor of roughly 7 (from about 595 to about 4041), which indicates that the node spends a significant portion of its time mining blocks rather than executing transactions. To replicate this test:

1. Clone the anvil-backtester repo and install dependencies (pnpm i).
2. Start the Anvil node: pnpm anvil:start
3. Run the test script (pnpm test:anvil-memory) with nullSwapsPerBlock set to 1, then again with it set to 2000. You should see results similar to the following, which indicate significant overhead in mining blocks:

{
  blocksToMine: 25,
  nullSwapsPerBlock: 1,
  totalTxs: 50,
  executionTime: 0.084,
  averageTPS: 595.2380952380952,
  averageTimePerTx: 1.6800000000000002
}

{
  blocksToMine: 25,
  nullSwapsPerBlock: 2000,
  totalTxs: 100000,
  executionTime: 24.747,
  averageTPS: 4040.8938457186728,
  averageTimePerTx: 0.24747000000000002
}
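As a rough cross-check of the two results above, the sketch recomputes averageTPS and averageTimePerTx from totalTxs and executionTime, then fits a simple two-term cost model to the two runs. The model (a constant cost per mined block plus a constant cost per transaction) is an assumption, not something the test script measures, and the names RunResult, metrics and fitCostModel are hypothetical. We also assume executionTime is in seconds and averageTimePerTx in milliseconds, which is consistent with the printed values.

```typescript
// Sketch under stated assumptions: executionTime is in seconds, and each run's
// time is approximately blocksToMine * perBlockSeconds + totalTxs * perTxSeconds,
// with both per-unit costs constant across runs.
interface RunResult {
  blocksToMine: number;
  totalTxs: number;
  executionTime: number; // seconds
}

// Throughput metrics as printed above; averageTimePerTxMs is in milliseconds.
function metrics(run: RunResult): { averageTPS: number; averageTimePerTxMs: number } {
  return {
    averageTPS: run.totalTxs / run.executionTime,
    averageTimePerTxMs: (run.executionTime * 1000) / run.totalTxs,
  };
}

// Solve the 2x2 linear system defined by two runs (Cramer's rule).
function fitCostModel(a: RunResult, b: RunResult): { perBlockSeconds: number; perTxSeconds: number } {
  const det = a.blocksToMine * b.totalTxs - b.blocksToMine * a.totalTxs;
  if (det === 0) {
    throw new Error("runs must differ in their blocks-to-transactions ratio");
  }
  return {
    perBlockSeconds: (a.executionTime * b.totalTxs - b.executionTime * a.totalTxs) / det,
    perTxSeconds: (a.blocksToMine * b.executionTime - b.blocksToMine * a.executionTime) / det,
  };
}

const oneSwapPerBlock: RunResult = { blocksToMine: 25, totalTxs: 50, executionTime: 0.084 };
const manySwapsPerBlock: RunResult = { blocksToMine: 25, totalTxs: 100000, executionTime: 24.747 };

const model = fitCostModel(oneSwapPerBlock, manySwapsPerBlock);
// Fraction of the first run's time attributed to the per-block term.
const miningShare =
  (oneSwapPerBlock.blocksToMine * model.perBlockSeconds) / oneSwapPerBlock.executionTime;

console.log(metrics(oneSwapPerBlock), metrics(manySwapsPerBlock), model, miningShare);
```

With the two runs above this gives roughly 2.9 ms per block and about 0.25 ms per transaction, so under this model about 85% of the one-swap-per-block run would be per-block overhead. The model ignores that per-block work (for example state root computation) probably grows with the number of transactions in a block, so the split is only indicative.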

mattsse commented 5 months ago

it likely spends most of its time cleaning up / updating old state

could you try with --prune-history and see if you notice any difference?

There's definitely room for significant improvements here

mshakeg commented 5 months ago

@mattsse I am using --prune-history in the anvil command as shown below

https://github.com/mshakeg/anvil-backtester/blob/main/shell/anvil.sh

Removing --prune-history and --transaction-block-keeper 4 from the above command does not result in any noticeable changes in performance.
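To make the pruning discussion concrete, here is a conceptual sketch of what keeping transaction data for only the most recent N blocks involves, with N = 4 mirroring the --transaction-block-keeper 4 setting above. BlockKeeper is a hypothetical class, not Anvil's implementation; it only illustrates the bounded-history idea, where each newly mined block evicts the oldest retained one.

```typescript
// Conceptual sketch only, not Anvil's implementation: retain per-block
// transaction data for at most `keep` recent blocks.
class BlockKeeper<T> {
  private readonly blocks = new Map<number, T[]>();
  private readonly keep: number;

  constructor(keep: number) {
    if (keep < 1) {
      throw new Error("keep must be at least 1");
    }
    this.keep = keep;
  }

  insert(blockNumber: number, txs: T[]): void {
    // Delete first so a re-inserted block moves to the newest position.
    this.blocks.delete(blockNumber);
    this.blocks.set(blockNumber, txs);
    // Maps iterate in insertion order, so the first key is the oldest block.
    while (this.blocks.size > this.keep) {
      const oldest = this.blocks.keys().next().value as number;
      this.blocks.delete(oldest);
    }
  }

  get(blockNumber: number): T[] | undefined {
    return this.blocks.get(blockNumber);
  }

  retained(): number[] {
    return Array.from(this.blocks.keys());
  }
}

// Keep the 4 most recent blocks, as in the setting discussed above.
const keeper = new BlockKeeper<string>(4);
for (let n = 1; n <= 6; n++) {
  keeper.insert(n, [`tx-${n}`]);
}
console.log(keeper.retained(), keeper.get(2));
```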

mattsse commented 5 months ago

hmm, could you perhaps run this with samply https://github.com/mstange/samply and see if anything sticks out

I'll try to investigate shortly

mshakeg commented 5 months ago

@mattsse thanks. I don't really know what to make of the profile, but I've attached the trace for evm_mine; maybe GPT-4 could be a source of inspiration :)

Based on this call trace, here are a few points to consider for profiling and improving performance:

Database Interactions: The evm_mine operation involves interactions with an in-memory database. Optimizations here could involve reducing the number of reads and writes, caching frequently accessed data, or improving the database's data structures.

State Trie Manipulation: There are multiple calls to trie_db functions, which indicate manipulation of the state trie. This is an area that typically has a significant impact on performance. Optimizing trie algorithms or using a more efficient trie structure could yield performance improvements.

Hash Calculations: The keccak_hasher and tiny_keccak functions suggest that Keccak hashing is part of the operation. Optimizing hashing or reducing the number of hash calculations required could improve performance.

EVM Execution: The revm-specific calls such as run_interpreter and preverified_inner imply that EVM bytecode execution is part of the process. Profiling the EVM's interpreter loop, opcode execution, and context switching could reveal bottlenecks.

Smart Contract Calls: Calls to inspect_call_instruction and Host::call suggest that smart contract function calls are being made. Reducing the overhead of call setup and teardown, such as preparing the environment for each contract call and handling stack and memory operations efficiently, could improve performance.

Parallelism and Concurrency: Evaluate if any parts of the evm_mine process can be executed in parallel. Some operations, especially state-independent ones, may benefit from concurrent execution.

Memory Management: Functions like drop_in_place suggest that there is active management of memory, possibly with data structures being de-allocated. Improving memory allocation strategies, avoiding unnecessary allocations, and reusing memory buffers could reduce overhead and improve performance.

Opcode Optimization: Within the EVM execution, certain opcodes may be used more frequently or may be more resource-intensive. Profiling at the opcode level could help identify if specific opcodes are bottlenecks and could be optimized.

Caching Strategies: For repetitive operations, especially within the EVM interpreter, caching results of expensive computations could be beneficial if they're likely to be repeated with the same inputs.

Profiling and Instrumentation Tools: Use profiling tools that provide granular insight into CPU and memory usage. System profilers that work with Rust binaries, such as perf on Linux or DTrace on macOS/BSD, can help identify hot paths and the functions that take the most time or consume the most resources.

Algorithmic Efficiency: Review the algorithms used in the trie manipulation and hashing to ensure they are the most efficient for the use case. Sometimes, algorithmic improvements can yield better performance gains than low-level optimizations.

Code Review and Refactoring: There might be opportunities to refactor the code for efficiency. This could involve combining functions, inlining functions to reduce call overhead, or simplifying complex logic.

Batch Processing: If the evm_mine operation can be batched (i.e., processing multiple transactions or blocks in a single operation), it could reduce the per-operation overhead and take advantage of more efficient bulk processing techniques.

Asynchronous Processing: Look into asynchronous processing where applicable to avoid blocking operations, particularly for I/O bound tasks.
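The hashing and caching suggestions above can be illustrated with a toy model: recompute a "state root" each block, but cache each account's leaf hash and rehash only the accounts that changed since the last root. Everything here is hypothetical and simplified: FNV-1a (32-bit) stands in for Keccak-256, and the root is a hash over sorted leaf hashes rather than a Merkle Patricia Trie, so this is not how Anvil or trie_db compute the state root. A real trie can also cache interior node hashes, avoiding the linear pass over all leaves that this sketch still performs.

```typescript
// Toy hash standing in for Keccak-256 (FNV-1a, 32-bit, hex-encoded).
function fnv1a(data: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < data.length; i++) {
    h ^= data.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Hypothetical cache: only accounts touched since the last root() are rehashed.
class StateRootCache {
  private readonly accounts = new Map<string, string>();
  private readonly leafHashes = new Map<string, string>();
  private readonly dirty = new Set<string>();
  hashCalls = 0;

  // Record an account update and mark its leaf for rehashing.
  set(address: string, value: string): void {
    this.accounts.set(address, value);
    this.dirty.add(address);
  }

  private hash(data: string): string {
    this.hashCalls++;
    return fnv1a(data);
  }

  // Rehash dirty leaves, then hash the leaf hashes in address order.
  root(): string {
    this.dirty.forEach((address) => {
      this.leafHashes.set(address, this.hash(`${address}:${this.accounts.get(address)}`));
    });
    this.dirty.clear();
    const ordered = Array.from(this.leafHashes.keys())
      .sort()
      .map((address) => this.leafHashes.get(address) as string);
    return this.hash(ordered.join(""));
  }
}

const state = new StateRootCache();
for (let i = 0; i < 1000; i++) {
  state.set(`0x${i.toString(16)}`, "balance=0");
}
const before = state.root(); // hashes all 1000 leaves once, plus the root
state.set("0x1", "balance=5");
state.set("0x2", "balance=7");
const after = state.root(); // rehashes only the 2 changed leaves, plus the root
console.log(before !== after, state.hashCalls);
```

The second root costs 3 hash calls instead of 1001, while producing the same root a full recomputation over the final state would give.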

mattsse commented 5 months ago

thanks!

will investigate, but it looks like the state root

mshakeg commented 5 months ago

@mattsse thanks. It might be a good idea to have flags that disable logic a local node doesn't really need, similar to how the eth_sendUnsignedTransaction method lets you send a transaction without signing it.
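For reference, a benchmark of the shape described in this issue can be expressed as a sequence of JSON-RPC requests: unsigned transactions for each block followed by one explicit evm_mine. This is a hypothetical sketch, not the anvil-backtester code. It assumes Anvil accepts evm_setAutomine (to stop mining on every transaction), that eth_sendUnsignedTransaction takes a transaction object whose `from` field replaces the signature, and that evm_mine accepts empty params; the addresses and `data: "0x"` are placeholders rather than real swap calldata. Check the Anvil RPC reference before relying on any of these.

```typescript
// Hypothetical request builder; method names and params are assumptions to verify.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown[];
}

// Minimal transaction fields; `from` is required because no signature is sent.
interface UnsignedTx {
  from: string;
  to: string;
  data: string;
}

function buildRunRequests(
  blocksToMine: number,
  txsPerBlock: number,
  makeTx: (block: number, index: number) => UnsignedTx,
): JsonRpcRequest[] {
  const requests: JsonRpcRequest[] = [];
  const push = (method: string, params: unknown[]) =>
    requests.push({ jsonrpc: "2.0", id: requests.length + 1, method, params });

  // Let transactions accumulate in the pool until an explicit mine.
  push("evm_setAutomine", [false]);
  for (let block = 0; block < blocksToMine; block++) {
    for (let i = 0; i < txsPerBlock; i++) {
      push("eth_sendUnsignedTransaction", [makeTx(block, i)]);
    }
    // One explicit mine per block: the per-block cost this issue is about.
    push("evm_mine", []);
  }
  return requests;
}

const sender = "0x" + "11".repeat(20); // placeholder address
const pool = "0x" + "22".repeat(20); // placeholder address
const requests = buildRunRequests(2, 3, () => ({ from: sender, to: pool, data: "0x" }));
console.log(requests.length, requests[requests.length - 1].method);
```

The requests are meant to be sent in order, one at a time, so that every transaction reaches the pool before the evm_mine that follows it.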

zerosnacks commented 1 week ago

Relevant conversation in #7546: https://github.com/foundry-rs/foundry/pull/7546#issuecomment-2041338137

foundry-rs / foundry