foundry-rs / foundry

Foundry is a blazing fast, portable and modular toolkit for Ethereum application development written in Rust.
https://getfoundry.sh
Apache License 2.0

perf(anvil): Memory consumption steadily increases during prolonged transaction replays #6017

Closed mshakeg closed 1 week ago

mshakeg commented 1 year ago

Component

Anvil

Describe the feature you would like

Description

I've been utilizing Anvil 0.2.0 (5be158b 2023-10-02T00:23:45.472182000Z) as a local Ethereum node for a Uniswap V3 backtester project. The backtester replays all transactions for a specific Uniswap V3 Pool. However, I've noticed a consistent and steady increase in memory usage over time as more transactions are replayed, even with the --prune-history flag enabled. Below is the exact command I'm using to start the anvil node:

anvil --prune-history --timestamp 1619820000 --order fifo --code-size-limit 4294967296 -m "test test test test test test test test test test test junk" --gas-limit 100000000000 --gas-price 0 --base-fee 0

Observations

  1. The memory consumption increases steadily with the replaying of transactions.
  2. When replaying transactions for backtesting, I first set the block's timestamp using evm_setNextBlockTimestamp and then utilize evm_mine to mine the block (see the sketch after this list).
  3. After around 10 minutes of running, the Docker stats indicate that Anvil has consumed around 9.247GiB of memory:
de526102010a   web3-backtester-anvil-1-1        98.78%    9.247GiB / 62.01GiB   14.91%    12.3MB / 15.6MB   4.77MB / 0B       11
  4. I'm only deploying contracts at the very start, and the rest (and vast majority) of the runtime is spent replaying transactions for the backtest.
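
For clarity, here's a minimal sketch of the per-block replay flow described in point 2, assuming ethers v5 and a local anvil endpoint; `replayBlock` and `sendBlockTxs` are illustrative names, not the backtester's actual code:

```typescript
import { ethers } from "ethers";

// Minimal sketch of the per-block replay flow (ethers v5 assumed; names illustrative).
const provider = new ethers.providers.JsonRpcProvider("http://127.0.0.1:8545");

async function replayBlock(timestamp: number, sendBlockTxs: () => Promise<void>) {
  // Pin the timestamp that the next mined block should carry.
  await provider.send("evm_setNextBlockTimestamp", [timestamp]);
  // Submit the historical transactions being replayed for this block.
  await sendBlockTxs();
  // Mine the block containing the submitted transactions.
  await provider.send("evm_mine", []);
}
```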

Expected Behavior

Stable memory consumption or a slower, more controlled growth in memory usage over time when replaying transactions.

Possible Solutions

While I'm not certain about the root cause, I'd appreciate if the team could investigate:

  1. Whether transaction replays might be causing memory leaks or if there are specific internal data structures growing unbounded.
  2. If the --prune-history flag could be further optimized or if there's a possibility of introducing additional pruning or memory management features.

Additional context

Environment Details

mattsse commented 1 year ago

I think the offloading to disk could def use some work.

do you have some kind of script that I can use to repro this by any chance? That would help. I don't think this is complex to fix, but having something available for debugging would really help here.

mshakeg commented 1 year ago

Hey @mattsse thanks for the reply, the backtester is still private so I just created this repo that is a minimal reproduction of the issue.

It would be great if the anvil node had a very lightweight configuration that drops all historical/archive state; for most cases (including mine on an MBP M1 16GB) it would likely not even need to offload to disk. I thought the --prune-history flag did this, but it seems not.

mattsse commented 1 year ago

thanks! I'll take a look

mshakeg commented 1 year ago

another issue I noticed is that the anvil node slows down significantly as more and more transactions are replayed. It took about 1 hour to replay the first 50k logs and then about 5 hours to replay the next 50k logs, and it'll likely only get worse with more logs.

Edit: It does get worse: replaying the next 50k logs, i.e. from 100k to 150k, took about 10 hours.

mshakeg commented 1 year ago

Hi @mattsse don't want to come off as impatient, but I just wanted to ask if you could provide a rough ETA on this?

09tangriro commented 1 year ago

+1

Hi guys! We've been building a general smart contract testing framework called dojo, but we're heavily limited by the throughput anvil can handle. Like @mshakeg mentioned, we noticed unusually heavy memory usage, and transactions especially take a significant amount of time to process. At first we also noticed transactions slowing down as the simulation went on, but that was actually fixed by awaiting the transaction receipt (implying that there's some sort of queue in anvil that's getting clogged up?). Still, transactions take a huge amount of time to process given how few of them there actually are in a simulation. In the graph below you can see that over half of our execution time is spent waiting for anvil to process a transaction.

[graph: breakdown of execution time per simulation step, with over half spent waiting on anvil transaction processing]
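
As a minimal sketch of the "await the receipt" fix mentioned above (ethers v5 assumed; `positionManager` and `mintParamsList` are illustrative placeholders, not names from our codebase):

```typescript
// Await each receipt before submitting the next transaction so anvil's internal
// queue never backs up (ethers v5 assumed; names below are illustrative).
for (const params of mintParamsList) {
  const tx = await positionManager.mint(params);
  // Block until the transaction is mined and the receipt is available.
  await tx.wait();
}
```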

In particular, Uniswap mint transactions appear to take a long time to process, presumably because they are complex transactions: [plot: mint transaction processing time; x-axis is transaction index, y-axis is seconds]

How much do you think it would be feasible to cut this processing time down? Chaos Labs appear to have a private fork of anvil with massively increased throughput. I'm hoping we can do the same with the open-source version.

If it's useful, @scheuclu and I are more than happy to help with this if we could get some onboarding guidance as well :)

mshakeg commented 1 year ago

@09tangriro thanks for sharing. Could you share a graph of the processing times for the Uniswap mint transactions (and perhaps even for swap transactions) for the first 10,000 logs, and compare how much faster they are than the later average of ~2.2s?

If I had to guess, it probably slows down by as much as 100x: from ~20ms to ~2s.

09tangriro commented 1 year ago

Here are some graphs of the time taken for different transactions in a day-long sim with Uniswap. I'll correct myself: the trend is definitely clear that tx time is increasing: [plots: timing_contract_transact_exactOutputSingle, timing_contract_transact_increaseLiquidity, timing_contract_transact_mint, timing_contract_transact_burn]

mshakeg commented 1 year ago

@09tangriro thanks, are these scatter plots over a similar time frame/number of transactions as the initial line plot you shared? I would guess not as the transaction_mint time doesn't eventually end up around 2s. Do the times for the other transactions also end up around 2s?

09tangriro commented 1 year ago

The other plot uses a forked chain, which increases transaction times a lot. To better isolate anvil, it should be noted that the recent scatter plots were not run on a forked chain, but purely local development.

Ideally, I'd like these times to be O(1), and also to cut times in general by a factor of 10 so a transaction takes on the order of 1ms. I'm curious: on the surface of it, the Ethereum opcodes shouldn't require milliseconds to process; it's just addition, subtraction, and memory access. According to this benchmark, though, revm and even evmone take milliseconds to tens of milliseconds to process ERC20 transactions. Does anyone know why the EVM requires so much compute power?

mshakeg commented 1 year ago

The other plot uses a forked chain, which increases transaction times a lot.

@09tangriro thanks for clarifying that.

I'd like for these times to be O(1)

I agree, I see no reason why the processing time for a transaction should increase so drastically. Your scatter plots seem to cover only a few thousand txs, so it's not as pronounced. In any case, the trends look linear, so the processing time per transaction would eventually be much greater than the initial processing time, which is unacceptable imo.

mattsse commented 1 year ago

I'm unable to run the example after adding the required env vars:

 pnpm test:anvil-memory

> @mshakeg/anvil-backtester@1.0.0 test:anvil-memory /Users/Matthias/git/rust/foundry-repros/anvil-backtester
> hardhat test test/uniswapV3/anvil-memory.spec.ts

Error HH8: There's one or more errors in your config file:

  * Invalid value undefined for HardhatConfig.networks.anvil.url - Expected a value of type string.

To learn more about Hardhat's configuration, please go to https://hardhat.org/config/

For more info go to https://hardhat.org/HH8 or run Hardhat with --show-stack-traces
 ELIFECYCLE  Command failed with exit code 1.

what's the fix here?

mshakeg commented 1 year ago

@mattsse not sure what's causing your issue; I just confirmed there's no problem when following these steps:

  1. clone https://github.com/mshakeg/anvil-backtester
  2. in the anvil-backtester directory run: pnpm install
  3. copy, paste and rename .env.example to .env
  4. start the anvil node: pnpm anvil:start
  5. run the test script: pnpm test:anvil-memory

Can you make sure that ANVIL_URL="http://127.0.0.1:8545/" is defined in your .env file?
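
For reference, the HH8 error above is what Hardhat reports when that variable is missing; the anvil network entry is presumably wired up roughly like this (a sketch, the exact config in the repro repo may differ):

```typescript
// hardhat.config.ts (sketch) -- how the `anvil` network URL is likely read from .env.
import * as dotenv from "dotenv";
import { HardhatUserConfig } from "hardhat/config";

dotenv.config();

const config: HardhatUserConfig = {
  networks: {
    anvil: {
      // When ANVIL_URL is not defined in .env this is `undefined`,
      // which triggers the HH8 "Expected a value of type string" error above.
      url: process.env.ANVIL_URL,
    },
  },
};

export default config;
```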

mshakeg commented 1 year ago

Hi @mattsse, have you managed to get it working?

gakonst commented 1 year ago

@mshakeg please try running with --prune-history --transaction-block-keeper 64.
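
Combined with the flags from your original report, that would look something like:

anvil --prune-history --transaction-block-keeper 64 --timestamp 1619820000 --order fifo --code-size-limit 4294967296 -m "test test test test test test test test test test test junk" --gas-limit 100000000000 --gas-price 0 --base-fee 0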

We would also appreciate it if private optimizations to Anvil were upstreamed rather than kept private :)

mshakeg commented 1 year ago

@gakonst thanks, that works; the memory is still increasing, but significantly more slowly. I want to run a few more tests using the evm_setNextBlockTimestamp and evm_mine methods and confirm that it also works fine there.

@09tangriro would appreciate it if you could also try this and share your findings, especially relating to the processing time for transactions.

09tangriro commented 1 year ago

Thanks @gakonst :)

Unfortunately, wrt speed there still seems to be a positive linear trend, although arguably a shallower one:

[plots: timing_contract_transact_exactInputSingle, timing_contract_transact_increaseLiquidity, timing_contract_transact_mint, timing_contract_transact_burn]

grandizzy commented 4 months ago

@mshakeg any chance you could retest this? It looks similar to https://github.com/foundry-rs/foundry/issues/7940. Thank you!

grandizzy commented 1 week ago

@09tangriro could you please give it another try with the latest anvil and share the results? Mind also commenting here: https://github.com/foundry-rs/foundry/issues/4399#issuecomment-2407283296. What happens is that if confirmations=0 and you use tx.wait() to wait for the receipt, the ethers race condition occurs (I saw this with #4399, where requests were slowed down significantly due to tx.wait()), so try avoiding that, or use a newer ethers version that includes the fix for https://github.com/ethers-io/ethers.js/issues/4229. Would appreciate your feedback as we're looking to tackle any perf issues for a 1.0 release. Thank you!
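
As a rough illustration of avoiding tx.wait(), a sketch assuming ethers v5 (the `waitForReceipt` name and the 50ms poll interval are arbitrary choices, not from anvil or ethers):

```typescript
import { ethers } from "ethers";

// Sketch of a tx.wait() workaround: poll the node for the receipt directly
// (ethers v5 assumed; function name and poll interval are illustrative).
async function waitForReceipt(
  provider: ethers.providers.JsonRpcProvider,
  txHash: string,
  pollMs = 50
): Promise<ethers.providers.TransactionReceipt> {
  for (;;) {
    // getTransactionReceipt resolves to null until the transaction is mined.
    const receipt = await provider.getTransactionReceipt(txHash);
    if (receipt) return receipt;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```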

mshakeg commented 1 week ago

Hey @grandizzy I've had a chance to test on this example repo

Execution performance has improved by about 2x since I initially created this issue and doesn't degrade over time; additionally, memory use and history pruning seem to be much better.

If @09tangriro agrees this issue can be closed.

09tangriro commented 1 week ago

Agreed, let's close it :)

grandizzy commented 1 week ago

Closing per above comments, thank you