foundry-rs / foundry

Foundry is a blazing fast, portable and modular toolkit for Ethereum application development written in Rust.
https://getfoundry.sh
Apache License 2.0
8.27k stars 1.74k forks

perf(`forge test`): memory leak in latest foundry #8014

Closed 0xTimepunk closed 1 month ago

0xTimepunk commented 5 months ago

Component

Forge

Have you ensured that all of these are up to date?

What version of Foundry are you on?

forge 0.2.0 (5494c33 2024-05-30T00:18:10.251370000Z)

What command(s) is the bug in?

forge test (make test)

Operating System

macOS (Apple Silicon)

Describe the bug

Using the following PR as an example: https://github.com/superform-xyz/superform-core/pull/541

To reproduce

Part 1

Exit code 137 in https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918

Please note that this is a powerful dedicated GitHub Actions runner, so the memory limits shouldn't actually be reached.

Part 2

A whopping difference of 36 GB.

mattsse commented 5 months ago

@grandizzy I wonder if this is fuzzing related

@0xTimepunk were you able to narrow this down to a specific test?

grandizzy commented 5 months ago

> @grandizzy I wonder if this is fuzzing related
>
> @0xTimepunk were you able to narrow this down to a specific test?

Invariant tests are excluded from the project's test runs, so they're not the culprit. Going to check the regular fuzz tests. The build that does not reproduce it is the one right before the Cancun update (the project is set to Paris), but I don't have any evidence that this causes it.

0xTimepunk commented 5 months ago

I'm not sure if the nightly I posted is the last well-behaved nightly, as I haven't tested. The memory usage increase seems independent of test type; however, we do have fuzzing in some scenario files which are part of make ftest or make coverage.

grandizzy commented 5 months ago

> I'm not sure if the nightly I posted is the last well operating nightly as I haven't tested. Memory usage increase seems independent of test type, however we do have fuzzing in some scenario files which are part of make ftest or make coverage

@0xTimepunk I was under the impression you're excluding invariant tests when you hit this (here's the CI failure I was pointing to: https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918), but looking at the make ftest target it is running all of them. Is there the same issue when running make test-ci? I am not yet able to try it locally, as the project requires several chain RPC URLs which I don't have handy.

The failure in the CI action above relates to coverage only, which has script = 'src' in FOUNDRY_PROFILE=coverage. Is this desired?

Comparing with a successful run (like https://github.com/superform-xyz/superform-core/actions/runs/9302286062/job/25603031010), in the failed one (https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918) there's no [FAIL. Reason: setup failed: Could not instantiate forked environment with fork url: ***] setUp() (gas: 0) entry, while in the one that completed OK there are several such entries. This can indicate that the successful runs are not really executing the tests, hence passing.

0xTimepunk commented 5 months ago

So, to give extra context: I thought using foundryup -v nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970 fixed things, but the memory leak is apparently still there.

I think this is independent of invariant tests, because it also happens with make test-ci and make coverage, both of which exclude invariant tests.

I think it may have to do with fuzzing. Please note that our fuzz tests involve running actions repeatedly between 2 or more forks with different input parameters. The scenarios/assertions are quite large, but this didn't happen before.


Regarding make ftest: that was just an example command, as this is also happening for make test-ci and make coverage.


> The failure in CI action above relates to coverage only which has script = 'src' in FOUNDRY_PROFILE=coverage, is this desired?

This is desired. It forces the script folder not to be built, as it's not needed for the coverage run.
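As a concrete illustration, a coverage profile along these lines might look like the following in foundry.toml. Only `script = 'src'` is taken from the discussion; the default profile values here are assumptions about the project layout:

```toml
# Hypothetical foundry.toml excerpt: a dedicated coverage profile
# that skips compiling script/ by pointing the script source dir at src/.
[profile.default]
src = "src"
test = "test"
script = "script"

[profile.coverage]
script = "src"   # script/ contents are not needed for the coverage run
```

It would be selected with FOUNDRY_PROFILE=coverage forge coverage.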

@grandizzy

0xTimepunk commented 5 months ago
image

I notice that roughly 1 GB is added for each fuzz test we run (when a test line appears as [PASS], I see about 1 GB added to the forge process in macOS's Activity Monitor), when memory usage should be stable. This is on make test-ci, which makes no invariant calls.
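To quantify this growth outside Activity Monitor, the forge process's resident set size can be polled from a shell. A minimal sketch (the sampling interval and the default pid are assumptions; `ps -o rss=` reports KiB on both macOS and Linux):

```shell
#!/bin/sh
# Print the resident memory (RSS, in KiB) of a process once per second.
# Usage: ./sample_rss.sh <pid>   (e.g. the pid of the running forge process)
sample_rss() {
  ps -o rss= -p "$1" | tr -d ' '
}

pid="${1:-$$}"   # default to this shell's own pid so the script runs standalone
for _ in 1 2 3; do
  echo "rss_kib=$(sample_rss "$pid")"
  sleep 1
done
```

Watching the printed values across [PASS] lines would confirm whether memory climbs monotonically per test.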

0xTimepunk commented 5 months ago

@mattsse @grandizzy (also please read conclusions at the bottom)

Scenario 1 - the worst (with all latest versions)

Settings

On forge-std @52715a2 (latest)

latest foundry (foundryup)

Note: this checks out to a commit with latest forge-std and removed import of ds-test

git checkout 0769ff1 && make test-ci

image

Results: Forge hits around 11 GB. More has been observed.

git checkout 0769ff1 && make coverage

image

Results: Forge hits upwards of 32 GB; more has been observed. The computer starts to freeze, and exit code 137 occurs sometimes, both locally and on CI.

Scenario 2 - very bad

Settings

On forge-std @52715a2 (latest)

nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970

Note: this checks out to a commit with latest forge-std and removed import of ds-test

git checkout 0769ff1 && forge install && make test-ci

image

Results: Forge hits around 8 GB of usage. More has been observed in other runs which were not snapshotted.

git checkout 0769ff1 && forge install && make coverage

image

Results: Forge hits around 21 GB of usage. More has been observed in other runs which were not snapshotted.

Scenario 3 - not possible

Settings

On forge-std @52715a2 (latest)

nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9

Note: this checks out to a commit with latest forge-std and removed import of ds-test

git checkout 0769ff1 && forge install && make test-ci

Crashes (likely a Foundry version too old for the most recent forge-std), so skipping analysis here.

image

Scenario 4 - not possible

Settings

On forge-std@2f11269 (6 months old, the one we had previously)

AND Using ds-test@e282159

latest foundry (foundryup)

Note: this checks out to a commit which has our previous forge-std version, the one from 6 months ago, together with ds-test, both still operational

git checkout e9826a51 && forge install && make test-ci

Crashes. Likely the new forge version did something that made it stop working with this forge-std (unsure why this should happen). Skipping this test.

image

Scenario 5 - somewhat healthy ram usage

On forge-std@2f11269 (6 months old, the one we had previously)

AND Using ds-test@e282159

nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970

Note: this checks out to a commit which has our previous forge-std version, the one from 6 months ago, together with ds-test, both still operational

git checkout e9826a51 && forge install && make test-ci

image

Results: 2 GB. Looks healthy.

git checkout e9826a51 && forge install && make test-coverage

image

Results: 15 GB. Worse than scenario 6 (3 times the memory usage); something definitely started to happen in forge between nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9 (check below) and nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970 (this one). Still better than with the new forge-std in scenario 2.

Scenario 6 - the healthiest RAM usage

Settings

On forge-std@2f11269 (6 months old, the one we had previously)

AND Using ds-test@e282159

nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9

Note: this checks out to a commit which has our previous forge-std version, the one from 6 months ago, together with ds-test, both still operational

git checkout e9826a51 && forge install && make test-ci

image

Results: Forge hits around 1.79 GB of usage

git checkout e9826a51 && forge install && make coverage

image

Results: Forge hits around 5 GB of usage


Conclusions

0xTimepunk commented 5 months ago

This issue is extra tricky to work around temporarily, because using the latest forge allowed us to overcome a (totally unrelated) problem in https://github.com/foundry-rs/foundry/issues/7852 (which was fixed this month), and by having to use an old version we'll probably need to maintain two forge versions to continue operating. (Or perhaps try the nightly associated with https://github.com/foundry-rs/foundry/commit/a87faf6a725ab4327f8617bc56a7f85671bc52d8, which has that fix and where the memory leak is not as bad, together with the old forge-std + ds-test.)

Edit: the nightly associated with https://github.com/foundry-rs/foundry/commit/a87faf6a725ab4327f8617bc56a7f85671bc52d8 has the memory leak, so we will operate with two Foundry versions as needed for now.

0xTimepunk commented 5 months ago

Update:

While this works on our main/develop branches (scenario 6: forge-std 1.7.3 and forge f625d0f), it breaks something in an upstream branch where tests on mainnet are pinned to a more recent block than on main/develop. In this block a USDC upgrade was already made, which only works with forge-std 1.8.0+ (see the release notes of https://github.com/foundry-rs/forge-std/releases/tag/v1.8.0 and https://github.com/foundry-rs/forge-std/pull/505).

Some testing

Testing forge-std 1.8.0 with f625d0f does not work due to the crash in scenario 3 above.

Then I upgraded foundry again to cafc2606a2187a42b236df4aa65f4e8cdfcea970.

Testing once again (1.8.0 + cafc260) to see if the memory leak is there by running

image

Results: 13 GB; the memory leak is back.

Conclusion: the memory leak definitely seems to have been introduced by the crossover of forge-std versions >=1.8.0 with forge.

For awareness @mattsse @grandizzy, and tagging @mds1.

There is urgency in solving this situation, as it is almost completely halting our development. We need advice on how to fix this in the short term: eliminate the memory leak while keeping the USDC fix introduced by 1.8.0 in place (is there a commit we can cherry-pick for this?).

grandizzy commented 5 months ago

@0xTimepunk running with tracing enabled, I can see memory grows when a test forks chains (there are many tests with many forked chains running at the same time, so eventually the OS kicks in and kills the task). What worked for me locally (and I think you can apply it for now as a workaround to get you going) is to split the scenario tests into several steps, as in

test-scenarios :; forge test --match-path "test/fuzz/scenarios/scenarios-deposit-multiDst*/*.sol" --no-match-path "test/invariant/**/*.sol" \
&& forge test --match-path "test/fuzz/scenarios/scenarios-deposit-singleDirect*/*.sol" --no-match-path "test/invariant/**/*.sol" ...

so memory never grows to the point where the process needs to be killed (this also helped me avoid getting rate-limited by Infura).

I'm looking more into this issue, but I think you can use this for now?
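The splitting idea above can be generalized into a small loop over path patterns, so each forge invocation runs in a fresh process and memory is released between groups. A sketch (the two patterns are taken from the Makefile target above; the loop itself and the `command -v` guard are assumptions, and the list would need to cover the rest of the project's scenario directories):

```shell
#!/bin/sh
# Run each scenario group in its own forge process so memory is
# released between groups instead of accumulating in one long run.
for pattern in \
  "test/fuzz/scenarios/scenarios-deposit-multiDst*/*.sol" \
  "test/fuzz/scenarios/scenarios-deposit-singleDirect*/*.sol"
do
  echo "running group: $pattern"
  # Only invoke forge when it is actually installed (guard for illustration):
  if command -v forge >/dev/null 2>&1; then
    forge test --match-path "$pattern" --no-match-path "test/invariant/**/*.sol"
  fi
done
```

A `set -e` at the top would make the script stop at the first failing group, matching the `&&` chaining of the original Makefile target.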

0xTimepunk commented 5 months ago

> @0xTimepunk running with trace enabled I can see mem grows when test forks chains (there are many tests with many chains forked running at the same time so eventually OS kicks in and kills the task). What worked for me locally (and I think you can apply it for now as an workaround too to get you going) is to split the scenarios test in several steps, as in
>
> test-scenarios :; forge test --match-path "test/fuzz/scenarios/scenarios-deposit-multiDst*/*.sol" --no-match-path "test/invariant/**/*.sol" \
> && forge test --match-path "test/fuzz/scenarios/scenarios-deposit-singleDirect*/*.sol" --no-match-path "test/invariant/**/*.sol" ...
>
> so mem doesn't ever grow up to the point where process needs to be killed (this also helped me not to get rate limited by infura)
>
> Looking more into this issue but I think you can use such for now?

Hey @grandizzy, it's unfortunately super cumbersome to run a large test suite like ours in that suggested way. We're thinking of using a pre-1.8.0 forge-std plus a temporary fix to avoid dealing with USDC.

Is there any further help we can give on this problem?

Thanks

grandizzy commented 5 months ago

Yeah, I'm actively looking into this; just provided some thoughts while debugging.

0xTimepunk commented 4 months ago

@mattsse @grandizzy requesting to re-open the issue, as something got merged into foundry around 2 days ago or more that made the issue appear once again.

CI exits once again with error 137 (out of memory).

Running make coverage on https://github.com/superform-xyz/superform-core/pull/516 yields 40 GB of RAM usage.

image

This is blocking our test/coverage checks, mainly on CI (because it auto-kills with 137), and impeding progress.

mattsse commented 4 months ago

@klkvr @grandizzy does this ring a bell?

0xTimepunk commented 4 months ago

forge 0.2.0 (fb86e5d 2024-06-05T00:17:33.761343000Z) was one of the first versions which contained the initial fix to the memory issue, and it was working. However, given that nightlies are removed after 3 days, we can no longer access this nightly in CI.

You can probably diff this version against the current one to see what changed.
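Since nightlies expire, a hedged sketch of how a CI job could pin a known-good build: install the named nightly while foundryup still serves it, or build forge from source at the pinned commit (the commit hash is the one mentioned above; the cargo package name `forge` and the availability of cargo/git on the runner are assumptions, so the install commands are left as comments):

```shell
#!/bin/sh
# Pin forge to a known-good build for CI reproducibility.
PINNED_COMMIT=fb86e5d
echo "pinning forge to $PINNED_COMMIT"
# Option 1: install the matching nightly while foundryup still serves it:
#   foundryup -v nightly-<full-commit-hash>
# Option 2 (survives nightly expiry): build from source at that commit,
# assuming the workspace package is named `forge`:
#   cargo install --git https://github.com/foundry-rs/foundry --rev "$PINNED_COMMIT" forge --locked
```

Building from source is slower but does not depend on the nightly still being hosted.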

grandizzy commented 4 months ago

> @klkvr @grandizzy does this ring a bell?

Nothing I can think of, but I will do some tests shortly and try to figure out if something changed to affect this. @0xTimepunk, to rule out the possibility of your PR 516 exposing a new issue (as it adds new code and tests), could you please check if you see the same coverage problem running the latest forge against your master (or v1) branch?

(Also, if I understand correctly, the issue now is with forge coverage; the failure initially described in this issue was with forge test, so I'm pretty sure it's not caused by the same thing.)

grandizzy commented 4 months ago

I am not able to locally spot any difference between the June 5 forge build mentioned as working OK (https://github.com/foundry-rs/foundry/commit/fb86e5d3bf41f9cef3ccc7fbd04e1a422f20c29e) and latest (https://github.com/foundry-rs/foundry/commit/fb86e5d3bf41f9cef3ccc7fbd04e1a422f20c29e) - when running make coverage on the https://github.com/superform-xyz/superform-core/commit/68320ffd7190db2abb9b32d5fdf4912bf04275be commit, both runs cap around 25 GB of resident memory and complete successfully.

Same when testing the v1 branch with both forge versions, so I am pretty sure there's no regression. @0xTimepunk maybe there's an issue with the RPC URLs used by your CI that could somehow slow down tests / make memory accumulate?

Inphi commented 4 months ago

I'm seeing a variant of this when running forge test --debug, though I'm happy to create a separate issue if this is something else. Here's a small reproducer:

Foundry Version

forge 0.2.0 (c4a984fbf 2024-06-28T00:53:16.491646468Z)

Reproducer

// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import {Test, console} from "forge-std/Test.sol";
import {Counter} from "../src/Counter.sol";

contract CounterTest is Test {
    Counter public counter;
    address addr = address(uint160(uint256(keccak256("ekans"))));

    function setUp() public {
        bytes memory buffer = new bytes(2000000);
        vm.etch(address(addr), buffer);

        counter = new Counter();
        counter.setNumber(0);
    }

    function test_Increment() public {
        counter.increment();
        assertEq(counter.number(), 1);
    }

    function testFuzz_SetNumber(uint256 x) public {
        counter.setNumber(x);
        assertEq(counter.number(), x);
    }
}

Running forge test --debug test_Increment --mc CounterTest consumes over 32 GB on my machine before the kernel kills it. I think the critical part is this line:

vm.etch(address(addr), buffer)

Etching really large runtime code seems to trigger the issue. I use an empty buffer for illustrative purposes, but the run still consumes a lot of memory even when valid runtime bytecode is used. Reducing the size of the runtime code seems to work, with minimal memory usage. I'm sure there's some buffer size threshold where memory usage spikes, but I haven't narrowed it down.
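Narrowing down such a threshold is a classic binary search over the buffer size. A sketch that only prints the probe points (at each midpoint one would edit the `new bytes(...)` size and run the reproducer, marking the run ok/oom; the 250,000-byte resolution and the illustrative 1,000,000-byte "pass" cutoff are assumptions):

```shell
#!/bin/sh
# Binary-search the etched-buffer size at which memory usage spikes.
# At each probe, rerun `forge test --debug test_Increment --mc CounterTest`
# with the test's buffer set to $mid bytes; here the ok/oom outcome is
# simulated by an assumed cutoff so the script is runnable standalone.
lo=0
hi=2000000
while [ $((hi - lo)) -gt 250000 ]; do
  mid=$(( (lo + hi) / 2 ))
  echo "probe buffer size: $mid"
  # assumed: probes below 1,000,000 bytes "pass", larger ones blow up
  if [ "$mid" -lt 1000000 ]; then lo=$mid; else hi=$mid; fi
done
echo "threshold between $lo and $hi"
```

Each iteration halves the search interval, so a handful of forge runs would bracket the threshold.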

grandizzy commented 4 months ago

@Inphi yes, please open a new issue for etch, going to use this one to optimize coverage report creation. Thank you.

Inphi commented 4 months ago

> @Inphi yes, please open a new issue for etch, going to use this one to optimize coverage report creation. Thank you.

Created an issue here.

This seems related to this issue, based on this comment. I imagine the Debugger inspector is also used to track coverage, which would explain the memory blowup.

grandizzy commented 2 months ago

@0xTimepunk can we now close this issue? If I remember correctly, it's no longer the same as the original report but a forge coverage high memory usage problem. Please open a new issue for coverage if the problem still exists. Thank you!

grandizzy commented 1 month ago

@0xTimepunk going to close this for now per the above comment; please open a new one if there are still issues. Thank you!
