Closed: 0xTimepunk closed this issue 1 month ago
@grandizzy I wonder if this is fuzzing related
@0xTimepunk were you able to narrow this down to a specific test?
Invariants are excluded from project tests, so they're not the culprit. Going to check the regular fuzz tests; the build that does not reproduce it is the one right before the Cancun update (the project is set to Paris), but I don't have any evidence this causes it.
I'm not sure if the nightly I posted is the last well-operating nightly, as I haven't tested. The memory usage increase seems independent of test type; however, we do have fuzzing in some scenario files which are part of `make ftest` or `make coverage`.
@0xTimepunk I was under the impression you're excluding invariant tests when you hit this (here's the CI failure I was pointing to: https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918), but looking at the `make ftest` target it runs all of them. Is there the same issue when running `make test-ci`? I am not yet able to try it locally, as the project requires several chain RPC URLs which I don't have handy.
The failure in the CI action above relates to coverage only, which has `script = 'src'` in `FOUNDRY_PROFILE=coverage`. Is this desired?
Comparing with a successful run (like https://github.com/superform-xyz/superform-core/actions/runs/9302286062/job/25603031010), the failed one (https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918) has no `[FAIL. Reason: setup failed: Could not instantiate forked environment with fork url: ***] setUp() (gas: 0)` entries, while the run that completed OK has several. This can indicate that the successful runs are not really executing the tests, hence passing.
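To spot that signal quickly in a saved job log, a grep for the fork-failure marker is enough; zero matches in a "passing" run is the suspicious case described above. A minimal sketch (the helper name and log file name are my own):

```shell
#!/bin/sh
# count_fork_failures FILE: count how many tests failed to instantiate a
# forked environment in a downloaded CI log. A "passing" run with zero
# matches may never have executed the forked tests at all.
count_fork_failures() {
  # grep -c still prints 0 (but exits non-zero) when there are no matches
  grep -c 'Could not instantiate forked environment' "$1" || true
}
```

Usage: download the job log, then `count_fork_failures job.log`.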
So, to give extra context: I thought using `foundryup -v nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970` fixed things, but the memory leak is apparently still there.
I think this is independent of invariant tests, because it also happens with `make test-ci` and `make coverage`, both of which exclude invariant tests.
I think it may have to do with fuzzing. Please note that our fuzz tests involve running actions repeatedly between 2 or more forks with different input parameters. The scenarios/assertions are quite large, but this didn't happen before.
On `make ftest`: that was just an example command, as this is also happening for `make test-ci` and `make coverage`.
> The failure in the CI action above relates to coverage only, which has `script = 'src'` in `FOUNDRY_PROFILE=coverage`. Is this desired?
This is desired. It forces the script folder not to be built, as it's not needed for the coverage run.
@grandizzy
I notice ~1GB is added on each fuzz test we run (i.e., when a line appears as [PASS], I see roughly 1GB added to the forge process in macOS's Activity Monitor), when it should be stable. This is on `make test-ci`, which has no invariant calls.
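For anyone reproducing this, the Activity Monitor observation can be captured from a second terminal with a small polling loop. A sketch (the function name and defaults are my own; `ps -o rss=` reports KB on both macOS and Linux):

```shell
#!/bin/sh
# watch_rss NAME [INTERVAL] [COUNT]: print the resident memory of the newest
# process matching NAME every INTERVAL seconds, COUNT times (0 = forever).
watch_rss() {
  name="$1"; interval="${2:-5}"; count="${3:-0}"
  i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    pid=$(pgrep -n "$name" || true)
    if [ -n "$pid" ]; then
      rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
      rss_kb=${rss_kb:-0}   # process may have exited between pgrep and ps
      printf '%s pid=%s rss=%sMB\n' "$(date +%T)" "$pid" "$((rss_kb / 1024))"
    else
      printf 'no process matching %s\n' "$name"
    fi
    sleep "$interval"
    i=$((i + 1))
  done
}
# usage while a run is active: watch_rss forge 5
```

Redirecting this to a file during `make test-ci` gives a timestamped growth curve to attach to the issue.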
@mattsse @grandizzy (also please read conclusions at the bottom)
Scenario 1: forge-std @52715a2 (latest), latest foundry (foundryup)
Note: this checks out a commit with the latest forge-std and the ds-test import removed.
`git checkout 0769ff1 && make test-ci`
Results: forge hits around 11GB. More has been observed.
`git checkout 0769ff1 && make coverage`
Results: forge hits around 32GB+. More has been observed; the computer starts to freeze, sometimes with exit code 137, both locally and on CI.
Scenario 2: forge-std @52715a2 (latest), nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970
Note: this checks out a commit with the latest forge-std and the ds-test import removed.
`git checkout 0769ff1 && forge install && make test-ci`
Results: forge hits around 8GB of usage. More has been observed in other runs which were not snapshotted.
`git checkout 0769ff1 && forge install && make coverage`
Results: forge hits around 21GB of usage. More has been observed in other runs which were not snapshotted.
Scenario 3: forge-std @52715a2 (latest), nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9
Note: this checks out a commit with the latest forge-std and the ds-test import removed.
`git checkout 0769ff1 && forge install && make test-ci`
Crashes (likely a foundry version too old for the most recent forge-std), so skipping analysis here.
Scenario 4: forge-std @2f11269 (6 months old, the one we had previously) with ds-test @e282159, latest foundry (foundryup)
Note: this checks out a commit with our previous forge-std version from 6 months ago, together with ds-test, both still operational.
`git checkout e9826a51 && forge install && make test-ci`
Crashes. Likely the new forge version changed something that made it stop working with this forge-std (unsure why this should happen). Skipping this test.
Scenario 5: forge-std @2f11269 (6 months old, the one we had previously) with ds-test @e282159, nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970
Note: this checks out a commit with our previous forge-std version from 6 months ago, together with ds-test, both still operational.
`git checkout e9826a51 && forge install && make test-ci`
Results: 2GB. Looks healthy.
`git checkout e9826a51 && forge install && make test-coverage`
Results: 15GB. Worse than scenario 6 (three times the memory usage); something definitely changed in forge between nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9 (scenario 6, below) and nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970 (this one). Still better than with the new forge-std in scenario 2.
Scenario 6: forge-std @2f11269 (6 months old, the one we had previously) with ds-test @e282159, nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9
Note: this checks out a commit with our previous forge-std version from 6 months ago, together with ds-test, both still operational.
`git checkout e9826a51 && forge install && make test-ci`
Results: forge hits around 1.79GB of usage.
`git checkout e9826a51 && forge install && make coverage`
Results: forge hits around 5GB of usage.
This issue is extra tricky to work around because using the latest forge let us overcome a totally unrelated problem, https://github.com/foundry-rs/foundry/issues/7852 (fixed this month); by having to use an old version, we'll probably need to maintain two forge versions to continue operating. (Or perhaps use the nightly associated with https://github.com/foundry-rs/foundry/commit/a87faf6a725ab4327f8617bc56a7f85671bc52d8, which has that fix and where the memory leak is not as bad, together with the old forge-std + ds-test.)
Edit: the nightly associated with https://github.com/foundry-rs/foundry/commit/a87faf6a725ab4327f8617bc56a7f85671bc52d8 has the memory leak, so we will operate with two foundry versions as needed for now.
Update:
While this works on our main/develop branches (scenario 6: forge-std 1.7.3 and forge f625d0f), it breaks something in an upstream branch where mainnet tests are pinned to a more recent block than on main/develop. At that block a USDC upgrade had already gone live, which only works with forge-std 1.8.0+ (see the release notes at https://github.com/foundry-rs/forge-std/releases/tag/v1.8.0 and https://github.com/foundry-rs/forge-std/pull/505).
Testing forge-std 1.8.0 with f625d0f does not work due to the crash in scenario 3 above.
Then I upgraded foundry again to cafc2606a2187a42b236df4aa65f4e8cdfcea970 and tested once more (forge-std 1.8.0 + cafc260) to see if the memory leak is there.
Results: 13GB, the memory leak is back.
Conclusion: the memory leak seems to have been introduced by the crossover of forge-std >=1.8.0 with forge.
For awareness @mattsse @grandizzy, also tagging @mds1.
There is urgency in solving this situation, as it is almost completely halting our development. We need advice on a short-term fix that puts down the memory leak while keeping the USDC fix brought in by forge-std 1.8.0 (is there a commit we can cherry-pick for this?).
@0xTimepunk running with tracing enabled I can see memory grow when a test forks chains (there are many tests forking many chains running at the same time, so eventually the OS kicks in and kills the task). What worked for me locally (and I think you can apply it as a workaround for now to get you going) is to split the scenarios test into several steps, as in:

```make
test-scenarios :; forge test --match-path "test/fuzz/scenarios/scenarios-deposit-multiDst*/*.sol" --no-match-path "test/invariant/**/*.sol" \
	&& forge test --match-path "test/fuzz/scenarios/scenarios-deposit-singleDirect*/*.sol" --no-match-path "test/invariant/**/*.sol" ...
```

so memory never grows to the point where the process needs to be killed (this also helped me avoid getting rate limited by Infura).
I'm looking more into this issue, but I think you can use this for now?
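The same split can be written as a small loop, which makes it easy to add more scenario globs; a sketch using the globs from the Makefile target above (the function name is mine):

```shell
#!/bin/sh
# run_scenario_groups RUNNER: invoke RUNNER ("forge" in real use) once per
# scenario glob, so memory is released between groups instead of
# accumulating in one long-lived process.
run_scenario_groups() {
  runner="$1"
  for pattern in \
    'test/fuzz/scenarios/scenarios-deposit-multiDst*/*.sol' \
    'test/fuzz/scenarios/scenarios-deposit-singleDirect*/*.sol'
  do
    "$runner" test --match-path "$pattern" \
      --no-match-path 'test/invariant/**/*.sol' || return 1
  done
}
# usage: run_scenario_groups forge
```

Each `forge` invocation exits before the next starts, so its fork caches are returned to the OS between groups.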
Hey @grandizzy, unfortunately it's super cumbersome to run a large test suite like ours in the suggested way. Thinking of using pre-1.8.0 forge-std plus a temporary fix to avoid dealing with USDC.
Is there any further help we can give on this problem?
Thanks
yeah, I'm actively looking into this, just provided some thoughts while debugging
@mattsse @grandizzy requesting to re-open the issue, as something merged into foundry around 2 days ago (or more) made the issue appear once again.
CI exits once again with error 137 (out of memory).
Running `make coverage` on https://github.com/superform-xyz/superform-core/pull/516 yields 40GB of RAM usage.
This is blocking our test/coverage checks mainly on CI (because it auto-kills with 137) and impeding progress.
@klkvr @grandizzy does this ring a bell?
forge 0.2.0 (fb86e5d 2024-06-05T00:17:33.761343000Z) was one of the first versions containing the initial fix for the memory issue, and it was working. However, given that nightlies are removed after 3 days, we can no longer access this nightly in CI.
You can probably diff this version against the current one to see what changed.
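One way around expiring nightlies is to rebuild forge from the pinned commit instead of downloading a binary. A sketch that only prints the command so it can be inspected first (assumes a Rust toolchain on the machine; the `--rev` value is the known-good commit mentioned above):

```shell
#!/bin/sh
# Print a command that rebuilds forge at an exact commit after its nightly
# binary has been purged. Run the printed command to perform the install
# (requires a Rust toolchain; --locked uses the repo's pinned dependencies).
REV=fb86e5d3bf41f9cef3ccc7fbd04e1a422f20c29e
CMD="cargo install --git https://github.com/foundry-rs/foundry --rev $REV --profile release --locked forge"
echo "$CMD"
```

This trades install time for reproducibility: the build is pinned to the commit rather than to a short-lived release artifact.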
> @klkvr @grandizzy does this ring a bell?
Nothing I can think of, but I'll run some tests shortly and try to figure out if something changed that could affect this. @0xTimepunk, to rule out the possibility that your PR 516 exposes a new issue (as it adds new code and tests), could you please check whether you see the same coverage problem running the latest forge against your master (or v1) branch?
(Also, if I understand correctly, the issue now is with forge coverage, while the failure initially described in this issue was with forge test, so I'm pretty sure they don't share a cause.)
I am not able to locally spot any difference between the June 5 forge build mentioned as working OK (https://github.com/foundry-rs/foundry/commit/fb86e5d3bf41f9cef3ccc7fbd04e1a422f20c29e) and the latest one: when running `make coverage` on commit https://github.com/superform-xyz/superform-core/commit/68320ffd7190db2abb9b32d5fdf4912bf04275be, both runs cap around 25GB of resident memory and complete successfully.
Same when testing the v1 branch with both forge versions, so I am pretty sure there's no regression. @0xTimepunk maybe there's an issue with the RPC URLs used by your CI that could somehow slow down tests / make memory accumulate?
I'm seeing a variant of this when running `forge test --debug`, though I'm happy to create a separate issue if this is something else. Here's a small reproducer:
forge 0.2.0 (c4a984fbf 2024-06-28T00:53:16.491646468Z)
```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import {Test, console} from "forge-std/Test.sol";
import {Counter} from "../src/Counter.sol";

contract CounterTest is Test {
    Counter public counter;
    address addr = address(uint160(uint256(keccak256("ekans"))));

    function setUp() public {
        bytes memory buffer = new bytes(2000000);
        vm.etch(address(addr), buffer);
        counter = new Counter();
        counter.setNumber(0);
    }

    function test_Increment() public {
        counter.increment();
        assertEq(counter.number(), 1);
    }

    function testFuzz_SetNumber(uint256 x) public {
        counter.setNumber(x);
        assertEq(counter.number(), x);
    }
}
```
Running `forge test --debug test_Increment --mc CounterTest` consumes over 32 GB on my machine before the kernel kills it.
I think the critical part is this line: `vm.etch(address(addr), buffer)`. Etching really large runtime code seems to trigger the issue. I use an empty buffer for illustrative purposes, but the run still consumes a lot of memory even when valid runtime bytecode is used. Reducing the size of the runtime code seems to work, with minimal memory usage. I'm sure there's some buffer-size threshold where memory usage spikes, but I haven't narrowed it down.
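When iterating on a reproducer like this, it can help to cap the address space so a leaking run is killed quickly instead of freezing the whole machine. A sketch (the 8 GB value is arbitrary; `ulimit -v` is enforced reliably on Linux, less so on macOS):

```shell
#!/bin/sh
# Run a memory-hungry command under an address-space cap; allocations beyond
# the cap fail early, so the leaking process dies instead of swapping the
# machine to a standstill. The subshell keeps the limit local.
(
  ulimit -v $((8 * 1024 * 1024))   # cap at 8 GB (ulimit -v takes KB)
  ulimit -v                        # print the active cap
  # forge test --debug test_Increment --mc CounterTest
)
```

The forge line is commented out here as a placeholder for whatever command is being investigated.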
@Inphi yes, please open a new issue for etch, going to use this one to optimize coverage report creation. Thank you.
@0xTimepunk can we now close this issue? If I remember correctly, it's no longer the same as the original report but rather high memory usage in forge coverage. Please open a new issue for coverage if the problem still exists. Thank you!
@0xTimepunk going to close this for now per the above comment; please open a new one if there are still issues. Thank you!
Component
Forge
Have you ensured that all of these are up to date?
What version of Foundry are you on?
forge 0.2.0 (5494c33 2024-05-30T00:18:10.251370000Z)
What command(s) is the bug in?
forge test (make test)
Operating System
macOS (Apple Silicon)
Describe the bug
With the following PR as an example: https://github.com/superform-xyz/superform-core/pull/541
To reproduce
Part 1
`foundryup` (latest foundry)
`make ftest`
Exit code 137 in https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918
Please note that this is a powerful dedicated GitHub Actions runner, so the memory limits shouldn't actually be reached.
Part 2
`foundryup -v nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970`
`make ftest`
A whopping difference of 36GB.