Open dtbuchholz opened 2 weeks ago
for reference, this is the commit that broke things, which changes the formatter: https://github.com/foundry-rs/foundry/commit/b1e93654348a0f31effa34790adae18865b14aa8
thanks @dtbuchholz, will check! Were you able to test that it works without that commit? It shouldn't change the current formatters but should only add a new variation. Can you point out the difference in src formatting you see?
Would it be possible to provide a way to reproduce including anvil startup command and the script used / command? Thanks
@grandizzy the code is currently in a private repo, but I tested it with a boilerplate erc20 contract. the command I'm running is just the typical forge script with params pointing to a dockerized anvil node. if I deploy to filecoin calibration, it looks something like this:
forge script ... --rpc-url https://rpc.ankr.com/filecoin_testnet --broadcast --timeout 360 -g 100000
(note: filecoin has odd gas multiplier requirements and 30 sec block times). this will hang forever.
but, i did just notice some interesting behavior. if i try to deploy the contract to an anvil node running on my machine (e.g., anvil --port 8888), the script works and contracts deploy w/o hanging. if i point to my dockerized anvil node, it runs into this indefinite pending status. the dockerized anvil node is using the ghcr.io/foundry-rs/foundry:latest image, so perhaps that's part of the problem. however, given the indefinite hanging still happens while deploying to non-anvil / filecoin calibration, I'm not sure what's happening.
e.g., if I use the latest nightly build on my machine and use forge script to interact with anvil from ghcr.io/foundry-rs/foundry:latest, I can see this error in our test infra's logging:
Error: Failure on receiving a receipt for 0x036a652382b17df27aebafc1f0579d112e5365be323c57cea1d85c04b40ab4f9:
deserialization error: duplicate field `status` at line 1 column 1577
Were you able to test that it works without that commit? It shouldn't change the current formatters but should only add a new variation. Can you point out the difference in src formatting you see?
i just tried dropping the b1e936543 commit, and it didn't seem to do the trick, actually. it's odd because things work if I check out 2044faec6. the problem must be from one of these, which are in between 2044faec6 and b1e936543:
4d7435e64 feat(`anvil`): support mining with same block.timestamp (#9160)
9252e98bd chore: format chained error for EvmError (#9169)
3b2e57a29 Add debug file dump (#7375)
7b118faef chore(deps): bumps alloy, revm, fork-db (#9150)
cd71da404 feat: add `foundry_common::shell` to unify log behavior (#9109)
2cdf718ef chore: refactor debugger dump code (#9170)
4c84dc7d9 fix(anvil): Apply state overrides in debug_traceCall (#9172)
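A range like this can be narrowed by bisection instead of testing every commit. The sketch below only illustrates that search in Python and is not a replacement for `git bisect`. It assumes the commits are passed oldest first, that the last one is known to fail, and that every commit after the culprit also fails. `is_bad` is a hypothetical callback that would build forge at a commit and report whether the deployment hangs.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest first, the last one is bad, and
    once a commit is bad every later commit is bad too.
    """
    if not commits:
        raise ValueError("need at least one commit")
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid + 1  # the culprit is after mid
    return commits[lo]
```

Under those assumptions, a list of seven candidates needs at most three build-and-deploy checks.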
thank you, will try to reproduce including anvil dockerized env. How many txes do you fire from the script?
@grandizzy cool, sg, thanks! and it doesnt seem to matter how many txs. I've tried with up to 4, but in my MVP example while deploying to the dockerized anvil, it hangs here with just 1:
⠠ Sequence #1 on anvil-hardhat | Waiting for pending transactions
⡀ [Pending] 0xcfeed6eacad87be5fda6108b3c6f4945645072e2d61db287072923da29c559ce
⠤ [00:02:56] [#################################################################################] 1/1 txes (0.0s)
⠤ [00:02:56] [-----------------------------------------------------------------------------] 0/1 receipts (0.0s)
and just for posterity, it sounds like these are the steps to reproduce:
1. run foundryup
2. use ghcr.io/foundry-rs/foundry:latest and spin up a container running anvil
3. run forge script from your machine
and this behavior seems to showcase the same thing that happens for filecoin calibration
@dtbuchholz just to make sure, you see the dockerized issue on both apple/Linux, is this right? Asking because we don't have builds for apple which results in poor performance, tracked in https://github.com/foundry-rs/foundry/issues/8039
@grandizzy the "indefinite pending" issue exists on both linux/apple. to clarify:
1. forge script deploying to the "latest" dockerized anvil setup on an apple machine. i haven't done this on linux.
2. forge script pointing to filecoin calibration. i.e., our remote infra uses linux machines to deploy contracts to filecoin calibration, not anvil.
put differently, myself and 3 others from my team are seeing the same issue on our apple machines, plus, our remote linux infra. these only resolve if we roll back to 2044faec6.
fwiw, i'm force-pulling the image on my macos via docker pull --platform linux/x86_64 ..., and i haven't run into issues (even if it's not recommended).
Thank you @dtbuchholz , I was able to reproduce the failure with dockerized version, the problem seems to be that the image wasn't built/published for a while
REPOSITORY TAG IMAGE ID CREATED SIZE
ghcr.io/foundry-rs/foundry latest e5c8015b4c70 2 weeks ago 212MB
so when you use a version of forge with updated alloy/revm (updated in nightly-b1e93654348a0f31effa34790adae18865b14aa8) against it, it fails with deserialization error: duplicate field status
If you use the nightly tag instead of latest this should work OK:
REPOSITORY TAG IMAGE ID CREATED SIZE
ghcr.io/foundry-rs/foundry nightly a5685c29d08c 8 hours ago 212MB
posting the docker-compose.yml file I am using for the record, started with docker-compose up anvil
services:
  anvil:
    image: ghcr.io/foundry-rs/foundry:nightly
    container_name: anvil
    environment:
      ANVIL_IP_ADDR: "0.0.0.0"
    working_dir: /anvil
    ports:
      - "8545:8545"
    command: anvil
@grandizzy thanks, i'll use that newer image in our testing infra setup!
however, that doesn't fully solve the problem. the reason we're reproducing the issue with the dockerized anvil is because the indefinite pending bug happens in this environment as well as on a live testnet. so, my intuition is that if we solve the problem while deploying to ghcr.io/foundry-rs/foundry:latest, then it'll also solve the problem while deploying to a live filecoin calibration testnet (but I could be wrong). i.e., switching to the foundry:nightly image is only half the solution; it resolves our local testing use case, but it does not solve for the live deployment use case.
below is a demo of my experience. you'll see I'm initially using the latest nightly build with forge, which fails when deploying to the testnet. i then switch to 2044faec, and the issue goes away. i think it'd be useful to recreate this, specifically, while deploying to filecoin calibration and not the dockerized anvil:
forge script ... --rpc-url https://rpc.ankr.com/filecoin_testnet --broadcast --timeout 360 -g 100000
https://github.com/user-attachments/assets/0693bcfc-d1c5-44bc-8a14-7b53e3a379c0
thus, it means one of these commits is the likely culprit:
4d7435e64 feat(`anvil`): support mining with same block.timestamp (#9160)
9252e98bd chore: format chained error for EvmError (#9169)
3b2e57a29 Add debug file dump (#7375)
7b118faef chore(deps): bumps alloy, revm, fork-db (#9150)
cd71da404 feat: add `foundry_common::shell` to unify log behavior (#9109)
2cdf718ef chore: refactor debugger dump code (#9170)
4c84dc7d9 fix(anvil): Apply state overrides in debug_traceCall (#9172)
one thing to consider: filecoin lags wrt EVM compatibility. so, I'm guessing there's something "new" in alloy/revm that makes things incompatible with filecoin—and that incompatibility is the ~same as ghcr.io/foundry-rs/foundry:latest
@dtbuchholz yep, indeed, next step is to try to reproduce / debug on filecoin, just taking them one by one :) thanks for the detailed steps!
ah gotcha, lol. okay awesome, appreciate the help!
Likely unrelated but we did just land a fix in master for a deadlock introduced in cd71da404 (common shell) that occurred when nesting sh_println!
I just tried building from the latest on master and deploying contracts to filecoin...no luck, unfortunately.
It seems that the root issue might be a deserialization error on the alloy side:
cast receipt 0x2c2deb6447610a516c0faab20ae3792b6c7386a2842f90af1fec810626f92a4a --rpc-url https://rpc.ankr.com/filecoin_testnet
Error: deserialization error: duplicate field `status` at line 1 column 1024
The RPC responds with both fields, "root":"0x0000000000000000000000000000000000000000000000000000000000000000","status":"0x0", which is currently not being handled correctly here:
https://github.com/alloy-rs/alloy/blob/18da1321d3dad76358c96ce1cc0cd7f0914f30b2/crates/consensus/src/receipt/receipts.rs#L19
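To make the failure mode concrete, here is a minimal Python sketch of a strict parser that rejects a receipt carrying both keys. It is an illustration only, not alloy's implementation: it assumes `root` and `status` both feed a single logical field (named `status_or_root` here, a made-up name), which is one reading of the error message above. `parse_receipt_strict` and `DuplicateField` are hypothetical names too.

```python
import json

# Assumption for illustration: both JSON keys fill the same logical slot.
SLOT_ALIASES = {"root": "status_or_root", "status": "status_or_root"}


class DuplicateField(ValueError):
    """Raised when two keys map to a slot that is already filled."""


def parse_receipt_strict(raw: str) -> dict:
    """Parse a receipt object, rejecting a second value for the same slot."""
    parsed = {}
    for key, value in json.loads(raw).items():
        slot = SLOT_ALIASES.get(key, key)
        if slot in parsed:
            # Same shape as the message reported in this thread.
            raise DuplicateField(f"duplicate field `{key}`")
        parsed[slot] = value
    return parsed
```

With this sketch, a response containing only `status` parses, while one containing `root` followed by `status` raises `DuplicateField`. A more tolerant parser would have to choose which of the two values wins.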
confirming that https://github.com/alloy-rs/alloy/pull/1608 fixes it, can keep this open until we migrate to alloy version including the fix
@klkvr I'd also like to use this repro to address the fact that the script just hangs instead of outputting an error, as that could be beneficial for other scenarios. do you think there's something we could improve?
looks like we're ending up in an infinite loop here https://github.com/foundry-rs/foundry/blob/97be9b9a2e128633b17589cd58bfde4b4d544e23/crates/script/src/receipts.rs#L42-L57
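One general way to avoid this kind of hang is to bound the retries and surface the last error instead of polling forever. The Python sketch below shows that pattern under stated assumptions; it is not the code in crates/script/src/receipts.rs. `fetch` is a hypothetical callable that returns a receipt dict, returns None while the tx is pending, or raises `ReceiptDecodeError` (also a made-up name) when the response cannot be decoded.

```python
import time
from typing import Callable, Optional


class ReceiptDecodeError(Exception):
    """Hypothetical stand-in for a failure such as the deserialization error."""


def wait_for_receipt(
    fetch: Callable[[], Optional[dict]],
    max_attempts: int = 5,
    delay_s: float = 0.0,
) -> dict:
    """Poll fetch() a bounded number of times, then fail loudly."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            receipt = fetch()
        except ReceiptDecodeError as err:
            last_error = err  # remember it, but do not retry forever
        else:
            if receipt is not None:
                return receipt  # the receipt is available
        if attempt + 1 < max_attempts:
            time.sleep(delay_s)
    if last_error is not None:
        raise last_error  # report the decode failure to the caller
    raise TimeoutError(f"no receipt after {max_attempts} attempts")
```

A decode failure is usually deterministic, so retrying it indefinitely cannot succeed. Raising it after a few attempts would turn the hang into a visible error.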
confirming that alloy-rs/alloy#1608 fixes it, can keep this open until we migrate to alloy version including the fix
I'm facing the same issue. Will this fix it, and when should the migration be done?
Component
Forge
Have you ensured that all of these are up to date?
What version of Foundry are you on?
0.2.0 (9d74675 2024-10-31T00:21:25.844296000Z)
What command(s) is the bug in?
forge script
Operating System
macOS (Apple Silicon)
Describe the bug
when trying to deploy contracts with forge script, they become stuck indefinitely, causing the script to never complete (e.g., many hours without resolving). for example, when deploying to an anvil node, the logs will sit here forever:
similarly, when deploying to filecoin calibration, the same thing happens, i.e., it's a bug irrespective of the target chain:
you can see that all of the txs in the snippet above have settled (e.g., here), but the script never exited.
this bug appeared starting in nightly-b1e93654348a0f31effa34790adae18865b14aa8 and still exists in the latest nightly-9d74675bae8bfbd83428ff1343cbe2ae206c3383. if I downgrade to nightly-2044faec64f99a21f0e5f0094458a973612d0712, the bug no longer happens, and everything works as expected.
this was tested on both apple silicon and remote linux machines; nightly-2044faec64f99a21f0e5f0094458a973612d0712 works, but anything thereafter fails.