`forge script` stuck in pending status indefinitely

dtbuchholz commented 2 weeks ago

Component

Forge

Have you ensured that all of these are up to date?

[X] Foundry
[X] Foundryup

What version of Foundry are you on?

0.2.0 (9d74675 2024-10-31T00:21:25.844296000Z)

What command(s) is the bug in?

forge script

Operating System

macOS (Apple Silicon)

Describe the bug

when trying to deploy contracts with forge script, the transactions stay in pending status indefinitely, so the script never completes (e.g., it runs for many hours without resolving). for example, when deploying to an anvil node, the logs will sit here forever:

⠈ Sequence #1 on 4362550583360910 | Waiting for pending transactions
    ⠠ [Pending] 0xdf7627fa2d61515e7cf89eb175ee925da67c85826663f9fed4f131e8d61c6eaa
    ⢀ [Pending] 0xa1dd30eea38df6bdb9790c4e9fa58a93eeeb5882ee9a0c16894ba8f7cb162895
    ⢀ [Pending] 0x4ea3c8c6b7099fb9c3cb70a5b3947d0f742d6dace434d06275189fe4ba99232a
    ⠠ [Pending] 0xb0a6c74a5f666ed040886381704734851ef3a1a2693abb9004cf9587267c58d1

similarly, when deploying to filecoin calibration, the same thing happens—i.e., it's a bug irrespective of the target chain:

⠂ Sequence #1 on filecoin-calibration-testnet | Waiting for pending transactions
    ⠂ [Pending] 0xa3adcc8b1aa574ace5dca128673541010b5609172a7be8e11f082e0903ea2cc4
    ⠐ [Pending] 0xaf1db23686e1326428726420696df9cda381d2e2b3539b4b959ce20786f267a1

you can see that all of the txs in the snippet above have settled (e.g., here), but the script never exited.

this bug first appeared in nightly-b1e93654348a0f31effa34790adae18865b14aa8 and still exists in the latest nightly-9d74675bae8bfbd83428ff1343cbe2ae206c3383. if I downgrade to nightly-2044faec64f99a21f0e5f0094458a973612d0712, the bug no longer happens, and everything works as expected.

this was tested on both apple silicon and remote linux machines; nightly-2044faec64f99a21f0e5f0094458a973612d0712 works, but anything thereafter fails.

dtbuchholz commented 2 weeks ago

for reference, this is the commit that broke things, which changes the formatter: https://github.com/foundry-rs/foundry/commit/b1e93654348a0f31effa34790adae18865b14aa8

grandizzy commented 2 weeks ago

thanks @dtbuchholz, will check! Were you able to test that it works without that commit? It shouldn't change the current formatters, only add a new variation. Can you point out the difference in src formatting you see?

grandizzy commented 2 weeks ago

Would it be possible to provide a way to reproduce including anvil startup command and the script used / command? Thanks

dtbuchholz commented 2 weeks ago

> Would it be possible to provide a way to reproduce including anvil startup command and the script used / command? Thanks

@grandizzy the code is currently in a private repo, but I tested it with a boilerplate erc20 contract. the command I'm running is just the typical forge script with params pointing to a dockerized anvil node. if I deploy to filecoin calibration, it looks something like this: forge script ... --rpc-url https://rpc.ankr.com/filecoin_testnet --broadcast --timeout 360 -g 100000 (note: filecoin has odd gas multiplier requirements and 30 sec block times). this will hang forever.

but, i did just notice some interesting behavior. if i try to deploy the contract to an anvil node running on my machine (e.g., anvil --port 8888), the script works and contracts deploy w/o hanging. if i point to my dockerized anvil node, it runs into this indefinite pending status. the dockerized anvil node is using the ghcr.io/foundry-rs/foundry:latest, so perhaps that's part of the problem. however, given the indefinite hanging still happens while deploying to non-anvil / filecoin calibration, I'm not sure what's happening.

e.g., if I use the latest nightly build on my machine and use forge script to interact with anvil from ghcr.io/foundry-rs/foundry:latest, I can see this error in our test infra's logging:

Error: Failure on receiving a receipt for 0x036a652382b17df27aebafc1f0579d112e5365be323c57cea1d85c04b40ab4f9:
deserialization error: duplicate field `status` at line 1 column 1577

dtbuchholz commented 2 weeks ago

> Were you able to test that it works without that commit? It shouldn't change the current formatters, only add a new variation. Can you point out the difference in src formatting you see?

i just tried dropping the b1e936543 commit, and it didn't seem to do the trick, actually. it's odd because things work if I checkout 2044faec6. the problem must be from one of these, which are in between 2044faec6 and b1e936543:

4d7435e64 feat(`anvil`): support mining with same block.timestamp (#9160)
9252e98bd chore: format chained error for EvmError (#9169)
3b2e57a29 Add debug file dump (#7375)
7b118faef chore(deps): bumps alloy, revm, fork-db (#9150)
cd71da404 feat: add `foundry_common::shell` to unify log behavior (#9109)
2cdf718ef chore: refactor debugger dump code (#9170)
4c84dc7d9 fix(anvil): Apply state overrides in debug_traceCall (#9172)

grandizzy commented 2 weeks ago

thank you, will try to reproduce, including the dockerized anvil env. How many txes do you fire from the script?

dtbuchholz commented 2 weeks ago

@grandizzy cool, sg, thanks! and it doesn't seem to matter how many txs. I've tried with up to 4, but in my MVP example while deploying to the dockerized anvil, it hangs here with just 1:

⠠ Sequence #1 on anvil-hardhat | Waiting for pending transactions
    ⡀ [Pending] 0xcfeed6eacad87be5fda6108b3c6f4945645072e2d61db287072923da29c559ce
    ⠤ [00:02:56] [#################################################################################] 1/1 txes (0.0s)
    ⠤ [00:02:56] [-----------------------------------------------------------------------------] 0/1 receipts (0.0s)

and just for posterity, it sounds like these are the steps to reproduce:

1. install the nightly build on your machine: foundryup
2. pull the "latest" docker image at ghcr.io/foundry-rs/foundry:latest and spin up a container running anvil
3. deploy an arbitrary contract to the container with forge script from your machine

and this behavior seems to showcase the same thing that happens for filecoin calibration

grandizzy commented 2 weeks ago

@dtbuchholz just to make sure, you see the dockerized issue on both apple/Linux, is this right? Asking because we don't have builds for apple which results in poor performance, tracked in https://github.com/foundry-rs/foundry/issues/8039

dtbuchholz commented 2 weeks ago

@grandizzy the "indefinite pending" issue exists on both linux/apple. to clarify:

- wrt dockerized anvil: I have only tested the nightly forge script deploying to the "latest" dockerized anvil setup on an apple machine. i haven't done this on linux.
- but, the indefinite pending status does happen for both linux/apple when using the nightly forge script pointing to filecoin calibration. i.e., our remote infra uses linux machines to deploy contracts to filecoin calibration, not anvil.

put differently, 3 others from my team and I are seeing the same issue on our apple machines, plus our remote linux infra. it only resolves if we roll back to 2044faec6.

fwiw, i force-pull the image on my macos via docker pull --platform linux/x86_64 ..., and i haven't run into issues (even if it's not recommended).

grandizzy commented 2 weeks ago

Thank you @dtbuchholz , I was able to reproduce the failure with dockerized version, the problem seems to be that the image wasn't built/published for a while

REPOSITORY                                            TAG       IMAGE ID       CREATED        SIZE
ghcr.io/foundry-rs/foundry                            latest    e5c8015b4c70   2 weeks ago    212MB

so when you use a version of forge with updated alloy/revm (updated in nightly-b1e93654348a0f31effa34790adae18865b14aa8), it fails with deserialization error: duplicate field status

If you use the nightly tag instead of latest, this should work OK:

REPOSITORY                                            TAG       IMAGE ID       CREATED        SIZE
ghcr.io/foundry-rs/foundry                            nightly   a5685c29d08c   8 hours ago    212MB

for the record, this is the docker-compose.yml file I am using, started with docker-compose up anvil

services:
  anvil:
    image: ghcr.io/foundry-rs/foundry:nightly
    container_name: anvil
    environment:
      ANVIL_IP_ADDR: "0.0.0.0"
    working_dir: /anvil
    ports:
    - "8545:8545"
    command: anvil

dtbuchholz commented 2 weeks ago

@grandizzy thanks, i'll use that newer image in our testing infra setup!

however, that doesn't fully solve the problem. the reason we're reproducing the issue with the dockerized anvil is that the indefinite pending bug happens in this environment as well as on a live testnet. so, my intuition is that if we solve the problem while deploying to ghcr.io/foundry-rs/foundry:latest, then it'll also solve the problem while deploying to a live filecoin calibration testnet (but I could be wrong). i.e., switching to the foundry:nightly image is only half the solution; it resolves our local testing use case, but it does not solve the live deployment use case.

below is a demo of my experience. you'll see I'm initially using the latest nightly build with forge, which fails when deploying to the testnet. i then switch to 2044faec, and the issue goes away. i think it'd be useful to recreate this, specifically, while deploying to filecoin calibration and not the dockerized anvil:

1. get filecoin calibration testnet funds via https://faucet.calibnet.chainsafe-fil.io/funds.html
2. deploy some arbitrary contract, and you can use params like this: forge script ... --rpc-url https://rpc.ankr.com/filecoin_testnet --broadcast --timeout 360 -g 100000

https://github.com/user-attachments/assets/0693bcfc-d1c5-44bc-8a14-7b53e3a379c0

thus, it means one of these commits is the likely culprit:

4d7435e64 feat(`anvil`): support mining with same block.timestamp (#9160)
9252e98bd chore: format chained error for EvmError (#9169)
3b2e57a29 Add debug file dump (#7375)
7b118faef chore(deps): bumps alloy, revm, fork-db (#9150)
cd71da404 feat: add `foundry_common::shell` to unify log behavior (#9109)
2cdf718ef chore: refactor debugger dump code (#9170)
4c84dc7d9 fix(anvil): Apply state overrides in debug_traceCall (#9172)

dtbuchholz commented 2 weeks ago

one thing to consider: filecoin lags wrt EVM compatibility. so, I'm guessing there's something "new" in alloy/revm that makes things incompatible with filecoin—and that incompatibility is the ~same as ghcr.io/foundry-rs/foundry:latest

grandizzy commented 2 weeks ago

@dtbuchholz yep, indeed, next step is to try to reproduce / debug on filecoin, just taking them one by one :) thanks for the detailed steps!

dtbuchholz commented 2 weeks ago

ah gotcha, lol. okay awesome, appreciate the help!

zerosnacks commented 2 weeks ago

Likely unrelated but we did just land a fix in master for a deadlock introduced in cd71da404 (common shell) that occurred when nesting sh_println!

dtbuchholz commented 2 weeks ago

> Likely unrelated but we did just land a fix in master for a deadlock introduced in cd71da404 (common shell) that occurred when nesting sh_println!

I just tried building from the latest on master and deploying contracts to filecoin...no luck, unfortunately.

klkvr commented 2 weeks ago

It seems that the root issue might be a deserialization error on the alloy side:

cast receipt 0x2c2deb6447610a516c0faab20ae3792b6c7386a2842f90af1fec810626f92a4a --rpc-url https://rpc.ankr.com/filecoin_testnet
Error: deserialization error: duplicate field `status` at line 1 column 1024

RPC responds with "root":"0x0000000000000000000000000000000000000000000000000000000000000000","status":"0x0" which is currently not being correctly handled here: https://github.com/alloy-rs/alloy/blob/18da1321d3dad76358c96ce1cc0cd7f0914f30b2/crates/consensus/src/receipt/receipts.rs#L19
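
To illustrate why a receipt with both fields trips a strict parser: under EIP-658, a receipt normally carries either a post-state `root` (pre-Byzantium) or a `status` code, so a client that treats them as mutually exclusive has no slot for the second one. Below is a minimal sketch of a tolerant reading in which `status` wins when both are present. It is not the alloy fix, and `receipt_outcome` is a hypothetical helper; the JSON fragment is the one quoted above.

```python
import json


def receipt_outcome(receipt: dict) -> str:
    """Return 'success', 'failure', or 'pre-byzantium' for a raw receipt.

    Tolerates receipts that carry both `status` and `root`; `status` wins.
    """
    status = receipt.get("status")
    if status is not None:
        return "success" if int(status, 16) == 1 else "failure"
    if receipt.get("root") is not None:
        return "pre-byzantium"  # only a post-state root, no status code
    raise ValueError("receipt has neither `status` nor `root`")


# The fragment returned by the filecoin calibration RPC, as quoted above.
raw = json.loads(
    '{"root":"0x0000000000000000000000000000000000000000000000000000000000000000",'
    '"status":"0x0"}'
)

overlap = sorted(raw.keys() & {"root", "status"})
print(overlap)               # ['root', 'status']
print(receipt_outcome(raw))  # failure
```

Preferring `status` is a design assumption here: an all-zero `root` alongside a status code looks like a placeholder rather than a real state root.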

klkvr commented 2 weeks ago

confirming that https://github.com/alloy-rs/alloy/pull/1608 fixes it, can keep this open until we migrate to alloy version including the fix

grandizzy commented 2 weeks ago

@klkvr I would also like to use this repro to address the fact that the script just hangs instead of outputting an error, as that could be beneficial for other scenarios. Do you think there's something we could improve?

klkvr commented 2 weeks ago

looks like we're ending up in an infinite loop here https://github.com/foundry-rs/foundry/blob/97be9b9a2e128633b17589cd58bfde4b4d544e23/crates/script/src/receipts.rs#L42-L57
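
As a generic illustration of the pattern being discussed (not Foundry's code), a receipt-polling loop can avoid hanging by separating "still pending" from "cannot be fixed by retrying" and by enforcing a deadline. The sketch below assumes a hypothetical `fetch` callable that returns `None` while the transaction is pending and raises the hypothetical `ReceiptError` on failures such as a deserialization error; the 360-second default mirrors the `--timeout 360` used earlier in the thread.

```python
import time


class ReceiptError(Exception):
    """Non-retryable failure while fetching a receipt (hypothetical type)."""


def wait_for_receipt(fetch, tx_hash, timeout_s=360.0, poll_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `fetch(tx_hash)` until it returns a receipt or the deadline passes.

    `fetch` returns None while the transaction is pending and raises
    ReceiptError for failures that retrying cannot fix.
    """
    deadline = clock() + timeout_s
    while True:
        receipt = fetch(tx_hash)  # a ReceiptError propagates to the caller
        if receipt is not None:
            return receipt
        if clock() >= deadline:
            raise TimeoutError(f"no receipt for {tx_hash} after {timeout_s}s")
        sleep(poll_s)
```

With this shape, a `duplicate field` error surfaces to the user on the first attempt instead of being retried forever, and a transaction that really is pending still ends in a clear timeout.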

eilgug commented 1 week ago

> confirming that alloy-rs/alloy#1608 fixes it, can keep this open until we migrate to alloy version including the fix

I'm facing the same issue. Will this fix it, and when should the migration be done?
