ethpandaops / ethereum-package

A Kurtosis package that deploys a private, portable, and modular Ethereum devnet

Running holesky-shadowfork fails to start #658

Closed (zskamljic closed this issue 1 week ago)

zskamljic commented 1 month ago

I am trying to run a local shadowfork by using the following command:

kurtosis run --enclave holesky-shadowfork github.com/kurtosis-tech/ethereum-package --args-file holesky-shadowfork-config.yaml

where holesky-shadowfork-config.yaml contains the following (taken from an existing template):

participants:
  - el_type: besu
    el_image: hyperledger/besu:24.5.2
    cl_type: teku
    cl_image: consensys/teku:24.4
network_params:
  dencun_fork_epoch: 0
  network: holesky-shadowfork
additional_services:
  - dora
snooper_enabled: true
persistent: true

While running, it seems to break at the "Getting devnet enodes" step with the following error:

There was an error executing Starlark code 
An error occurred executing instruction (number 24) at github.com/kurtosis-tech/ethereum-package/src/shared_utils/shared_utils.star[92:33]:
  run_python(run="\nwith open(\"/network-configs/bootnode.txt\") as bootnode_file:\n    bootnodes = []\n    for line in bootnode_file:\n        line = line.strip()\n        bootnodes.append(line)\nprint(\",\".join(bootnodes), end=\"\")\n            ", files={"/network-configs": "el_cl_genesis_data"}, wait=None, description="Getting devnet enodes")
  Caused by: Python command: "python -c \"\nwith open(\\\"/network-configs/bootnode.txt\\\") as bootnode_file:\n    bootnodes = []\n    for line in bootnode_file:\n        line = line.strip()\n        bootnodes.append(line)\nprint(\\\",\\\".join(bootnodes), end=\\\"\\\")\n            \"" exited with code 1 and output
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
    FileNotFoundError: [Errno 2] No such file or directory: '/network-configs/bootnode.txt'

Error encountered running Starlark code.

However, the run then continues with more output, implying that the enclave is running:

⭐ us on GitHub - https://github.com/kurtosis-tech/kurtosis
INFO[2024-06-04T13:14:44+02:00] =========================================================== 
INFO[2024-06-04T13:14:44+02:00] ||          Created enclave: holesky-shadowfork          || 
INFO[2024-06-04T13:14:44+02:00] =========================================================== 
Name:            holesky-shadowfork
UUID:            d99be99ecd8e
Status:          RUNNING
Creation Time:   Tue, 04 Jun 2024 11:55:42 CEST
Flags:           

========================================= Files Artifacts =========================================
UUID           Name
6dd0d0d8baac   1-teku-besu-0-63-0
0e56c0bba085   el_cl_genesis_data
8d70dc8d740e   final-genesis-timestamp
6db33eecf284   genesis-el-cl-env-file
4685e1967b1b   genesis_validators_root
e61e669fefec   jwt_file
d47ab7258bdb   keymanager_file
a987c227438e   latest_blocks
a4e58051d851   prysm-password

========================================== User Services ==========================================
UUID           Name                                             Ports    Status
8dbe6c4f1f6a   shadowfork-el-1-besu-teku                        <none>   RUNNING
aa09eb6f442e   task-1a3f1941-ac0d-4287-bcd1-d0b6746116f2        <none>   RUNNING
2e52e8ba8faf   validator-key-generation-cl-validator-keystore   <none>   RUNNING

However, in contrast to running without any config file, it does not report services with specific ports. I assume network-configs/bootnode.txt is not being generated for some reason, so the rest of the setup fails.

barnabasbusa commented 1 month ago

Can you try to run again with kurtosis run --enclave holesky-shadowfork github.com/kurtosis-tech/ethereum-package@bbusa/fix-sf --args-file holesky-shadowfork-config.yaml

zskamljic commented 1 month ago

I am still seeing the same issue. I tried removing the enclave entirely before re-running, with the same failure.

parithosh commented 3 weeks ago

I'll look into it this week and try to get a fix in.

parithosh commented 2 weeks ago

Note: Nethermind should now be fixed, geth has been checked and it works too. We're talking with the besu team to check what's up with the besu shadowfork. We hope to close the topic this week.

parithosh commented 1 week ago

Most clients should work now. We closed this issue because the next bugs we found are unrelated to the original issue. Please do let us know if you face any bugs!

zskamljic commented 4 days ago

@parithosh I have tried to run it again, and it appears to get further than it used to. However, I still see some issues:

Reading genesis validators root
Command returned with exit code '0' and the following output: 0xd61ea484febacfae5298d52a2b581f3e305a51f3112a9241b968dccf019f7b11

Reading prague time from genesis
Command returned with exit code '0' and the following output: 40119820415

Adding service with name 'el-1-besu-teku' and image 'hyperledger/besu:24.5.2'
There was an error executing Starlark code 
An error occurred executing instruction (number 23) at github.com/ethpandaops/ethereum-package/src/el/besu/besu_launcher.star[141:31]:
  add_service(name="el-1-besu-teku", config=ServiceConfig(image="hyperledger/besu:24.5.2", ports={"engine-rpc": PortSpec(number=8551, transport_protocol="TCP", application_protocol=""), "metrics": PortSpec(number=9001, transport_protocol="TCP", application_protocol=""), "rpc": PortSpec(number=8545, transport_protocol="TCP", application_protocol="http"), "tcp-discovery": PortSpec(number=30303, transport_protocol="TCP", application_protocol=""), "udp-discovery": PortSpec(number=30303, transport_protocol="UDP", application_protocol=""), "ws": PortSpec(number=8546, transport_protocol="TCP", application_protocol="")}, public_ports={}, files={"/data/besu/execution-data": Directory(persistent_key="data-el-1-besu-teku", size=100000), "/jwt": "jwt_file", "/network-configs": "el_cl_genesis_data"}, entrypoint=["sh", "-c"], cmd=["besu --logging=INFO --data-path=/data/besu/execution-data --host-allowlist=* --rpc-http-enabled=true --rpc-http-host=0.0.0.0 --rpc-http-port=8545 --rpc-http-api=ADMIN,CLIQUE,ETH,NET,DEBUG,TXPOOL,ENGINE,TRACE,WEB3 --rpc-http-cors-origins=* --rpc-http-max-active-connections=300 --rpc-ws-enabled=true --rpc-ws-host=0.0.0.0 --rpc-ws-port=8546 --rpc-ws-api=ADMIN,CLIQUE,ETH,NET,DEBUG,TXPOOL,ENGINE,TRACE,WEB3 --p2p-enabled=true --p2p-host=KURTOSIS_IP_ADDR_PLACEHOLDER --p2p-port=30303 --engine-rpc-enabled=true --engine-jwt-secret=/jwt/jwtsecret --engine-host-allowlist=* --engine-rpc-port=8551 --sync-mode=FULL --data-storage-format=BONSAI --metrics-enabled=true --metrics-host=0.0.0.0 --metrics-port=9001 --min-gas-price=1000000000 --bonsai-limit-trie-logs-enabled=false --genesis-file=/network-configs/besu.json"], env_vars={"JAVA_OPTS": "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n"}, private_ip_address_placeholder="KURTOSIS_IP_ADDR_PLACEHOLDER", max_cpu=2000, min_cpu=100, max_memory=8192, min_memory=512, labels={"ethereum-package.client": "besu", "ethereum-package.client-image": "hyperledger-besu_24.5.2", "ethereum-package.client-type": "execution", 
"ethereum-package.connected-client": "teku", "ethereum-package.sha256": ""}, user=User(uid=0, gid=0), tolerations=[], node_selectors={}))
  Caused by: Unexpected error occurred starting service 'el-1-besu-teku'
  Caused by: An error occurred waiting for all TCP and UDP ports to be open for service 'el-1-besu-teku' with private IP '172.16.16.19'; this is usually due to a misconfiguration in the service itself, so here are the logs:
  == SERVICE 'el-1-besu-teku' LOGS ===================================
  Listening for transport dt_socket at address: 42177
  Unknown option: '--bonsai-limit-trie-logs-enabled=false'
  Possible solutions: --bootnodes, --bonsai-historical-block-limit, --bonsai-maximum-back-layers-to-load

  To display full help:
  besu [COMMAND] --help

  == FINISHED SERVICE 'el-1-besu-teku' LOGS ===================================
  Caused by: An error occurred while waiting for all TCP and UDP ports to be open
  Caused by: Unsuccessful ports check for IP '172.16.16.19' and port spec '{privatePortSpec:0xc000c75ec0}', even after '240' retries with '500' milliseconds in between retries. Timeout '2m0s' has been reached
  Caused by: An error occurred while calling network address '172.16.16.19:8545' with port protocol 'TCP' and using time out '200ms'
  Caused by: dial tcp 172.16.16.19:8545: i/o timeout

Error encountered running Starlark code.

barnabasbusa commented 4 days ago

Could you please try with the latest besu image, 24.6.0?

https://github.com/hyperledger/besu/releases
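In the log above, the rejected Besu flag only shows up indirectly, as a port timeout: the process exits on the unknown option, so port 8545 never opens. A minimal sketch of that kind of readiness check follows. The function name `wait_for_tcp_port` is hypothetical, the defaults are copied from the error message (240 retries, 500 ms between retries, 200 ms dial timeout), and this is not Kurtosis's actual implementation.

```python
import socket
import time


def wait_for_tcp_port(host, port, retries=240, delay_s=0.5, dial_timeout_s=0.2):
    """Return True once a TCP connection to host:port succeeds, else False.

    Hypothetical helper; the defaults mirror the numbers in the error
    message, not any documented Kurtosis API.
    """
    for _ in range(retries):
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=dial_timeout_s):
                return True
        except OSError:
            # Refused or timed out: wait, then try again.
            time.sleep(delay_s)
    return False
```

For example, `wait_for_tcp_port("172.16.16.19", 8545)` would keep returning `False` for the service in the log, because the Besu process had already exited. In that situation the service logs are the place to look, not the network settings.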

zskamljic commented 1 day ago

It appears it worked, thanks!