ethpandaops / ethereum-package

A Kurtosis package that deploys a private, portable, and modular Ethereum devnet
MIT License
201 stars 106 forks source link

Lighthouse fails with 503 using default args #365

Closed mieubrisse closed 8 months ago

mieubrisse commented 8 months ago

I just ran on local Docker the package with default args, and got a failure waiting for Lighthouse to become available:

$ kurtosis run github.com/kurtosis-tech/ethereum-package
...........
> print msg="Successfully added 1 EL participants"
Successfully added 1 EL participants

> print msg="Launching CL network"
Launching CL network

> add_service name="cl-1-lighthouse-geth" config=ServiceConfig(image="sigp/lighthouse:latest", ports={"http": PortSpec(number=4000, transport_protocol="TCP", application_protocol="http"), "metrics": PortSpec(number=5054, transport_protocol="TCP", application_protocol="http"), "tcp-discovery": PortSpec(number=9000, transport_protocol="TCP", application_protocol=""), "udp-discovery": PortSpec(number=9000, transport_protocol="UDP", application_protocol="")}, files={"/data": "el-cl-genesis-data"}, cmd=["lighthouse", "beacon_node", "--debug-level=info", "--datadir=/consensus-data", "--testnet-dir=/data/data/custom_config_data", "--disable-enr-auto-update", "--enr-address=KURTOSIS_IP_ADDR_PLACEHOLDER", "--enr-udp-port=9000", "--enr-tcp-port=9000", "--listen-address=0.0.0.0", "--port=9000", "--http", "--http-address=0.0.0.0", "--http-port=4000", "--http-allow-sync-stalled", "--slots-per-restore-point=32", "--disable-packet-filter", "--execution-endpoints=http://{{kurtosis:278e7eb6f9574907ab09e728acd380ee:ip_address.runtime_value}}:8551", "--jwt-secrets=/data/data/jwt/jwtsecret", "--suggested-fee-recipient=0x8943545177806ED17B9F23F0a21ee5948eCaa776", "--subscribe-all-subnets", "--metrics", "--metrics-address=0.0.0.0", "--metrics-allow-origin=*", "--metrics-port=5054"], env_vars={"RUST_BACKTRACE": "full"}, private_ip_address_placeholder="KURTOSIS_IP_ADDR_PLACEHOLDER", max_cpu=1000, min_cpu=50, max_memory=1024, min_memory=256, ready_conditions=ReadyCondition(recipe=GetHttpRequestRecipe(port_id="http", endpoint="/eth/v1/node/health"), field="code", assertion="IN", target_value=[200, 206], timeout="15m"), labels={"ethereum-package-client": "lighthouse", "ethereum-package-client-image": "sigp-lighthouse-latest", "ethereum-package-client-type": "beacon", "ethereum-package-connected-client": "geth"})
There was an error executing Starlark code
An error occurred executing instruction (number 21) at github.com/kurtosis-tech/ethereum-package/src/cl/lighthouse/lighthouse_launcher.star[149:38]:
  add_service(name="cl-1-lighthouse-geth", config=ServiceConfig(image="sigp/lighthouse:latest", ports={"http": PortSpec(number=4000, transport_protocol="TCP", application_protocol="http"), "metrics": PortSpec(number=5054, transport_protocol="TCP", application_protocol="http"), "tcp-discovery": PortSpec(number=9000, transport_protocol="TCP", application_protocol=""), "udp-discovery": PortSpec(number=9000, transport_protocol="UDP", application_protocol="")}, files={"/data": "el-cl-genesis-data"}, cmd=["lighthouse", "beacon_node", "--debug-level=info", "--datadir=/consensus-data", "--testnet-dir=/data/data/custom_config_data", "--disable-enr-auto-update", "--enr-address=KURTOSIS_IP_ADDR_PLACEHOLDER", "--enr-udp-port=9000", "--enr-tcp-port=9000", "--listen-address=0.0.0.0", "--port=9000", "--http", "--http-address=0.0.0.0", "--http-port=4000", "--http-allow-sync-stalled", "--slots-per-restore-point=32", "--disable-packet-filter", "--execution-endpoints=http://{{kurtosis:278e7eb6f9574907ab09e728acd380ee:ip_address.runtime_value}}:8551", "--jwt-secrets=/data/data/jwt/jwtsecret", "--suggested-fee-recipient=0x8943545177806ED17B9F23F0a21ee5948eCaa776", "--subscribe-all-subnets", "--metrics", "--metrics-address=0.0.0.0", "--metrics-allow-origin=*", "--metrics-port=5054"], env_vars={"RUST_BACKTRACE": "full"}, private_ip_address_placeholder="KURTOSIS_IP_ADDR_PLACEHOLDER", max_cpu=1000, min_cpu=50, max_memory=1024, min_memory=256, ready_conditions=ReadyCondition(recipe=GetHttpRequestRecipe(port_id="http", endpoint="/eth/v1/node/health"), field="code", assertion="IN", target_value=[200, 206], timeout="15m"), labels={"ethereum-package-client": "lighthouse", "ethereum-package-client-image": "sigp-lighthouse-latest", "ethereum-package-client-type": "beacon", "ethereum-package-connected-client": "geth"}))
  Caused by: An error occurred while checking if service 'cl-1-lighthouse-geth' is ready
  Caused by: An error occurred checking if service 'cl-1-lighthouse-geth' is ready, using recipe 'GetHttpRequestRecipe(port_id="http", endpoint="/eth/v1/node/health")', value field 'code', assertion 'IN', target '[200, 206]', interval '1s' and time-out '15m0s'.
  Caused by: Recipe execution timed-out waiting for the recipe execution to become valid on service 'cl-1-lighthouse-geth'. Tried '900' times. Last assertion execution error was:
   '$Verification failed '503' 'IN' '[200, 206]'
   --- at /home/circleci/project/core/server/api_container/server/startosis_engine/kurtosis_instruction/verify/verify.go:195 (Verify) ---'

Error encountered running Starlark code.

Made with Kurtosis - https://kurtosis.com
INFO[2023-11-19T14:06:34-03:00] =======================================================
INFO[2023-11-19T14:06:34-03:00] ||          Created enclave: dazzling-glade          ||
INFO[2023-11-19T14:06:34-03:00] =======================================================
Name:            dazzling-glade
UUID:            ec9fd3e52394
Status:          RUNNING
Creation Time:   Sun, 19 Nov 2023 13:43:39 -03
Flags:

========================================= Files Artifacts =========================================
UUID           Name
23a615a54e9f   1-lighthouse-geth-0-63
ad4b58e0fbcf   el-cl-genesis-data
fb2c4d1a89a1   final-genesis-timestamp
63e45701da66   genesis-el-cl-env-file
8925de5d2779   genesis-validators-root
c7ed4e4269f3   prysm-password
ffa0a4f82ac1   task-95ac789a-9d2b-44f8-93fa-315c6c91e784-python-script
cf99d839c429   task-96b80c76-3c37-4adf-9a5c-370b768689d9-python-script

========================================== User Services ==========================================
UUID           Name                                             Ports                                         Status
cffa38e18561   cl-1-lighthouse-geth                             http: 4000/tcp -> http://127.0.0.1:51837      RUNNING
                                                                metrics: 5054/tcp -> http://127.0.0.1:51838
                                                                tcp-discovery: 9000/tcp -> 127.0.0.1:51839
                                                                udp-discovery: 9000/udp -> 127.0.0.1:55036
8dfdc35a7bc4   el-1-geth-lighthouse                             engine-rpc: 8551/tcp -> 127.0.0.1:51832       RUNNING
                                                                metrics: 9001/tcp -> 127.0.0.1:51829
                                                                rpc: 8545/tcp -> 127.0.0.1:51831
                                                                tcp-discovery: 30303/tcp -> 127.0.0.1:51830
                                                                udp-discovery: 30303/udp -> 127.0.0.1:56316
2d3cde50bc80   task-4608296c-79f1-4403-b510-de89d8c51f75        <none>                                        RUNNING
4335610adf2b   task-95ac789a-9d2b-44f8-93fa-315c6c91e784        <none>                                        RUNNING
74b69c9ea15f   task-96b80c76-3c37-4adf-9a5c-370b768689d9        <none>                                        RUNNING
a5fa143705ef   validator-key-generation-cl-validator-keystore   <none>                                        RUNNING

The Lighthouse client seems to be unhappy for some reason:

2023-11-19 14:09:34 Nov 19 17:09:34.371 ERRO Error updating deposit contract cache   error: Failed to get remote head and new block ranges: EndpointError(FarBehind), retry_millis: 60000, service: deposit_contract_rpc
2023-11-19 14:09:40 Nov 19 17:09:40.004 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:09:40 Nov 19 17:09:40.004 INFO Searching for peers                     current_slot: 118, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:09:40 Nov 19 17:09:40.004 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:09:52 Nov 19 17:09:52.006 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:09:52 Nov 19 17:09:52.006 INFO Searching for peers                     current_slot: 119, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:09:52 Nov 19 17:09:52.006 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:10:04 Nov 19 17:10:04.002 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:10:04 Nov 19 17:10:04.002 INFO Searching for peers                     current_slot: 120, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:10:04 Nov 19 17:10:04.002 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:10:16 Nov 19 17:10:16.002 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:10:16 Nov 19 17:10:16.003 INFO Searching for peers                     current_slot: 121, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:10:16 Nov 19 17:10:16.003 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:10:28 Nov 19 17:10:28.002 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:10:28 Nov 19 17:10:28.002 INFO Searching for peers                     current_slot: 122, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:10:28 Nov 19 17:10:28.002 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:10:34 Nov 19 17:10:34.368 WARN Execution endpoint is not synced        last_seen_block_unix_timestamp: 1700412238, endpoint: http://172.16.0.8:8551/, auth=true, service: deposit_contract_rpc
2023-11-19 14:10:34 Nov 19 17:10:34.368 ERRO Error updating deposit contract cache   error: Failed to get remote head and new block ranges: EndpointError(FarBehind), retry_millis: 60000, service: deposit_contract_rpc
2023-11-19 14:10:40 Nov 19 17:10:40.004 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:10:40 Nov 19 17:10:40.005 INFO Searching for peers                     current_slot: 123, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:10:40 Nov 19 17:10:40.005 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:10:52 Nov 19 17:10:52.000 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:10:52 Nov 19 17:10:52.001 INFO Searching for peers                     current_slot: 124, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:10:52 Nov 19 17:10:52.001 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:11:04 Nov 19 17:11:04.005 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:11:04 Nov 19 17:11:04.005 INFO Searching for peers                     current_slot: 125, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:11:04 Nov 19 17:11:04.005 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:11:16 Nov 19 17:11:16.001 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:11:16 Nov 19 17:11:16.002 INFO Searching for peers                     current_slot: 126, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:11:16 Nov 19 17:11:16.002 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:11:28 Nov 19 17:11:28.002 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:11:28 Nov 19 17:11:28.002 INFO Searching for peers                     current_slot: 127, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:11:28 Nov 19 17:11:28.002 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:11:34 Nov 19 17:11:34.372 WARN Execution endpoint is not synced        last_seen_block_unix_timestamp: 1700412238, endpoint: http://172.16.0.8:8551/, auth=true, service: deposit_contract_rpc
2023-11-19 14:11:34 Nov 19 17:11:34.373 ERRO Error updating deposit contract cache   error: Failed to get remote head and new block ranges: EndpointError(FarBehind), retry_millis: 60000, service: deposit_contract_rpc
2023-11-19 14:11:40 Nov 19 17:11:40.004 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:11:40 Nov 19 17:11:40.004 INFO Searching for peers                     current_slot: 128, head_slot: 0, finalized_epoch: 0, finalized_root: 0x024d…7117, peers: 0, service: slot_notifier
2023-11-19 14:11:40 Nov 19 17:11:40.004 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier

Full dump of the enclave:

dazzling-glade.zip

mieubrisse commented 8 months ago

Just tried again and it seems to be consistent; something's messed up with Lighthouse:

2023-11-19 14:15:00 Nov 19 17:15:00.961 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 276.625µs
2023-11-19 14:15:01 Nov 19 17:15:01.935 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 216.916µs
2023-11-19 14:15:02 Nov 19 17:15:02.947 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 215.875µs
2023-11-19 14:15:03 Nov 19 17:15:03.954 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 222µs
2023-11-19 14:15:04 Nov 19 17:15:04.954 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 201.167µs
2023-11-19 14:15:05 Nov 19 17:15:05.962 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 253µs
2023-11-19 14:15:06 Nov 19 17:15:06.932 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 214.417µs
2023-11-19 14:15:07 Nov 19 17:15:07.003 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:15:07 Nov 19 17:15:07.003 INFO Searching for peers                     current_slot: 2, head_slot: 0, finalized_epoch: 0, finalized_root: 0x109c…f8e4, peers: 0, service: slot_notifier
2023-11-19 14:15:07 Nov 19 17:15:07.003 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:15:07 Nov 19 17:15:07.947 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 218.458µs
2023-11-19 14:15:08 Nov 19 17:15:08.946 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 190.75µs
2023-11-19 14:15:09 Nov 19 17:15:09.953 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 213.625µs
2023-11-19 14:15:10 Nov 19 17:15:10.954 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 221.209µs
2023-11-19 14:15:11 Nov 19 17:15:11.958 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 195.375µs
2023-11-19 14:15:12 Nov 19 17:15:12.952 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 204.583µs
2023-11-19 14:15:13 Nov 19 17:15:13.964 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 249.209µs
2023-11-19 14:15:14 Nov 19 17:15:14.960 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 246.917µs
2023-11-19 14:15:15 Nov 19 17:15:15.951 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 201µs
2023-11-19 14:15:16 Nov 19 17:15:16.953 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 222.459µs
2023-11-19 14:15:17 Nov 19 17:15:17.950 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 224.5µs
2023-11-19 14:15:18 Nov 19 17:15:18.958 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 252.916µs
2023-11-19 14:15:19 Nov 19 17:15:19.001 WARN Low peer count                          peer_count: 0, service: slot_notifier
2023-11-19 14:15:19 Nov 19 17:15:19.001 INFO Searching for peers                     current_slot: 3, head_slot: 0, finalized_epoch: 0, finalized_root: 0x109c…f8e4, peers: 0, service: slot_notifier
2023-11-19 14:15:19 Nov 19 17:15:19.001 WARN Not ready for merge                     info: The execution endpoint is connected and configured, however it is not yet synced, service: slot_notifier
2023-11-19 14:15:19 Nov 19 17:15:19.933 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 247.417µs
2023-11-19 14:15:20 Nov 19 17:15:20.937 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 287.042µs
2023-11-19 14:15:21 Nov 19 17:15:21.945 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 228.166µs
2023-11-19 14:15:22 Nov 19 17:15:22.944 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 187.875µs
2023-11-19 14:15:23 Nov 19 17:15:23.940 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 248.083µs
2023-11-19 14:15:24 Nov 19 17:15:24.942 WARN Error processing HTTP API request       method: GET, path: /eth/v1/node/health, status: 503 Service Unavailable, elapsed: 204.875µs

It hasn't failed yet, but it looks like it's going to after the 15-minute (!!) timeout

wandering-knoll.zip

barnabasbusa commented 8 months ago

Pls rerun with --image-download always and confirm it still fails. Most likely you just have an older lh version on your local.

mieubrisse commented 8 months ago

That was it - great shout!