ObolNetwork / charon

Charon (pronounced 'kharon') is a Proof of Stake Ethereum Distributed Validator Client
https://docs.obol.tech/

Investigate why Lighthouse/Lighthouse combo experiences issues with missing duties #3088

Open boulder225 opened 5 months ago

boulder225 commented 5 months ago

🎯 Problem to be solved

Although it improves slightly over time, the Lighthouse/Lighthouse combo (running with the distributed flag) is still missing duties, as shown in the screenshots and logs below:

This could be caused by another misconfiguration, probably in Kurtosis. In addition, the beacon node score is low (97%), and the VC loads keys extremely slowly (15 minutes or longer).
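For scale: the VC log below reports total_validators: 600, so a key load of 15 minutes or longer averages out to at least 1.5 s per key:

```latex
\frac{15\,\text{min} \times 60\,\text{s/min}}{600\ \text{keys}} = \frac{900\,\text{s}}{600\ \text{keys}} = 1.5\,\text{s per key}
```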

[Two screenshots (image.png) not reproduced]

VC log:

2024-05-17 11:41:02 May 17 08:41:02.002 INFO All validators active                   slot: 120, epoch: 3, total_validators: 600, active_validators: 600, current_epoch_proposers: 11, service: notifier
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: "Failed to produce an aggregate attestation: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)", node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Error during attestation routine        slot: 119, committee_index: 0, error: "Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(\"Failed to produce an aggregate attestation: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)\")", service: attestation
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee
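To see at a glance which duties are failing, the CRIT lines above can be tallied per service field. This is a minimal Go sketch. It assumes the Lighthouse log layout shown above, where the level token is surrounded by spaces and `service: <name>` is the last field. `critByService` is a hypothetical helper, not part of Lighthouse or Charon.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// critByService is a hypothetical helper: it counts CRIT-level lines in
// Lighthouse VC output, keyed by the trailing "service: <name>" field.
// CRIT lines without a service field are grouped under "unknown".
func critByService(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if !strings.Contains(line, " CRIT ") {
			continue // only count critical failures
		}
		svc := "unknown"
		if i := strings.LastIndex(line, "service: "); i >= 0 {
			rest := line[i+len("service: "):]
			// The service name ends at the next comma or at end of line.
			if j := strings.IndexByte(rest, ','); j >= 0 {
				rest = rest[:j]
			}
			svc = strings.TrimSpace(rest)
		}
		counts[svc]++
	}
	return counts
}

func main() {
	// Abridged lines from the VC log above (error details shortened).
	sample := []string{
		"May 17 08:41:02.002 INFO All validators active slot: 120, service: notifier",
		"May 17 08:41:04.002 DEBG Request to beacon node failed error: HttpClient(...), node: http://node0:3600/",
		"May 17 08:41:04.002 CRIT Failed to produce sync contribution error: ..., slot: 119, service: sync_committee",
		"May 17 08:41:04.002 CRIT Failed to produce sync contribution error: ..., slot: 119, service: sync_committee",
		"May 17 08:41:04.002 CRIT Error during attestation routine slot: 119, committee_index: 0, error: \"...\", service: attestation",
	}
	counts := critByService(sample)

	// Print in a stable (sorted) order.
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d\n", name, counts[name])
	}
}
```

On the abridged sample this prints `attestation: 1` and `sync_committee: 2`. Running it over the full VC output would show whether sync-committee contributions fail more often than aggregations.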

charon log:

2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.00039563s", "vapi_endpoint": "sync_committee_contribution"}
2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.000778464s", "vapi_endpoint": "sync_committee_contribution"}
2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.000870464s", "vapi_endpoint": "aggregate_attestation"}
2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.001038714s", "vapi_endpoint": "sync_committee_contribution"}
2024-05-17 11:41:44 08:41:44.001 DEBG sched      Slot ticked                              {"slot": 124}
2024-05-17 11:41:44 08:41:44.003 DEBG fetcher    Timeout calling fetcher/fetch, duty expired {"duty": "119/aggregator"}
2024-05-17 11:41:44 08:41:44.004 DEBG fetcher    Timeout calling fetcher/fetch, duty expired {"duty": "119/sync_contribution"}

CL errors:

2024-05-17 11:42:35 May 17 08:42:35.060 WARN Relay error when registering validator(s), error: ServerMessage(ErrorMessage { code: 502, message: "no successful relay response", stacktraces: [] }), num_registrations: 600

2024-05-17 11:43:08 May 17 08:43:08.014 ERRO No valid eth1_data votes, `votes_to_consider` empty, outcome: casting `state.eth1_data` as eth1 vote, genesis_time: 1715933816, earliest_block_timestamp: 1715933796, lowest_block_number: 0, service: deposit_contract_rpc

🛠️ Proposed solution

boulder225 commented 5 months ago

Hey team! Please add your planning poker estimate with Zenhub @gsora @KaloyanTanev @pinebit