Closed barnabasbusa closed 5 months ago
Nimbus reports doppelganger detection:
NTC 2024-04-05 14:40:50.011+00:00 Doppelganger detection active - skipping validator duties while observing the network topics="val_pool" validator=867e8956 slot=107 doppelCheck=none() activationEpoch=0
INF 2024-04-05 14:40:54.000+00:00 Slot start slot=108 epoch=3 attestationIn=4s blockIn=1m validators=64 node_status=synced delay=171us45ns
Is this just the doppelganger? That's intentional. Does --doppelganger-detection=off
work?
Nimbus doesn't use doppelganger protection on sync committees because they're not slashable.
@tersec regarding the getAggregate error, also reported in https://github.com/ChainSafe/lodestar/issues/6631, is Nimbus by any chance requesting an aggregate attestation even if no validator is aggregator? We had the same issue with Lighthouse previously and it was fixed on their end https://github.com/sigp/lighthouse/issues/4712
See https://github.com/ChainSafe/lodestar/issues/6634#issuecomment-2042672443
even with --doppelganger-detection=off
I'm getting:
ERR 2024-04-05 18:03:31.097+00:00 Beacon node provides unexpected response reason="Serialization error;200;produceBlockV3(best);unexpected-data" node=http://172.16.0.31:8562[Lodestar/v1.17.0/f2ec0d4] node_index=0 node_roles=AGBSDT
WRN 2024-04-05 18:03:31.097+00:00 Unable to retrieve block data reason="Serialization error;200;produceBlockV3(best);unexpected-data" wall_slot=28 validator_index=78 service=block_service slot=28 validator=8890da28@78
even with
--doppelganger-detection=off
I'm getting:ERR 2024-04-05 18:03:31.097+00:00 Beacon node provides unexpected response reason="Serialization error;200;produceBlockV3(best);unexpected-data" node=http://172.16.0.31:8562[Lodestar/v1.17.0/f2ec0d4] node_index=0 node_roles=AGBSDT WRN 2024-04-05 18:03:31.097+00:00 Unable to retrieve block data reason="Serialization error;200;produceBlockV3(best);unexpected-data" wall_slot=28 validator_index=78 service=block_service slot=28 validator=8890da28@78
The "unexpected data" might be because we are returning execution_payload_source
metadata field in the response as we allow to produce a blinded local block but this is field is not officially part of the block v3 api and was only proposed in https://github.com/ethereum/beacon-APIs/issues/387 but not included.
See https://github.com/ChainSafe/lodestar/issues/6634#issuecomment-2042600225
The "unexpected data" might be because we are returning execution_payload_source metadata field in the response
After reviewing how Nimbus handles this case it does not seem to be the issue as it only uses metadata provided in headers which Lodestar is setting and only parses the response.data
beacon_chain/validator_client/api.nim#L2145
res = decodeBytes(ProduceBlockResponseV3, response.data, response.contentType, version, blinded, executionValue, consensusValue)
And even there they seem to allow unknown fields
beacon_chain/spec/eth2_apis/eth2_rest_serialization.nim#L3950-L3952
ok(RestJson.decode(value, T, requireAllFields = true, allowUnknownFields = true))
Based on just reviewing Lodestar and Nimbus code it's hard to tell what's the actual issue, would need to inspect the data we are sending the further investigate this
@tersec regarding the getAggregate error, also reported in #6631, is Nimbus by any chance requesting an aggregate attestation even if no validator is aggregator? We had the same issue with Lighthouse previously and it was fixed on their end sigp/lighthouse#4712
This is not the case, Nimbus will only request an aggregated attestation if at least one validator is aggregator
beacon_chain/validator_client/attestation_service.nim#L280
if len(aggregateItems) > 0:
But I think I know what the issue here is, Nimbus uses different strategies per API, and if I interpreted them correctly this is why we are seeing the error:
The submitPoolAttestations
is only sent to the first / primary beacon node (unless an error is encountered)
beacon_chain/validator_client/attestation_service.nim#L66
await vc.submitPoolAttestations(@[attestation], ApiStrategyKind.First)
While the getAggregatedAttestation
is sent to all connected beacon nodes and the "best" response is picked
beacon_chain/validator_client/attestation_service.nim#L283-L284
await vc.getAggregatedAttestation(slot, attestationRoot, ApiStrategyKind.Best)
But Lodestar might not have an aggregated attestation in it's cache if the validator client did not previously submit an attestation for attestation_data_root
. Although this error might be not happen consistently as the beacon node might prepare the aggregate due to receiving the attestation via gossip (gossipHandlers.ts#L490-L492).
We have just improved the error handling if there is no aggregated attestation available https://github.com/ChainSafe/lodestar/pull/6648 to follow spec and produce a less nosiy error as this is somewhat expected to happen in a setup with multiple nodes.
If my analysis is correct, this is also not an issue as Nimbus would still produce the aggregate attestation if it received the data from the primary node.
@barnabasbusa this should not happen in a 1:1 setup without fallback node, I am assuming in your tests you had multiple bns connected to a single vc
@nflaig We have 1:1 setups only. We haven't considered any testing for 1:x testing.
@nflaig We have 1:1 setups only. We haven't considered any testing for 1:x testing.
Then I have no idea why this happens but something to note is that there is also a very strange error which should never happen
Apr-05 14:20:00.472[rest] error: Req req-r produceBlockV2 error - REGEN_ERROR_SLOT_BEFORE_BLOCK_SLOT
Error: REGEN_ERROR_SLOT_BEFORE_BLOCK_SLOT
So maybe there was just a clock issue / time skew between the validator client and beacon node?
All tests are ran on the same (docker backend) machine, and everyone should be able to replicate it with the config I have posted above. Doubt that there would be any sort of clock issues between two containers hosted on the same physical node.
and everyone should be able to replicate it with the config I have posted above.
Will have to do this next, thanks for the detailed infos
Apr-05 14:20:26.014[rest] error: Req req-5p getAggregatedAttestation error - No attestation for slot=5 dataRoot=0xd947df35713911cac72a977556a795d217eefc7a66bfded54347e18469c675dd Error: No attestation for slot=5 dataRoot=0xd947df35713911cac72a977556a795d217eefc7a66bfded54347e18469c675dd
The aggregated attestation issue has been fixed by https://github.com/ChainSafe/lodestar/pull/6668. There still seems to be strange behavior by Nimbus VC that it requests an aggregated attestation for the first 1-5 slots even though it does not submit subnet subscriptions with is_aggregator=true
for those slots beforehand. But other than that looks good, attestations, aggregates, and sync committee works as expected.
The only remaining issue is block production, my best guess right now is that Nimbus does not like that Lodestar returns a JSON payload from produceBlockV3
API.
The only remaining issue is block production, my best guess right now is that Nimbus does not like that Lodestar returns a JSON payload from produceBlockV3 API.
Confirmed this issue is related us return a JSON payload in response
ERR 2024-05-29 13:37:55.640+00:00 Beacon node provides unexpected response reason="Serialization error;200;produceBlockV3(best);unexpected-data" node=http://172.16.0.15:4000[Lodestar/v1.18.1/8b6ecc4] node_index=0 node_roles=AGBSDT
WRN 2024-05-29 13:37:55.645+00:00 Unable to retrieve block data reason="Serialization error;200;produceBlockV3(best);unexpected-data" wall_slot=12 validator_index=53 service=block_service slot=12 validator=b09cb155@53
Block production with Nimbus VC will be fixed once we merge https://github.com/ChainSafe/lodestar/pull/6749 but there is another issue publishing blocks as SSZ (while JSON seems work fine there). Does not seem to be a problem with other clients and there is an issue on the Nimbus side to track this https://github.com/status-im/nimbus-eth2/issues/6205 <-- this is an issue on Nimbus BN
Describe the bug
We are in the process to test cross beacon <> validator client compatibility, and found a bug when testing lodestar CL with Nimbus VC.
Nimbus reports:
Lodestar reports:
Lodestar seems to be able to submit sync committee signatures, but not block proposals
Snooper between cl <> vc reports:
Expected behavior
I would expect all client combinations would work.
Steps to reproduce
config.yaml:
kurtosis run github.com/kurtosis-tech/ethereum-package --args-file config.yaml
Additional context
Current BN <> VC Compatibility list tracker
Operating system
Linux
Lodestar version or commit hash
Version: v1.17.0/898cd90
cc: @pk910