Consensys / teku

Open-source Ethereum consensus client written in Java
https://consensys.io/teku
Apache License 2.0
679 stars 284 forks source link

Teku VC Json block production flow is missing a required header #8753

Closed NeoPlays closed 1 month ago

NeoPlays commented 1 month ago

Description

I'm aware that running different version combination is not best practice, but i think this is still worth mentioning: Running a Teku VC 24.8.0 with a Teku BN 24.10.1 (with docker) leads to failing block proposals:

2024-10-14 07:07:48.529 INFO - Received block for slot 10172137, block rewards 0.046197 ETH, execution payload value 0.239068 ETH
2024-10-14 07:07:48.537 INFO - Block for slot 10172137 was blinded and will only be sent to the beacon node which created it.
2024-10-14 07:07:48.545 ERROR - �[31mValidator *** Failed to produce block Slot: 10172137 Validator: 95802ae�[0m
java.lang.IllegalArgumentException: Invalid params response from Beacon Node API (url = http://<ip>:5051/eth/v2/beacon/blinded_blocks?broadcast_validation=gossip, status = 400, message = Missing required header value for (Eth-Consensus-Version))
at tech.pegasys.teku.validator.remote.typedef.ResponseHandler.defaultBadRequestHandler(ResponseHandler.java:161) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.typedef.ResponseHandler.handleResponse(ResponseHandler.java:125) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.typedef.handlers.AbstractTypeDefRequest.executeCall(AbstractTypeDefRequest.java:188) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.typedef.handlers.AbstractTypeDefRequest.postJson(AbstractTypeDefRequest.java:150) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.typedef.handlers.SendSignedBlockRequest.sendSignedBlockAsJson(SendSignedBlockRequest.java:121) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.typedef.handlers.SendSignedBlockRequest.sendSignedBlock(SendSignedBlockRequest.java:79) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.typedef.OkHttpValidatorTypeDefClient.sendSignedBlock(OkHttpValidatorTypeDefClient.java:141) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.RemoteValidatorApiHandler.lambda$sendSignedBlock$11(RemoteValidatorApiHandler.java:266) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.infrastructure.async.SafeFuture.of(SafeFuture.java:80) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.RemoteValidatorApiHandler.sendRequest(RemoteValidatorApiHandler.java:397) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.validator.remote.RemoteValidatorApiHandler.lambda$sendRequest$28(RemoteValidatorApiHandler.java:392) ~[teku-validator-remote-24.8.0.jar:24.8.0]
at tech.pegasys.teku.infrastructure.async.SafeFuture.of(SafeFuture.java:72) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
at tech.pegasys.teku.infrastructure.async.ScheduledExecutorAsyncRunner.lambda$createRunnableForAction$1(ScheduledExecutorAsyncRunner.java:124) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]

With 500 Keys on this Teku VC instance, we missed 9 proposals over the last weekend.

lucassaldanha commented 1 month ago

I'm aware that running different version combinations is not best practice, but i think this is still worth mentioning

You are right! We do not recommend running different versions. All our testing is done on "parallel" versions so it is risky.

Thanks for bringing this to our attention. Will take a look to make sure it isn't related to a bug somewhere.

May I ask why you did not update your VC and only BN?

NeoPlays commented 1 month ago

May I ask why you did not update your VC and only BN?

So we're running Nodes on a bigger scale, therefore we're seperating VCs from Fullnodes. I was working on this Fullnode and decided to also run updates. As we're running fallbacks as well, i thought that if there was an issue that i couldn't spot immediately, the fallbacks would take care of it. But in this case it's a bit more complex than this.

Updating VCs with doppelgägner protection enabled also results in longer downtime.

rolfyone commented 1 month ago

Hey @NeoPlays, just curious - do you have --beacon-node-ssz-blocks-enabled=false in your configuration for your VC?

rolfyone commented 1 month ago

I've done some root cause checking and posted a fix etc.

Teku VC can produce SSZ blocks (default) and submit in an api compliant way to any BN, but its JSON blocks are not beacon-api compliant due to a missing required header.

This was the VC behaviour for ever basically, so any teku VC that connects to a teku 24.10 BN is going to currently fail producing a block from their JSON payload, but the default payload is SSZ, so the issue is relatively contained.

In terms of the teku VC, it will decide to use JSON instead of default if it receives an 'unsupported media type' response when attempting to submit SSZ (415). This should be unusual. The other most likely time is when the configuration specifically sets beacon-node-ssz-blocks-enabled=false. It would be recommended to not set to false currently on your VC.

Teku 24.10 BN will be compatible with SSZ blocks, but not JSON blocks produced from Teku 24.10.2 or earlier VC. The acceptance test surface has been broadened to include a standalone VC producing JSON blocks so that this problem is visible if it should happen again.

This will affect teku VC users specifying --beacon-node-ssz-blocks-enabled=false or if an SSZ block production flow fails and the VC decides it needs to fall back to a JSON payload.

rolfyone commented 1 month ago
tested more: BN VC Json Blocks SSZ Blocks
24.8 24.8
24.8 24.10
24.10 24.8
24.10 24.10

The block production BN changed in #8522 (22 august) ultimately. 24.8.0 release was (8 aug) so this does track with the problem description.

Basically this means that 24.8 BN did not require Eth-Consensus-Version, and 24.10 BN does.

One potential good solution would be to not require that header so that its more compatible. This is technically off spec, but we'll see if we want to make that change given the above table.

It's possibly cleaner to fix the VC for Json and acknowledge the missed breaking change, and be consistent with the specification.

NeoPlays commented 1 month ago

Hey @NeoPlays, just curious - do you have --beacon-node-ssz-blocks-enabled=false in your configuration for your VC?

sorry for the late reply, for the sake of completeness: yes we were running --beacon-node-ssz-blocks-enabled=false. thank you for this in depth analysis. 🙏🏼

rolfyone commented 1 month ago

for the sake of completeness: yes we were running --beacon-node-ssz-blocks-enabled=false. thank you for this in depth analysis. 🙏🏼

perfect - so as a work around you could use SSZ (default) instead and it would work (probably clear from above, just explicitly stating)