erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

Block production fails - Gnosis Chain #11889

Closed eth2353 closed 2 months ago

eth2353 commented 2 months ago

System information

Erigon version: 3.0.0-alpha3

OS & Version: Linux

Erigon Command (with flags/config): --chain=gnosis --prune.mode=full --externalcl

Consensus Layer: Teku 24.8.0

Consensus Layer Command (with flags/config):

Chain/Network:

Expected behaviour

Validators can produce blocks with Erigon as the EL client.

Actual behaviour

Block production frequently fails. Erigon logs the following warning: "Execution Service busy, could not fulfil Assemble Block request".

Logs

Erigon:

[INFO] [09-05|16:24:20.305] [NewPayload] Handling new payload        height=35847820 hash=0x3ce3a2e6995ddb5b144594ed8082ba10967b8ff3f0fffbc7f0d92848e1fa8697
[INFO] [09-05|16:24:20.417] head updated                             head=0x3ce3a2e6995ddb5b144594ed8082ba10967b8ff3f0fffbc7f0d92848e1fa8697 hash=0x3ce3a2e6995ddb5b144594ed8082ba10967b8ff3f0fffbc7f0d92848e1fa8697 number=35847820 execution=18.521829ms mgas/s=89.67 average mgas/s=96.66 commit=905.676µs alloc=2.3GB sys=11.3GB
[INFO] [09-05|16:24:25.500] [NewPayload] Handling new payload        height=35847821 hash=0x4f42a012dddd5056016dbac0f9307953d96f4d72bddf6e0a8591b3982a817cd1
[WARN] [09-05|16:24:25.665] [ForkChoiceUpdated] Execution Service busy, could not fulfil Assemble Block request req.parentHash="hi:{hi:5711303280351334486  lo:102943704476580179}  lo:{hi:15667826783852391946  lo:9624671344796990673}"
[INFO] [09-05|16:24:25.690] head updated                             head=0x4f42a012dddd5056016dbac0f9307953d96f4d72bddf6e0a8591b3982a817cd1 hash=0x4f42a012dddd5056016dbac0f9307953d96f4d72bddf6e0a8591b3982a817cd1 number=35847821 execution=17.258843ms mgas/s=54.03 average mgas/s=96.44 commit=14.230996ms alloc=2.4GB sys=11.3GB
[INFO] [09-05|16:24:30.439] [NewPayload] Handling new payload        height=35847822 hash=0x96e3107ef03e664e9ead9e7b936b00a2c53ee4558ccc149ab6b12a93300fa492
[INFO] [09-05|16:24:30.563] head updated                             head=0x96e3107ef03e664e9ead9e7b936b00a2c53ee4558ccc149ab6b12a93300fa492 hash=0x96e3107ef03e664e9ead9e7b936b00a2c53ee4558ccc149ab6b12a93300fa492 number=35847822 execution=29.78927ms mgas/s=457.73 average mgas/s=98.33 commit=921.936µs alloc=2.4GB sys=11.3GB
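
The `req.parentHash` value in the WARN line is printed as four nested integers rather than as hex, which makes it hard to match against the hashes in the surrounding lines. A minimal sketch for converting it, assuming the four fields are unsigned 64-bit words in big-endian order as printed (`hi.hi`, `hi.lo`, `lo.hi`, `lo.lo`); the helper name `h256_to_hex` is hypothetical and not part of Erigon:

```python
def h256_to_hex(hi_hi: int, hi_lo: int, lo_hi: int, lo_lo: int) -> str:
    """Join four unsigned 64-bit words into a 0x-prefixed 32-byte hex string."""
    words = (hi_hi, hi_lo, lo_hi, lo_lo)
    for word in words:
        if not 0 <= word < 2**64:
            raise ValueError(f"not an unsigned 64-bit value: {word}")
    # Assumption: each word is big-endian and the words are already in
    # most-significant-first order, as they appear in the log line.
    return "0x" + b"".join(word.to_bytes(8, "big") for word in words).hex()


# Values copied from the req.parentHash field of the WARN line above.
parent_hash = h256_to_hex(
    5711303280351334486,
    102943704476580179,
    15667826783852391946,
    9624671344796990673,
)
print(parent_hash)
```

For the values above this yields 0x4f42a012dddd5056016dbac0f9307953d96f4d72bddf6e0a8591b3982a817cd1, the same hash as the payload at height 35847821, so the failed Assemble Block request was asking to build on the block that had just arrived.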

Teku:

2024-09-05 16:24:22.001 INFO  - Slot Event  *** Slot: 17312024, Block: 17590ed4114e73a03208509358dcfb662928c343382c8f368064c0769e1c3389, Justified: 1082000, Finalized: 1081999, Peers: 99
2024-09-05 16:24:25.663 INFO  - Calling local execution layer to start block production (block slot: 17312026)
2024-09-05 16:24:27.000 INFO  - Slot Event  *** Slot: 17312025, Block: f531245a16ae0ff4976f2cd7dfbc62516476602398d1e0b61af6a7549c797ecb, Justified: 1082000, Finalized: 1081999, Peers: 99
2024-09-05 16:24:30.022 INFO  - Creating unsigned block for slot 17312026
2024-09-05 16:24:30.180 ERROR - Failed to process request to URL http://production-gnosis-serenita-cl-el-server-3.tailad4ab.ts.net:5555/eth/v3/validator/blocks/17312026
java.lang.IllegalStateException: ExecutionPayloadContext is not provided for production of post-merge block at slot 17312026
    at tech.pegasys.teku.validator.coordinator.BlockOperationSelectorFactory.setExecutionData(BlockOperationSelectorFactory.java:219) ~[teku-beacon-validator-24.8.0.jar:24.8.0]
    at tech.pegasys.teku.validator.coordinator.BlockOperationSelectorFactory.lambda$createSelector$6(BlockOperationSelectorFactory.java:190) ~[teku-beacon-validator-24.8.0.jar:24.8.0]
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:?]
    at tech.pegasys.teku.infrastructure.async.SafeFuture.lambda$propagateResult$3(SafeFuture.java:147) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.whenComplete(Unknown Source) ~[?:?]
    at tech.pegasys.teku.infrastructure.async.SafeFuture.whenComplete(SafeFuture.java:620) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
    at tech.pegasys.teku.infrastructure.async.SafeFuture.whenComplete(SafeFuture.java:31) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
    at tech.pegasys.teku.infrastructure.async.SafeFuture.propagateResult(SafeFuture.java:142) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
    at tech.pegasys.teku.infrastructure.async.SafeFuture.propagateTo(SafeFuture.java:318) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
    at tech.pegasys.teku.infrastructure.async.ScheduledExecutorAsyncRunner.lambda$createRunnableForAction$1(ScheduledExecutorAsyncRunner.java:124) ~[teku-infrastructure-async-24.8.0.jar:24.8.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2024-09-05 16:24:32.000 INFO  - Slot Event  *** Slot: 17312026, Block: b285c681245df4c1228334092af27b45abde3194f188c1cdc30fe65cdc5e1292, Justified: 1082000, Finalized: 1081999, Peers: 99
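
The timestamps in the two logs can be lined up to see how close the block-production request was to the preceding payload. A rough sketch using only values copied from the logs above; it assumes the Erigon and Teku clocks are in sync, and the helper name `ms_between` is illustrative:

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S.%f"


def ms_between(start: str, end: str) -> int:
    """Return the whole milliseconds elapsed from start to end (same day)."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta // timedelta(milliseconds=1)


# Timestamps copied from the Erigon and Teku logs for block 35847821.
new_payload = "16:24:25.500"   # Erigon: [NewPayload] Handling new payload
teku_call = "16:24:25.663"     # Teku: Calling local execution layer ...
busy_warn = "16:24:25.665"     # Erigon: Execution Service busy ...
head_updated = "16:24:25.690"  # Erigon: head updated

print("payload -> busy warning:", ms_between(new_payload, busy_warn), "ms")
print("Teku call -> busy warning:", ms_between(teku_call, busy_warn), "ms")
print("busy warning -> head updated:", ms_between(busy_warn, head_updated), "ms")
```

On these numbers the request reached Erigon about 165 ms after it began handling the payload for 35847821 and about 25 ms before `head updated` was logged for it. That ordering is consistent with the "busy" warning, but it is only a reading of the timestamps, not a confirmed cause.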

This was also happening on the same node on alpha2. I don't know if this happens on Ethereum. Today I wiped the datadir and resynced the node from scratch on alpha3 (ottersync), but the issue persists. Happy to provide more details or test a fix.

If you believe the issue is on the Teku side, I can also try a different CL.

eth2353 commented 2 months ago

I changed the CL to Lodestar, and block production works fine with it. I'll go ahead and close the issue, since it may very well be on the Teku side.