Open nflaig opened 1 year ago
It looks like the problem here is that the node is not able to range sync.
There are a lot of beacon_blocks_by_range errors
Apr-03 08:42:45.452[network] verbose: Req error method=beacon_blocks_by_range, encoding=ssz_snappy, client=Unknown, peer=16...tknFJJ, requestId=670 code=REQUEST_ERROR_DIAL_TIMEOUT
Error: REQUEST_ERROR_DIAL_TIMEOUT
at file:///usr/app/packages/reqresp/src/request/index.ts:116:15
at sendRequest (file:///usr/app/packages/reqresp/src/request/index.ts:104:20)
at ReqRespBeaconNode.sendRequest (file:///usr/app/packages/reqresp/src/ReqResp.ts:152:7)
at collectSequentialBlocksInRange (file:///usr/app/packages/beacon-node/src/network/reqresp/utils/collectSequentialBlocksInRange.ts:14:20)
at beaconBlocksMaybeBlobsByRange (file:///usr/app/packages/beacon-node/src/network/reqresp/beaconBlocksMaybeBlobsByRange.ts:36:20)
at wrapError (file:///usr/app/packages/beacon-node/src/util/wrapError.ts:18:32)
at SyncChain.sendBatch (file:///usr/app/packages/beacon-node/src/sync/range/chain.ts:400:19)
Time to first byte timeouts
Apr-03 08:39:30.822[network] verbose: Req error method=beacon_blocks_by_range, encoding=ssz_snappy, client=Lighthouse, peer=16...ZDf5pb, requestId=55 code=REQUEST_ERROR_TTFB_TIMEOUT
Error: REQUEST_ERROR_TTFB_TIMEOUT
at getError (file:///usr/app/packages/reqresp/src/request/index.ts:176:29)
at EventTarget.abortHandler (file:///usr/app/packages/reqresp/src/utils/abortableSource.ts:26:48)
at EventTarget.[nodejs.internal.kHybridDispatch] (node:internal/event_target:735:20)
at EventTarget.dispatchEvent (node:internal/event_target:677:26)
at abortSignal (node:internal/abort_controller:308:10)
at AbortController.abort (node:internal/abort_controller:338:5)
at Timeout.<anonymous> (file:///usr/app/packages/reqresp/src/request/index.ts:162:64)
at listOnTimeout (node:internal/timers:569:17)
at processTimers (node:internal/timers:512:7)
Timeout between <response_chunk> exceeded
Apr-03 08:39:35.002[sync] verbose: Batch download error id=Finalized, startEpoch=191862, status=Downloading method=beacon_blocks_by_range, encoding=ssz_snappy, peer=16Uiu2HAmVqjEaG7SRVEe7hBmLWeyDaUoN1bSXaYppEJ3D1JeNcAH, code=REQUEST_ERROR_RESP_TIMEOUT
Error: REQUEST_ERROR_RESP_TIMEOUT
at sendRequest (file:///usr/app/packages/reqresp/src/request/index.ts:219:13)
at ReqRespBeaconNode.sendRequest (file:///usr/app/packages/reqresp/src/ReqResp.ts:152:7)
at collectSequentialBlocksInRange (file:///usr/app/packages/beacon-node/src/network/reqresp/utils/collectSequentialBlocksInRange.ts:14:20)
at beaconBlocksMaybeBlobsByRange (file:///usr/app/packages/beacon-node/src/network/reqresp/beaconBlocksMaybeBlobsByRange.ts:36:20)
at wrapError (file:///usr/app/packages/beacon-node/src/util/wrapError.ts:18:32)
at SyncChain.sendBatch (file:///usr/app/packages/beacon-node/src/sync/range/chain.ts:400:19)
Lodestar sends beacon_blocks_by_range requests to nodes where the connection is already being closed
Apr-03 08:42:19.636[network] verbose: Req error method=beacon_blocks_by_range, encoding=ssz_snappy, client=Teku, peer=16...fTwKhq, requestId=659 code=REQUEST_ERROR_DIAL_ERROR, error=the connection is being closed
Error: the connection is being closed
at ConnectionImpl.newStream (file:///usr/app/node_modules/libp2p/src/connection/index.ts:110:21)
at Libp2pNode.dialProtocol (file:///usr/app/node_modules/libp2p/src/libp2p.ts:374:29)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at runNextTicks (node:internal/process/task_queues:64:3)
at listOnTimeout (node:internal/timers:538:9)
at processTimers (node:internal/timers:512:7)
at file:///usr/app/packages/reqresp/src/request/index.ts:107:22
at withTimeout (file:///usr/app/packages/utils/src/timeout.ts:19:12)
at sendRequest (file:///usr/app/packages/reqresp/src/request/index.ts:104:20)
at ReqRespBeaconNode.sendRequest (file:///usr/app/packages/reqresp/src/ReqResp.ts:152:7)
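One way to avoid the "connection is being closed" dial errors might be to skip peers whose connections are no longer open before picking a peer for a batch. The following is only a rough sketch of that idea, not how Lodestar selects peers today. `PeerConnection`, `ConnectionStatus` and `selectRequestablePeers` are hypothetical names made up for illustration, and the status values are an assumption about what the networking layer exposes.

```typescript
// Hypothetical connection status, assumed to be exposed by the networking layer.
type ConnectionStatus = "open" | "closing" | "closed";

// Hypothetical minimal view of a peer connection, for illustration only.
interface PeerConnection {
  peerId: string;
  status: ConnectionStatus;
}

/**
 * Returns the ids of peers that have at least one connection still open.
 * Peers whose connections are all closing or closed are left out, so no
 * beacon_blocks_by_range request would be dialed on them.
 */
function selectRequestablePeers(connections: PeerConnection[]): string[] {
  const requestable = new Set<string>();
  for (const connection of connections) {
    if (connection.status === "open") {
      requestable.add(connection.peerId);
    }
  }
  return Array.from(requestable);
}
```

This would not remove the race entirely, since a connection can still start closing between the check and the dial, but it might cut down the number of requests that fail immediately.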
Issue does not seem to be isolated to a specific client
Summary of the beacon_blocks_by_range error logs per client:
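For anyone who wants to reproduce a per-client summary from their own verbose logs, here is a minimal sketch. It assumes the `Req error` line format shown in the excerpts above (with `client=` and `code=` fields). `tallyErrorsByClient` and `REQ_ERROR_LINE` are hypothetical names, not part of Lodestar.

```typescript
// client name -> (error code -> number of occurrences)
type Tally = Map<string, Map<string, number>>;

// Matches "Req error" lines for beacon_blocks_by_range and captures the
// client name and the error code. Assumes the log format quoted in this issue.
const REQ_ERROR_LINE =
  /Req error method=beacon_blocks_by_range,.*?\bclient=(\w+),.*?\bcode=(\w+)/;

function tallyErrorsByClient(logLines: string[]): Tally {
  const tally: Tally = new Map();
  for (const line of logLines) {
    const match = REQ_ERROR_LINE.exec(line);
    if (match === null) continue; // not a beacon_blocks_by_range request error
    const client = match[1];
    const code = match[2];
    const perClient = tally.get(client) ?? new Map<string, number>();
    perClient.set(code, (perClient.get(code) ?? 0) + 1);
    tally.set(client, perClient);
  }
  return tally;
}
```

Lines without a `client=` field, such as the `Batch download error` entries from the sync module, are ignored by this pattern and would need to be counted separately.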
Disconnect reasons:
Disconnect reason is predominantly "Client has too many peers"
Why are there so many timeouts?
Problem
Some users have reported that their Lodestar beacon node (BN) takes up to 40 minutes to reach max peers (50).
Logs
Discord