Open hexoscott opened 3 days ago
Detail logs: cdk-error.log
The if err := r.connectDatastream(); err != nil check has been deleted, so do we need to retry when we get a disconnect?
Just to add another thought to this discussion: in the implementation of StreamClient, I noticed that whenever you read from the TCP connection (readBuffer), you set a read deadline (SetReadDeadline) on the connection. The issues we have been seeing show the error i/o timeout.
Before you start reading full blocks, net.Dial is successful, meaning a TCP connection is already established and the data stream client is constructed successfully. Is it possible that the stream client prematurely shuts down its TCP connection during the read process, so that no further entries are written to entryChan (causing the batch processing loop to stall)?
Hi @Vui-Chee - this has been a long journey on the stream client. Certain calls to the datastream host will terminate the connection unexpectedly, and we get occasional drops due to inactivity etc. We're continuing to investigate.
Regarding the inactivity, check you have set a value for InactivityTimeout and InactivityCheckInterval:
https://github.com/0xPolygon/zkevm-data-streamer/blob/main/datastreamer/config.go#L18
Our data-streamer is on v0.2.3-RC4, so we do not have this configuration yet.
There is a fix inbound for this, just going through CI now
Ref: #1492
Cool, many thanks, I will try this fix.
During syncing we see the batches stage get stuck, and beyond that point it makes no progress. The pattern appears as in the image below:
It looks as though the stream client is missing some cleanup code, either at the start of a new stage or at the end, before the stage is completed.