Closed praetoriansentry closed 1 week ago
The issue could be mitigated by removing the flag zkevm.l2-datastreamer-timeout from the config or setting it to a non-zero value, e.g. 1s.
However, the RPC node will catch up very slowly when it is behind. It seems the node was stuck getting the highest block from the datastream.
https://github.com/0xPolygonHermez/cdk-erigon/pull/1424 addresses the slow sync issue.
The slow sync issue seems to happen only in Normalcy mode, where the verified batch is always 0. As a result, if we have downloaded batches beyond the verified batch, we won't execute them all immediately, only the next batch (see short circuit log here).
@hexoscott / @V-Staykov do you see any reason why we wouldn't want to execute all the downloaded batches immediately, considering the RPC can already detect a reorg and unwind automatically?
Hey @cffls. To answer the question on the short circuit: this is a choice made to sanity check what the network has downloaded. For example, when we boot up an RPC node we check the L1 for the latest verified batch. We can then execute all of those blocks and do a single state root check; if it matches the verification, we're good. Beyond that point though we're in the wild west so we only process one batch at a time and verify the state root from the datastream, which of course will slow syncing down.
In the case of pessimistic proofs (not sure if this is normalcy or not, I lose track of names for everything as they have a habit of changing) this doesn't make sense because there aren't really any batches or verifications to work from. In this case it makes sense for the short circuit code to just say "go ahead, all is fine" and let the node sync and execute everything.
The flags mentioned above will only affect the sequencer DS host, rather than the client. Some calls the client makes force a disconnect from the server for some reason (or that's behaviour we've seen) so we'll need to work around that from the looks of things.
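The short circuit behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the cdk-erigon code: `SyncMode`, `VerifyOnL1`, `TrustDatastream`, and `nextTargetBatch` are hypothetical names chosen for this example, and the real stage logic is likely more involved.

```go
package main

import "fmt"

// SyncMode is a hypothetical switch (not a real cdk-erigon type) between
// networks that verify batches on L1 and networks that do not.
type SyncMode int

const (
	VerifyOnL1      SyncMode = iota // batches are verified on L1
	TrustDatastream                 // no L1 verification to work from, e.g. pessimistic proofs
)

// nextTargetBatch returns the highest batch to execute in the next sync cycle.
//   - Up to the L1-verified batch, everything downloaded can be executed in one go.
//   - Beyond the verified batch, only the next batch is executed.
//   - In TrustDatastream mode, everything downloaded is executed.
func nextTargetBatch(mode SyncMode, executed, verified, downloaded uint64) uint64 {
	if executed >= downloaded {
		return executed // nothing new to execute
	}
	if mode == TrustDatastream {
		return downloaded
	}
	if executed < verified {
		if verified < downloaded {
			return verified
		}
		return downloaded
	}
	return executed + 1
}

func main() {
	// With a verified batch of 0, the node advances one batch per cycle.
	fmt.Println(nextTargetBatch(VerifyOnL1, 0, 0, 100))
	// With verification available, the node jumps straight to the verified batch.
	fmt.Println(nextTargetBatch(VerifyOnL1, 10, 50, 100))
	// When the datastream is trusted, the node executes everything downloaded.
	fmt.Println(nextTargetBatch(TrustDatastream, 0, 0, 100))
}
```

Under these assumptions, a verified batch that is always 0 means the node only ever advances one batch per cycle, which matches the slow catch-up reported earlier in the thread.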
Thanks @hexoscott !
A follow up question regarding this:
"Beyond that point though we're in the wild west so we only process one batch at a time and verify the state root from the datastream, which of course will slow syncing down."
The rollback will be automatic if a block hash mismatches. Why not just always sync to the latest downloaded batch for both FEP and PP?
It's about trusting the DS really. We receive the data from effectively an unknown source, and without verification on the L1 we check batch by batch that the state root matches the expected one. If it doesn't, we panic rather than have the RPC serve incorrect data.
I think it makes sense to have a mode of operation for PP to ask the node to not care about this process and keep them separate from normal zkevm/CDK networks.
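To make the batch-by-batch check concrete, here is a small sketch of the idea. It is not taken from cdk-erigon: `Batch`, `executeFn`, and `syncUnverified` are hypothetical names, and execution is stubbed out with a function that returns a state root. The sketch returns an error on a mismatch so the caller can decide to halt (the thread describes the node panicking at this point).

```go
package main

import (
	"bytes"
	"fmt"
)

// Batch is a hypothetical, simplified view of a batch received from the datastream.
type Batch struct {
	Number            uint64
	ExpectedStateRoot []byte // state root advertised by the datastream
}

// executeFn stands in for real batch execution and returns the resulting state root.
type executeFn func(b Batch) []byte

// syncUnverified executes batches one at a time and compares each resulting
// state root against the one from the datastream. It stops at the first
// mismatch and returns the last batch number that passed the check.
func syncUnverified(batches []Batch, execute executeFn) (uint64, error) {
	var lastGood uint64
	for _, b := range batches {
		got := execute(b)
		if !bytes.Equal(got, b.ExpectedStateRoot) {
			return lastGood, fmt.Errorf("batch %d: state root mismatch: got %x, want %x",
				b.Number, got, b.ExpectedStateRoot)
		}
		lastGood = b.Number
	}
	return lastGood, nil
}

func main() {
	// Stub execution: the "state root" is just the batch number as a single byte.
	execute := func(b Batch) []byte { return []byte{byte(b.Number)} }

	batches := []Batch{
		{Number: 1, ExpectedStateRoot: []byte{1}},
		{Number: 2, ExpectedStateRoot: []byte{2}},
		{Number: 3, ExpectedStateRoot: []byte{9}}, // deliberately wrong
	}

	last, err := syncUnverified(batches, execute)
	fmt.Println("last good batch:", last)
	if err != nil {
		// A real node would stop here rather than serve incorrect data.
		fmt.Println("halting:", err)
	}
}
```

A separate mode for PP networks, as suggested above, would presumably skip this per-batch comparison.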
@cffls is working on a PR for PP to add to Beta 6. @hexoscott will close this issue after the Beta 6 release and validation, Monday AM European time.
The code is available from beta6 onwards.
I'm having connectivity issues between the RPC and the Sequencer over the datastream port using v2.60.0-beta5. It appears that the RPC is continually resetting the connection to the datastream. The logs from the RPC are showing: