Closed zilm13 closed 8 months ago
I believe the beacon api provides enough grounds for a requirement that the state is advanced to the empty slot (which is also what nimbus does):
True if the response references the finalized history of the chain, as determined by fork choice.
Fork choice doesn't know about slots, only epochs.
You don't need any block to checkpoint-sync, only a state. Once you have a state, you can apply the next block to it without having the "state block" or "finalized block" available - some clients may have additional requirements but these are local to the implementation / can be removed.
If you also want to download a block (which in the case of an empty finalized epoch slot will be from the previous epoch), you can do so by computing its block root from the latest_block_header
, patching in the state root (which you now know) like so: https://github.com/status-im/nimbus-eth2/blob/unstable/beacon_chain/spec/beaconstate.nim#L1408
You can then match the block to the state (instead of matching the state to the block).
If you also want to download a block (which in the case of an empty finalized epoch slot will be from the previous epoch), you can do so by computing its block root from the
latest_block_header
Yeah, so we need 2 requests instead of one + state patching and only after that it will pass anchor validation. Doesn't it look too much for such a common task?
@zilm13 Don't you already have 2 requests? One for the block and one for the state?
With Jacek's approach (which is what Lighthouse does) you just request the state first and then the block, instead of the other way around. It's also less racy than Lighthouse's previous approach which was to request the finalized block, and then the state matching that block's state root. With our previous approach sometimes the state would end up having been pruned by the time we requested it (if it was no longer the newest finalized state).
With the state patching, you also don't need to actually modify the state, you can copy the block header and patch it
Technically, nimbus' checkpoint sync is in two parts:
notably, the block on the finalized boundary is not needed for anything (as far as the protocol goes) other than p2p requests of historical data so a single request (for the state) is all that's needed to bootstrap a consensus client and start processing blocks from the network.
The way nimbus works, it also tries to backfill blocks via REST on checkpoint sync (all the ~6 months) - the boundary block is just a special case of that backfill but it could just as well be done via p2p (if the user doesn't want REST-based backfill).
fwiw, the REST backfill also addresses a grey area that current clients otherwise exploit simply because there's no enforcement - a client that hasn't backfilled is technically not in compliance with the p2p spec because it's not serving historical blocks but it is (typically) profiting from validator duties already - not a problem as long as leeching is moderate / limited to a small subset of the network but still.
@michaelsproul we use one request and it's not working for states with empty slots transitioned, so yeah, we need 2 requests and patching unless we have API for finalized anchor. I understand that it's doable, just wonder why it should be so hacky for a very common case. Plus depends on debug
API with much broader usage which could be an issue for nodes sharing it (could be misused, easier to DoS)
I understand that it's doable, just wonder why it should be so hacky for a very common case.
Just so we're clear, there is no hack here where a second download is needed - the latest_block_header
approach works equally well for a slot-transitioned state and non-slot-transitioned state (see the linked code - it handles both cases) - in fact, to comply with the API, it would probably be a good idea for teku to provide a slot-transitioned state (per above spec reasoning) - we have a special case for servers like teku to handle non-transitioned states as well, but that's because we're liberal in what we accept in the interest of pragmatism.
Thank you for feedback, @michaelsproul and @arnetheduck
To sum up we will fix current behavior in Teku to provide transitioned state via debug/state/finalized
and update our checkpoint sync start to be compatible with such state.
Cheers!
The issue: We broadly use debug/state endpoint for checkpoint sync in consensus layer clients. The spec defines weak subjectivity period denouncing full sync and it's the only endpoint where the clients could request the latest state for sync. Usually
finalized
id is used, which leads to differences in interpretation in some cases. When a finalized epoch doesn't end with a block and has empty slots after it, some clients transition state to the end of the epoch (Lighthouse) and others don't (Teku). I don't know about other clients, it would be interesting to know other clients' approaches for this case too.Both views look more or less legit in absence of API clarification, moreover Lighthouse approach seems more correct as it's truly the end of finalized epoch. But when we initialize client from the blank, we use anchor block and anchor state, so the block
state_root
should matchhash_tree_root
(See the spec) as a prerequisite for fork choice start. When we've transitioned for empty slots, we change the state, so blockstate_root
(which is actually ZERO afterprocessBlock
) will not matchhash_tree_root
. It could be assumed that it shouldn't break anything, if it doesn't match, but until the check is in the spec and safety analysis on removal is not done, I don't feel safe removing it from the Teku code.To sum up, when you request
debugStateV2
withfinalized
id, you couldn't be sure that the response state will contain a state matching the block inlatest_block_header
.Proposal: We could introduce a separate
id
, something likefinalizedanchor
to get state at the moment of the last block in finalized epoch, or even better we could add a new endpoint dedicated for checkpoint sync. Weak subjectivity period sync is the most common type of sync today, and it's strange that we use debug API to get state for it.debug/state
endpoint could be not very cheap for many of id inputs and doesn't sound as something that should be widely open for broad use, potentially could be overloaded with heavy queries. So a dedicated endpoint for finalized anchor state (at the moment of last block in finalized epoch) inbeacon
scope could be a better alternative.Happy to get some feedback on this.