Proposal for finalized anchor endpoint/id

zilm13 commented 8 months ago

The issue: We broadly use debug/state endpoint for checkpoint sync in consensus layer clients. The spec defines weak subjectivity period denouncing full sync and it's the only endpoint where the clients could request the latest state for sync. Usually finalized id is used, which leads to differences in interpretation in some cases. When a finalized epoch doesn't end with a block and has empty slots after it, some clients transition state to the end of the epoch (Lighthouse) and others don't (Teku). I don't know about other clients, it would be interesting to know other clients' approaches for this case too.

Both views look more or less legit in absence of API clarification, moreover Lighthouse approach seems more correct as it's truly the end of finalized epoch. But when we initialize client from the blank, we use anchor block and anchor state, so the block state_root should match hash_tree_root (See the spec) as a prerequisite for fork choice start. When we've transitioned for empty slots, we change the state, so block state_root (which is actually ZERO after processBlock) will not match hash_tree_root. It could be assumed that it shouldn't break anything, if it doesn't match, but until the check is in the spec and safety analysis on removal is not done, I don't feel safe removing it from the Teku code.

To sum up, when you request debugStateV2 with finalized id, you couldn't be sure that the response state will contain a state matching the block in latest_block_header.

Proposal: We could introduce a separate id, something like finalizedanchor to get state at the moment of the last block in finalized epoch, or even better we could add a new endpoint dedicated for checkpoint sync. Weak subjectivity period sync is the most common type of sync today, and it's strange that we use debug API to get state for it. debug/state endpoint could be not very cheap for many of id inputs and doesn't sound as something that should be widely open for broad use, potentially could be overloaded with heavy queries. So a dedicated endpoint for finalized anchor state (at the moment of last block in finalized epoch) in beacon scope could be a better alternative.

Happy to get some feedback on this.

arnetheduck commented 8 months ago

I believe the beacon api provides enough grounds for a requirement that the state is advanced to the empty slot (which is also what nimbus does):

https://github.com/ethereum/beacon-APIs/blob/d2f846e0e0d551973253df4b5341655eddf92f1d/types/primitive.yaml#L55C17-L55C115

True if the response references the finalized history of the chain, as determined by fork choice.

Fork choice doesn't know about slots, only epochs.

You don't need any block to checkpoint-sync, only a state. Once you have a state, you can apply the next block to it without having the "state block" or "finalized block" available - some clients may have additional requirements but these are local to the implementation / can be removed.

If you also want to download a block (which in the case of an empty finalized epoch slot will be from the previous epoch), you can do so by computing its block root from the latest_block_header, patching in the state root (which you now know) like so: https://github.com/status-im/nimbus-eth2/blob/unstable/beacon_chain/spec/beaconstate.nim#L1408

You can then match the block to the state (instead of matching the state to the block).

zilm13 commented 8 months ago

If you also want to download a block (which in the case of an empty finalized epoch slot will be from the previous epoch), you can do so by computing its block root from the latest_block_header

Yeah, so we need 2 requests instead of one + state patching and only after that it will pass anchor validation. Doesn't it look too much for such a common task?

michaelsproul commented 8 months ago

@zilm13 Don't you already have 2 requests? One for the block and one for the state?

With Jacek's approach (which is what Lighthouse does) you just request the state first and then the block, instead of the other way around. It's also less racy than Lighthouse's previous approach which was to request the finalized block, and then the state matching that block's state root. With our previous approach sometimes the state would end up having been pruned by the time we requested it (if it was no longer the newest finalized state).

michaelsproul commented 8 months ago

With the state patching, you also don't need to actually modify the state, you can copy the block header and patch it

arnetheduck commented 8 months ago

Technically, nimbus' checkpoint sync is in two parts:

download the state - this is needed to operate the beacon node protocol, process blocks etc - it's one request and it's all that's needed to start the client
backfill blocks - this is required to serve historical data per the p2p range request (but nothing else)

notably, the block on the finalized boundary is not needed for anything (as far as the protocol goes) other than p2p requests of historical data so a single request (for the state) is all that's needed to bootstrap a consensus client and start processing blocks from the network.

The way nimbus works, it also tries to backfill blocks via REST on checkpoint sync (all the ~6 months) - the boundary block is just a special case of that backfill but it could just as well be done via p2p (if the user doesn't want REST-based backfill).

fwiw, the REST backfill also addresses a grey area that current clients otherwise exploit simply because there's no enforcement - a client that hasn't backfilled is technically not in compliance with the p2p spec because it's not serving historical blocks but it is (typically) profiting from validator duties already - not a problem as long as leeching is moderate / limited to a small subset of the network but still.

zilm13 commented 8 months ago

@michaelsproul we use one request and it's not working for states with empty slots transitioned, so yeah, we need 2 requests and patching unless we have API for finalized anchor. I understand that it's doable, just wonder why it should be so hacky for a very common case. Plus depends on debug API with much broader usage which could be an issue for nodes sharing it (could be misused, easier to DoS)

arnetheduck commented 8 months ago

I understand that it's doable, just wonder why it should be so hacky for a very common case.

Just so we're clear, there is no hack here where a second download is needed - the latest_block_header approach works equally well for a slot-transitioned state and non-slot-transitioned state (see the linked code - it handles both cases) - in fact, to comply with the API, it would probably be a good idea for teku to provide a slot-transitioned state (per above spec reasoning) - we have a special case for servers like teku to handle non-transitioned states as well, but that's because we're liberal in what we accept in the interest of pragmatism.

zilm13 commented 8 months ago

Thank you for feedback, @michaelsproul and @arnetheduck To sum up we will fix current behavior in Teku to provide transitioned state via debug/state/finalized and update our checkpoint sync start to be compatible with such state. Cheers!

ethereum / beacon-APIs

Proposal for finalized anchor endpoint/id #423