
EIP-4844: Remove BlobsSidecarsByRange Req-Resp #3087

Open Inphi opened 1 year ago

Inphi commented 1 year ago

The original use case for the BlobsSidecarByRange RPC was to allow nodes to sync blobs that missed the initial broadcast. We no longer need to do this with coupled beacon blocks. Historical sync can continue to sync (eip-4844) blocks as it works today, because blocks now contain blob sidecars.

There is one other use for BlobsSidecarByRange: letting light clients retrieve just blob sidecars that are older than MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS. This use case is pretty niche, though, and such clients can always use the blocks RPC, even if that means potentially receiving duplicate blocks. Hence, I propose that we remove the RPC altogether and keep the spec minimal.

@terencechain
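
For concreteness, here is a minimal sketch of the coupled container this relies on, roughly the SignedBeaconBlockAndBlobsSidecar from the eip-4844 p2p spec, with plain dataclasses standing in for the real SSZ types:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SignedBeaconBlock:
    slot: int  # stand-in for the full SSZ container

@dataclass
class BlobsSidecar:
    beacon_block_slot: int
    blobs: List[bytes] = field(default_factory=list)

@dataclass
class SignedBeaconBlockAndBlobsSidecar:
    # Because blocks and sidecars travel together, range-syncing blocks also
    # transfers the blobs, which is what makes a separate sidecars-by-range
    # method redundant inside the retention window.
    beacon_block: SignedBeaconBlock
    blobs_sidecar: BlobsSidecar
```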

terencechain commented 1 year ago

We still need a way to historical-sync or backtrack-sync "missing" blobs when starting from a weak subjectivity state. Are you implying that we add a BeaconBlockAndBlobsSidecarByRange RPC?

tbenr commented 1 year ago

We could make BeaconBlockAndBlobsSidecarByRange return a ZERO BlobsSidecar if the range includes slots prior to MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS, instead of failing the request and retrying with BeaconBlocksByRange.
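
A hedged sketch of what this could look like on the serving side (the `store` object and helper names are hypothetical; the containers are the stand-ins from the sketch above, and the slot-based cutoff simplifies the spec's epoch-based one):

```python
SLOTS_PER_EPOCH = 32
MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS = 4096  # value from the eip-4844 spec

def serve_block_and_blobs_by_range(store, current_slot, start_slot, count):
    cutoff = current_slot - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS * SLOTS_PER_EPOCH
    responses = []
    for slot in range(start_slot, start_slot + count):
        block = store.get_block(slot)
        if block is None:
            continue  # empty/skipped slot
        if slot < cutoff:
            # Older than the retention window: answer with a ZERO sidecar
            # instead of failing the whole request.
            sidecar = BlobsSidecar(beacon_block_slot=slot)
        else:
            sidecar = store.get_blobs_sidecar(slot)
        responses.append(SignedBeaconBlockAndBlobsSidecar(block, sidecar))
    return responses
```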

Inphi commented 1 year ago

Ah, I see. I misread the spec; it was my understanding that the BeaconBlocksByRange RPC would return coupled beacon blocks after the upgrade. But we should still consider removing the RPC and, as you say, adding a new RPC that returns both.

realbigsean commented 1 year ago

> We could make BeaconBlockAndBlobsSidecarByRange return a ZERO BlobsSidecar if the range includes slots prior to MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS, instead of failing the request and retrying with BeaconBlocksByRange.

We've separately been coming to a similar conclusion in lighthouse.

I think we were initially planning to have separate BeaconBlockByRange and BlobsSidecarByRange requests. But @divagant-martian made the point that if we only ever want a block by itself or a block and blobs together, we should structure our requests to reflect that: both a BeaconBlockByRange and a BeaconBlockAndBlobsSidecarByRange. One for use within the range where we're required to have blobs, and one for use outside of that range. Making a BlobsSidecarByRange doesn't really make sense if there is no scenario where we want just a blob.
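
A minimal sketch of that request-selection logic (function name is illustrative, not an actual client API), reusing the constants from the sketch above:

```python
def choose_range_rpc(start_slot, count, current_slot):
    # Inside the blob-retention window, blobs are required, so use the coupled
    # method; older ranges carry no blobs and use the plain block method.
    retention_start = current_slot - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS * SLOTS_PER_EPOCH
    if start_slot + count <= retention_start:
        return "BeaconBlocksByRange"
    if start_slot >= retention_start:
        return "BeaconBlockAndBlobsSidecarByRange"
    return "split"  # a straddling range would be split into two requests
```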

terencechain commented 1 year ago

@realbigsean I agree. I think we should just have BeaconBlockByRange and BeaconBlockAndBlobsSidecarByRange.

protolambda commented 1 year ago

We definitely need some form of historical-sync req-resp method. BlobsSidecarByRange would allow blobs to be fetched in parallel with the beacon blocks (see the sketch below). Fusing the methods together into BeaconBlockAndBlobsSidecarByRange would simplify sync but limit optimizations.
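
A sketch of the optimization being traded away (the `rpc_request` coroutine is a hypothetical stand-in for a req/resp call):

```python
import asyncio

async def rpc_request(peer, method, start_slot, count):
    """Hypothetical stand-in for issuing a req/resp call over the wire."""
    ...

async def fetch_range_decoupled(peer_a, peer_b, start_slot, count):
    # With separate methods, blocks and sidecars for the same range can be
    # fetched from two different peers concurrently.
    blocks, sidecars = await asyncio.gather(
        rpc_request(peer_a, "BeaconBlocksByRange", start_slot, count),
        rpc_request(peer_b, "BlobsSidecarsByRange", start_slot, count),
    )
    # The price of this parallelism: the two streams must be re-matched by
    # slot and can disagree, which a fused method rules out by construction.
    return blocks, sidecars
```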

tbenr commented 1 year ago

The problem with having only BeaconBlockAndBlobsSidecarByRange is the (edge?) case in which the peer has pruned the blobs early and responds with a block and a ZERO BlobsSidecar. You then fall into the decoupled case, in which you have to fill in the blobs for known blocks. Retrieving them using BlobsSidecarByRange opens up potential block-blobs inconsistency, so the best way to do that would be BlobsSidecarByRoot (but we might be outside the "recent chain" to which we want to limit usage of the byRoot option, as mentioned in today's call).

A naive option would be to discard BeaconBlockAndBlobsSidecarByRange responses containing ZERO blobs where you actually expect blobs, and to redo the BeaconBlockAndBlobsSidecarByRange request towards other peers (see the sketch below). This wastes meaningful bandwidth only when there are many dishonest validators.
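
A sketch of that naive retry loop (the request helper is hypothetical; constants and containers come from the earlier sketches):

```python
def is_zero_sidecar(sidecar):
    return len(sidecar.blobs) == 0

def within_retention(slot, current_slot):
    return slot >= current_slot - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS * SLOTS_PER_EPOCH

def request_block_and_blobs_by_range(peer, start_slot, count):
    """Hypothetical stand-in for the coupled req/resp call."""
    return []

def sync_batch_with_retry(peers, current_slot, start_slot, count):
    for peer in peers:
        batch = request_block_and_blobs_by_range(peer, start_slot, count)
        if any(is_zero_sidecar(item.blobs_sidecar)
               and within_retention(item.beacon_block.slot, current_slot)
               for item in batch):
            continue  # ZERO blobs where blobs are still expected: next peer
        return batch
    raise RuntimeError("no peer served blobs for the requested range")
```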

djrtwo commented 1 year ago

I think coupling on historic sync adds more technical debt than keeping them separate, due to having to rework something that is tried and true (block sync) into something that looks very different from what we expect the future to hold.

Reasons being:

  1. disrupting block (no blobs) sync in a manner that does not look like what we expect the future to hold
  2. having to deal with the edge case of different pruning depths between blocks and blobs as native to block sync
  3. likely changing block-sync range strategies due to the much larger blobs attached to blocks (e.g. requesting 100 at a time from a single peer might not be tenable)

That said, the protocol technical debt is not huge. I suspect the debt manifests more in the engineering, so I will step down if the general engineering consensus runs counter to my view.

realbigsean commented 1 year ago

I don't think it's possible to leave blob syncing untouched within the blob prune depth, because block validation is now dependent on blobs, so we have to fit blobs into the picture somehow. With decoupling, we're introducing scenarios where a blob could show up before or after its block, so you'd have to cache blocks and blobs and probably figure out a way to re-request either one if it doesn't show up. And then you have to figure out how to attribute faults, if any.
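
A sketch of the caching this implies (all names illustrative): whichever of a block or its sidecar arrives first waits until its counterpart shows up, and a missing counterpart is what triggers a re-request.

```python
class PendingPairCache:
    """Pairs decoupled block and sidecar arrivals, keyed here by slot."""

    def __init__(self):
        self.blocks = {}    # slot -> block
        self.sidecars = {}  # slot -> sidecar

    def on_block(self, slot, block):
        self.blocks[slot] = block
        return self._try_pair(slot)

    def on_sidecar(self, slot, sidecar):
        self.sidecars[slot] = sidecar
        return self._try_pair(slot)

    def _try_pair(self, slot):
        # Both halves present: hand the pair to block processing.
        if slot in self.blocks and slot in self.sidecars:
            return self.blocks.pop(slot), self.sidecars.pop(slot)
        # Counterpart still missing: the caller may schedule a re-request
        # after a timeout, and must decide whom to fault if it never arrives.
        return None
```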

More generally, decoupling introduces asynchrony (which could be good, but is more complicated to engineer and changes block-sync code more) and the possibility of inconsistency. We could address these two things in a decoupled approach by:

  1. always making both requests at once
  2. always making both requests to the same peer
  3. if one fails, both fail

But this starts to look very much like the coupled approach: this logic would be enabled/disabled at different prune depths, and it may mean we require different sync strategies at different depths, so it has similar drawbacks to coupling (sketched below). The main difference would be two parallel streams of data vs one. That could end up being worth it for the sake of more efficient download, but we haven't seen download being the bottleneck elsewhere in sync.
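
A sketch of the three constraints in the list above, reusing the hypothetical `rpc_request` stub from the earlier sketch:

```python
import asyncio

async def fetch_range_paired(peer, start_slot, count):
    # Both requests go out at once, to the same peer; asyncio.gather
    # propagates the first failure, so if either request errors the pair
    # fails as a whole and fault attribution stays with this single peer.
    blocks, sidecars = await asyncio.gather(
        rpc_request(peer, "BeaconBlocksByRange", start_slot, count),
        rpc_request(peer, "BlobsSidecarsByRange", start_slot, count),
    )
    return list(zip(blocks, sidecars))  # two streams, consumed as bundles
```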

Generally, here is what I think the implementation would look like in Lighthouse:

Coupling:

Decoupling:

So either way, the new logic looks like an appendage that we can later remove, and we will have to keep existing sync. The second, to me, is a cleaner separation, so unless we find the gain from parallel download to be meaningful, coupling would still be my vote.

I'm curious what the Prysm implementation currently looks like; maybe @terencechain or @Inphi know? I'm also interested to hear what other client teams think their implementations might look like.

realbigsean commented 1 year ago

Also, I don't think prune-depth edge cases will be a problem, so long as, in implementation, nodes serve blobs with some margin past the prune depth (maybe a couple of epochs, maybe a couple of hours) and only request up to the prune depth.
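
A sketch of that margin (the margin value is illustrative; constants come from the earlier sketches): serve a little past the nominal depth, request only up to it, so honest peers with slightly different prune timing never look faulty.

```python
SERVE_MARGIN_EPOCHS = 2  # hypothetical safety margin

def should_serve_blobs(slot, current_slot):
    depth = (MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS + SERVE_MARGIN_EPOCHS) * SLOTS_PER_EPOCH
    return slot >= current_slot - depth

def may_request_blobs(slot, current_slot):
    depth = MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS * SLOTS_PER_EPOCH
    return slot >= current_slot - depth
```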

AgeManning commented 1 year ago

As @realbigsean has said, I think that for Lighthouse the coupling approach would be the solution with by far the least engineering effort.

We could leave our sync strategies almost entirely untouched and generalise our processing of the blocks and blobs.

If we uniformly agree on the pruning depth amongst nodes and consider nodes that prune early to be malicious, our sync code could also find and score these peers appropriately. This would be harder to do in the decoupled case.

We do lose some potential optimisation in parallel downloading of blobs, but our sync automatically load-balances across peers and is adjustable such that, instead of downloading many blocks from one peer, we can download fewer blocks and blobs from each peer, in effect still parallelising the download, just in block + blob bundles (see the sketch below). As Sean pointed out, the bottleneck currently is usually not bandwidth but processing. This may change with blobs for lower-bandwidth peers, but at the moment our network is quite idle during sync, waiting for blocks to be processed.
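
A back-of-the-envelope sketch of that rebalancing (batch sizes and blob counts are illustrative only):

```python
BLOCKS_ONLY_BATCH = 64   # hypothetical per-peer batch size for plain blocks
MAX_BLOBS_PER_BLOCK = 4  # eip-4844 scale, for illustration

def coupled_batch_size(avg_blobs_per_block):
    # Shrink the per-peer batch roughly in proportion to the added blob
    # payload, spreading block+blob bundles across more peers instead.
    scale = 1 + avg_blobs_per_block  # one "block unit" plus its blobs
    return max(1, BLOCKS_ONLY_BATCH // scale)

print(coupled_batch_size(0))                    # 64: no blobs, unchanged
print(coupled_batch_size(MAX_BLOBS_PER_BLOCK))  # 12: smaller bundles, more peers
```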

This may not be the case for other client teams, who may prefer the decoupled design; I'm speaking specifically for Lighthouse.

tbenr commented 1 year ago

I agree that the simplest approach is the coupled version, which would lead to a simpler implementation. We don't need any concept of an "optimistic" head with respect to blobs, and any failure has clear scoring attribution.

I still think that at the boundary, when we expect to start getting blobs, there is an edge case to handle. I'm thinking of being permissive when requesting blobs for the farthest-back epoch if we are close to the epoch transition that would allow that epoch to be pruned (this might turn out to mean not requesting blobs for that epoch at all, since by the time we reach the tip of the chain those blobs will probably already be prunable).

If we decide on the decoupled version, I think a first implementation would try to couple within the syncing process by asking for a batch of blocks and blobs in parallel from the same peer, then making some simple consistency checks and trying to import the batch (sketched below). If that fails, we consider the batch invalid, downscore the peer, and try again with another peer.
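
A sketch of those consistency checks before import (field names are the stand-ins from the earlier sketches; the real checks would also cover block roots and KZG commitments):

```python
def check_batch_consistency(blocks, sidecars):
    if len(blocks) != len(sidecars):
        return False
    for block, sidecar in zip(blocks, sidecars):
        if sidecar.beacon_block_slot != block.slot:
            return False  # mismatched pairing: invalidate batch, downscore peer
    return True
```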

A more advanced approach would be to keep block sync as it is, let it import optimistically, and then have a separate blobs-sidecar fetcher that handles the download and KZG verification independently and marks the corresponding blocks as valid. Verification-failure handling and retries would be tricky here.

I'm curious to hear from Prysm too. Are they trying to couple early, or have they implemented an optimistic approach?

terencechain commented 1 year ago

Apologies for the delayed response. Prysm will aim to "couple" even if the protocol specification has them decoupled. Under decoupling, Prysm would need to figure out how to merge block and blob sync into one process, done either through advanced caching or by tightly coupling the requests to each other. This is not ideal due to potential differences in batch sizes between block and blob requests and in peer-scoring attribution.

Decoupling is more complicated to implement within the Prysm code base due to asynchrony and inconsistency. It's also unclear how much optimization decoupling buys, and whether we would be going down a premature-optimization path.

dapplion commented 1 year ago

In the Lodestar implementation, sync and block processing are agnostic to dealing with either blocks or blocks+blobs, so the byRange call is effectively coupled during range sync. I wouldn't mind coupling these methods, but I don't see a problem with leaving them uncoupled either: implementations can choose to logically couple them, but the reverse is not possible.

arnetheduck commented 1 year ago

I'll leave the same comment as in https://github.com/ethereum/consensus-specs/pull/3139#issuecomment-1354327256 - I believe that while it's simpler to implement coupling right now, this is a case of simplicity becoming simplistic: with growing data amounts, we will want to optimize data access, and coupling introduces an artificial barrier.

For the long term, I believe uncoupled-everything has more merit, even if it makes the initial implementation a bit more tricky.