ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications

[EIP-4844][Discussion] Blob gossip #3180

Open realbigsean opened 1 year ago

realbigsean commented 1 year ago

There has been more discussion about alternative blob/block gossip designs including the possibility of decoupling blocks and blobs in gossipsub. @arnetheduck suggested large bandwidth reductions could be achieved by only listening to blobs and downloading blocks via a decoupled by-root request. The reverse, gossiping blocks and downloading blobs by root, is structurally similar to what full DAS might look like.
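
For concreteness, here is a minimal sketch of the two shapes under discussion; the type and topic names below are illustrative placeholders rather than spec definitions (the real spec types are SSZ containers):

```python
from dataclasses import dataclass

# Placeholder aliases so the sketch is self-contained; the real spec types
# are SSZ containers, not raw bytes.
SignedBeaconBlock = bytes
BlobsSidecar = bytes
Root = bytes

# Coupled design: block and blobs travel as one gossip message on a single
# topic, so they arrive (and are validated) together at every hop.
@dataclass
class SignedBeaconBlockAndBlobsSidecar:
    beacon_block: SignedBeaconBlock
    blobs_sidecar: BlobsSidecar

COUPLED_TOPICS = ["beacon_block_and_blobs_sidecar"]

# Decoupled design: separate topics with independent lifetimes; a node could,
# e.g., subscribe only to blobs and fetch the matching block on demand.
DECOUPLED_TOPICS = ["beacon_block", "blobs_sidecar"]

def request_blocks_by_root(roots: list[Root]) -> list[SignedBeaconBlock]:
    """Hypothetical req/resp call standing in for a by-root block request
    to a peer: the 'decoupled by-root' download path mentioned above."""
    raise NotImplementedError
```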

However, there are concerns about the potential for additional round trips of communication at each hop and the difficulty in knowing which nodes to request blobs from if blocks are gossiped without the associated blobs.

Episub is another way we can decrease bandwidth, by reducing the gossip amplification factor. We’re still trying to understand to what degree this will have an impact; so far it seems it may be more beneficial in larger networks, but networks of that size are difficult to test reliably. A solution where we only gossip one of blob/block would still benefit from episub because of the bandwidth reduction on other/existing topics.
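
As a rough back-of-envelope (all numbers below are assumptions, not measurements): a gossipsub node in a mesh of degree D can receive up to D copies of each message, whereas an episub-style tree aims for a single eager delivery, so the duplicate traffic saved scales with D:

```python
# Back-of-envelope duplicate-delivery estimate; all numbers are assumptions.
D = 8                        # typical gossipsub mesh degree
BLOB_BYTES = 4 * 128 * 1024  # e.g. 4 blobs of 128 KiB per slot

worst_case = D * BLOB_BYTES  # up to one copy from each mesh peer
ideal_tree = 1 * BLOB_BYTES  # episub's goal: a single eager delivery

saved = worst_case - ideal_tree
print(f"up to {saved / 2**20:.1f} MiB of duplicate blob traffic saved per slot")
```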

It seems like we will stick with coupling in the near term, but it may be worth exploring alternative designs sooner rather than later in case bandwidth or network propagation times look suspect in testing. I think it’s also worth discussing which of these designs we are prioritising, and to what degree.

arnetheduck commented 1 year ago

sooner rather than later

I'd even go further and say that we should move to a decoupled design, given that it's more flexible and semantically closer to what blobs are meant to be, ie decoupled pieces of data with different lifetimes. This ensures we don't put up artificial barriers to efficiency and engineering work that becomes difficult when the "official" protocol says things should be coupled, which can lead to the ossification of a needlessly suboptimal approach. The simplicity gains seem temporary at best and, at worst, increase total complexity by adding constraints and constructs that have no natural raison d'être.

If it turns out, closer to the end, that there are indeed no benefits to be had whatsoever, it's trivial to move back to a coupled design.

Episub, for all its benefits, looks unlikely to make it into the protocol by the eip4844 HF. I think it would be unwise to couple it (haha) with the gossip coupling question in general, since it's largely an orthogonal piece of work that should stand on its own and be switched to when it's ready. Until that time, it's easier to reason about eip4844 in isolation and work on it based on existing gossip assumptions (ie episub might never happen).

terencechain commented 1 year ago

I'd be happy to support a decoupled implementation for Prysm. It might be worth having two implementations in parallel, as I feel it's hard to reach a verdict on which design is superior. I think once we have both of them in code and have numbers (ie propagation time) to back them up, it may become easier to drive one design to a final decision.

mkalinin commented 1 year ago

decoupled pieces of data

Arguably, they are coupled from a DA perspective, and there is an opportunity to decouple these pieces of data on the gossip layer. We should consider that the DAS implementation on the network layer may differ from what decoupled block/blob dissemination will look like. Therefore, the engineering complexity of decoupling should be outweighed by the gain we get from employing such a design.

As for the metrics, IMO we should aim for X percent of blocks and blobs fully disseminated within Y seconds into a slot, where Y leaves enough time for block/blob verification to finish before the attestation deadline. X should be close either to what we currently see on Mainnet (if possible) or to some boundary (say 95%) that we think is good enough for the networking layer to provide the consensus layer with the required properties (2/3 of votes by stake included on chain) under increased latency and computational complexity. The other way around: how much additional latency and validation complexity would be allowed on the networking layer if we took this or that approach?
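
As a sketch, that metric could be computed like this (the 95% target comes from the comment above; the deadline and sample values below are purely illustrative assumptions):

```python
def dissemination_fraction(arrival_times_s: list[float], deadline_s: float) -> float:
    """Fraction of slots whose block and blobs fully arrived within
    deadline_s seconds into the slot (the 'X percent within Y seconds' metric)."""
    return sum(t <= deadline_s for t in arrival_times_s) / len(arrival_times_s)

# Illustrative check: demand 95% full dissemination by an assumed cutoff of
# 3.0s, leaving ~1s of verification headroom before a 4s attestation deadline.
samples = [1.2, 2.8, 3.1, 1.9, 2.4]  # assumed per-slot arrival times, seconds
meets_target = dissemination_fraction(samples, deadline_s=3.0) >= 0.95
```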

I don't know if there are any simulators at hand that we could use to simulate both scenarios (coupled and decoupled) for a network of a size comparable to mainnet. I feel like this kind of simulation could be a good start for understanding the potential difference between the approaches. If, say, the decoupled scenario gave us 1s of advantage, it would definitely be worth considering, while a diff of 0.1s would be much less appealing.
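
For a feel of what such a simulation would weigh, here is a deliberately crude toy model (not a real gossipsub simulation; every constant is an assumption) of the per-hop round-trip cost of by-root block fetching versus pushing the coupled payload whole. It captures only the latency side of the ledger; a proper simulation would weigh this against the bandwidth savings.

```python
# Toy per-hop propagation model; every constant is an assumption.
RTT_S = 0.1               # assumed per-hop round-trip time
BANDWIDTH_BPS = 25e6      # assumed per-link throughput
BLOCK_BYTES = 100_000     # assumed block size
BLOB_BYTES = 4 * 131_072  # assumed 4 blobs of 128 KiB

def coupled_hop_s() -> float:
    # One push carrying block + blobs together.
    return RTT_S / 2 + (BLOCK_BYTES + BLOB_BYTES) / BANDWIDTH_BPS

def decoupled_hop_s() -> float:
    # Blobs pushed; the block is fetched by root, costing one extra round
    # trip per hop (the concern raised earlier in the thread).
    return RTT_S / 2 + BLOB_BYTES / BANDWIDTH_BPS + RTT_S

for hops in (3, 5, 8):
    print(f"{hops} hops: coupled={hops * coupled_hop_s():.2f}s, "
          f"decoupled={hops * decoupled_hop_s():.2f}s")
```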

Looping in @Nashatyrev to hear his thoughts on the simulation, particularly whether it could be done with the libp2p simulator he created a while ago.

sauliusgrigaitis commented 1 year ago

the simplicity gains seem temporary at best and, at worst, increase total complexity by adding constraints and constructs that have no natural raison d'être.

I'm not sure I agree with this. My understanding was that coupling solves a class of serious problems that exist only in the decoupled model. I mean, the idea of decoupled blocks/blobs in general sounds really great, just like full sharding etc., but the goal of EIP-4844 was to ship a data solution for rollups fast. I would actually try to strip EIP-4844 down even more, for example by dropping KZG (a large part of it is just a simulation of upcoming sharding), rather than include anything more complex than EIP-4844 currently has.

Nashatyrev commented 1 year ago

Looping in @Nashatyrev to hear his thoughts on the simulation, particularly whether it could be done with the libp2p simulator he created a while ago.

Yeah, my Gossip simulator was able to handle around 10K 'peers', so I believe it's quite possible to do these kinds of simulations.

Took some time to refresh my memory on the simulator. Leaving the links here just for the record:

Nashatyrev commented 1 year ago

I would also add @AgeManning to this thread; I think he may comment on the current episub status.

AgeManning commented 1 year ago

@Nashatyrev - I haven't looked over your simulator. I'm running into issues with the one we have built: it uses the testground framework and can't seem to handle more than 100 nodes without the results becoming unreliable. We're looking into running it on k8s.

The current status of Episub is that it's built and running in the simulator. The results on small networks appear reasonable and as expected. It's highly tuneable, so with bad settings it performs worse and with good settings it can perform better.

I'm yet to make a push to include it in the libp2p specs, and therefore the official implementations/repositories, without solid large-scale data that verifies it's a strict improvement.

I believe there is also a Go implementation. If your simulator can reliably test 10k peers, it might be worth trying episub out there as well.