ethereum / portal-network-specs

Official repository for specifications for the Portal Network

Dealing with Recent Headers #336

Open kdeme opened 2 months ago

kdeme commented 2 months ago

Recent BlockHeaders / Ephemeral BlockHeaders

EL BlockHeaders that are still in the current period cannot have a proof against historical_summaries as they are not part of the used state.block_roots for the latest HistoricalSummary.

These can only be injected into the network with their proof once a period transition has occurred.

Before this, they could get injected without a proof, as is currently done.
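A minimal sketch of that timing constraint, using the consensus-specs constant SLOTS_PER_HISTORICAL_ROOT and glossing over the offset of the first post-Capella entry in historical_summaries:

```python
SLOTS_PER_HISTORICAL_ROOT = 8192  # consensus-specs constant; one "period" here

def period(slot: int) -> int:
    return slot // SLOTS_PER_HISTORICAL_ROOT

def provable_against_historical_summaries(header_slot: int, current_slot: int) -> bool:
    # A HistoricalSummary committing to a slot's block root is only appended
    # once that slot's 8192-slot period has completed, so headers in the
    # current period cannot yet carry such a proof.
    return period(header_slot) < period(current_slot)
```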

PR https://github.com/ethereum/portal-network-specs/pull/292 was/is a simple take on how to verify these.

However, the actual scenarios of how these headers get stored will probably be different.

A Portal node that is running the Portal beacon network and has a beacon light client synced will also have access to these headers as they are part of the LightClientHeader since Capella (albeit in a different serialization): https://github.com/ethereum/consensus-specs/blob/7cacee6ad64483357a7332be6a11784de1242428/specs/capella/light-client/sync-protocol.md?plain=1#L52

Currently these recent / ephemeral (proof-less) BlockHeaders fall under the same content key as headers with a proof. It has been raised in the past to move the BlockHeader without a proof into a separate type in the history network. I think that is a good idea as they are conceptually different from headers with proof:

The effect of this is that:

All this will simply require different storage and access features.

Some example scenarios:

Portal node that is beacon LC synced:

Portal node that is not yet beacon LC synced:

Client with no Portal beacon network running (e.g. full node with Portal integrated)

Effect of changing to a new content type

The None option in the current Union becomes invalid. However, removing the None would make all current data invalid. So if we want to clean this up properly, we need a migration path.
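For context, the header content currently wraps the proof in a union with a None arm, roughly as in the first definition below (paraphrased from the history network spec; exact names may differ), and a separate proof-less type could hypothetically look like the second one, written in the SSZ-container notation the Portal specs use:

```python
# Roughly the current shape (paraphrased):
BlockHeaderProof = Union[None, AccumulatorProof]  # plus any other proof arms the spec defines
BlockHeaderWithProof = Container(
    header: ByteList[2048],  # RLP-encoded EL header
    proof: BlockHeaderProof
)

# Hypothetical separate content type for recent/ephemeral headers, with no
# proof field at all (name is illustrative only):
EphemeralBlockHeader = Container(
    header: ByteList[2048]   # RLP-encoded EL header
)
```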

Storing and accessing the data

Storage would be different from the current content databases as it requires pruning and dealing with re-orgs.

It will thus more likely end up in a separate table / persistent cache but this is up to the implementation.

Access could work as it does now, i.e. neighborhood-based lookups, but with the option for nodes to store more than their radius, so that nodes could also try requesting these headers from any node.

Or, we could make this explicit and say that each node MUST store all of them. Additionally, to "backfill" faster we could add a range request to this version (similar to what we do now in the beacon network for LightClientUpdates).

pipermerriam commented 2 months ago

Looks like storing all 8192 of the most recent headers is about 5 MB.
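As a back-of-the-envelope check, assuming an average RLP-encoded header size of roughly 600 bytes (an assumed figure, not taken from this thread):

```python
AVG_HEADER_BYTES = 600   # assumed average size of an RLP-encoded post-merge header
WINDOW = 8192            # number of recent headers kept
print(f"{WINDOW * AVG_HEADER_BYTES / 1_000_000:.1f} MB")  # ~4.9 MB
```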

pipermerriam commented 2 months ago

Seems we could support range queries by having a separate content key that:

In a network situation where clients are expected to store most or all of the recent headers, this could be used to quickly acquire all of the most recent headers.

pipermerriam commented 2 months ago

One way to do the content-id for recent headers might be to have the derived content-id be based on the block number. It takes 13 bits for 8192 values, so we could have a static 13 bits based on block_height % 8192, so that blocks at certain heights always have the same most significant 13 bits, with the remaining bits taken from sha256(block_hash).

If my intuition is correct, this would result in two blocks that are close to each other in height also being close to each other in the network, making it somewhat easy to navigate the network to grab sequential blocks from the recent set.
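A rough sketch of this content-id derivation (illustrative only; the function name and exact bit layout are not from any spec):

```python
import hashlib

PREFIX_BITS = 13    # 2**13 == 8192, the size of the recent-header window
TOTAL_BITS = 256    # content ids are 256-bit values

def recent_header_content_id(block_height: int, block_hash: bytes) -> int:
    # Most significant 13 bits: fixed per height bucket (block_height % 8192),
    # so blocks at a given height always land in the same region of the keyspace.
    prefix = (block_height % 8192) << (TOTAL_BITS - PREFIX_BITS)
    # Remaining 243 bits: taken from sha256(block_hash).
    suffix_mask = (1 << (TOTAL_BITS - PREFIX_BITS)) - 1
    suffix = int.from_bytes(hashlib.sha256(block_hash).digest(), "big") & suffix_mask
    return prefix | suffix
```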

pipermerriam commented 2 months ago

If we said that all nodes had to store all headers, it would introduce the first real baseline "requirement" for nodes in our network, meaning that they'd be required to both store ~5 MB of data and continually acquire new headers as they enter the network.

My initial gut says that I don't like introducing this as a requirement and that maybe I'd rather it be optional. Here are some ideas.

The use cases I'm thinking we want to support are:

ogenev commented 2 months ago

A Portal node that is running the Portal beacon network and has a beacon light client synced will also have access to these headers as they are part of the LightClientHeader since Capella (albeit in a different serialization): https://github.com/ethereum/consensus-specs/blob/7cacee6ad64483357a7332be6a11784de1242428/specs/capella/light-client/sync-protocol.md?plain=1#L52

Are we going to store the most recent 8192 ExecutionPayloadHeaders in the db and provide those on request within the current period? I'm not sure exactly what the difference is between an ExecutionPayloadHeader and an EL header, but are we going to miss some important data fields if we provide only ExecutionPayloadHeaders for the last ~27 hours?

My initial gut says that I don't like introducing this as a requirement and that maybe I'd rather it be optional.

We are already doing this by storing all bootstraps and LightClientUpdates for the weak subjectivity period (~4 months), but I agree that it is better to make this new requirement optional.

pipermerriam commented 2 months ago

We are already doing this by storing all bootstraps and LightClientUpdates for the weak subjectivity period (~4 months), but I agree that it is better to make this new requirement optional.

Is this opt-in? Is it assumed that you can request any of these from any node on the network? Are nodes on the network expected to sync the last 27 hours of these when they come online? I think this is what I mean by making it optional.

It's very different for a client to choose to store the last 27 hours of these vs. a client being expected to have the last 27 hours of these, with functionality in the network based on the assumption that they can be requested from any node on the network.

ogenev commented 2 months ago

Is this opt-in? Is it assumed that you can request any of these from any node on the network?

For LightClientBootstraps, we currently expect all clients to store all bootstraps. The idea is to make this content available and provide all trusted block roots in our network for the last ~4 months. The user then can choose any trusted block root as a starting point to sync the light client.

Regarding LightClientUpdates, we push them into the network at the end of every period (~27 hours) and expect every node to store them; this is a requirement for the light client sync protocol to jump from the trusted block root to the current period and start following the chain.

So I think everything depends on what kind of flexibility we want to provide for the end user for choosing their starting point to sync the light client.

acolytec3 commented 2 months ago

Use some scheme like https://github.com/ethereum/portal-network-specs/issues/336#issuecomment-2358486268 to give recent headers a predictable address in the network.

Doesn't this idea introduce a roving hot spot in the network (on the assumption that recent chain history is the most popular), centered around the nodes "nearest" to the current 13 most significant bits of the contentIDs? I get conceptually why it's convenient for retrieval purposes, but it feels like it could end up being a DDoS vector once we get an uptick in the "very light" clients you mentioned that regularly drop in and out and end up hammering the same set of nodes looking for the head of the chain.

pipermerriam commented 2 months ago

Good catch on the hot spots. Possibly we could eliminate the hotspot by expecting all nodes to store the latest 256 headers, and then stripe the rest around the network. Grabbing 256 headers in one request from a close neighbor should be pretty trivial in terms of bandwidth costs.

Still needs to be decided if the striping approach actually fixes a real problem and is necessary...

kdeme commented 2 months ago

If we decide that nodes SHOULD store the 8192 recent headers, then I think the simple solution currently used for the LightClientUpdate type in the Portal beacon network could also work here.

Currently in Beacon network

One LightClientUpdate is 26813 bytes. A request currently allows a range of 128, so ~3.5 MB over one request is possible.

But not all nodes will have the full 4 month range, it depends on when it was bootstrapped.

Instead of neighborhood requests, random requests are done -> some might fail, but that should be fine. Random requests mean that the ContentId is practically not used (we are not using the DHT's main feature here).

The content key is based on start_period + amount.
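For reference, that range-based content key looks roughly like this (paraphrased from the Portal beacon network spec, in its SSZ-container notation; the exact type name may differ):

```python
LightClientUpdatesByRange = Container(
    start_period: uint64,
    count: uint64   # the "amount" mentioned above
)
```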

Storage and pruning:

Within Fluffy this is stored with the period as index, as it only needs to be retrieved by period. Pruning is easy as we can just delete anything with a period older than x.

Apply the same to history recent headers

This does not mean that every node that comes online needs to store all this data immediately. As long as "ephemeral" nodes are not a massive majority, this should be fine I think.

(Even in the case of localized storage/access this could/would still be an issue, but to a lesser extent)

The idea of @pipermerriam to add the number of stored headers to ping.custom_data would also help here.

When a node stores all ~5 MB of recent headers, it could keep them in a specialized table so that it can easily retrieve them both by block number and by block hash (e.g. primary key on hash, index on number, or similar).
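A minimal sketch of such a table, assuming SQLite and an RLP-encoded header blob; the table and column names are illustrative only:

```python
import sqlite3

db = sqlite3.connect("recent_headers.db")
db.execute(
    """CREATE TABLE IF NOT EXISTS recent_headers (
           block_hash   BLOB PRIMARY KEY,   -- lookup by hash
           block_number INTEGER NOT NULL,   -- lookup / pruning by number
           header_rlp   BLOB NOT NULL
       )"""
)
db.execute("CREATE INDEX IF NOT EXISTS idx_recent_number ON recent_headers(block_number)")

def prune(oldest_kept: int) -> None:
    # Drop everything older than the oldest block number we want to keep.
    db.execute("DELETE FROM recent_headers WHERE block_number < ?", (oldest_kept,))
    db.commit()
```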

The protocol could then provide two different content keys, but they access the same data on the nodes. And the content keys could/would support ranges.
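Hypothetically, those two content keys could look something like the following, in the same SSZ-container notation; names and fields are illustrative, not a spec proposal:

```python
EphemeralHeaderByHash = Container(block_hash: Bytes32)
EphemeralHeadersByRange = Container(start_block_number: uint64, count: uint64)
```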

Pruning could work by dropping block numbers older than x. (There is a slight discrepancy between block number and slot, as not every slot necessarily has a block, so perhaps it is a little more complex, but then again it doesn't need to be exactly the last 8192 slots, I think.)

Now, if for some reason this is not sufficient or turns out to be too flawed, causing issues retrieving this data, then yes, we will have to resort to something more complicated in terms of content-id derivation and localized access to the data (as mentioned in the comments above). I'm questioning whether this will be needed, however.

kdeme commented 2 months ago

Are we going to store the most recent 8192 ExecutionPayloadHeaders in the db and provide those on request within the current period?

I think all the necessary fields to convert to an EL BlockHeader are there. What we store in the end probably does not matter that much, but considering it is the EL history network, an EL BlockHeader perhaps makes more sense.

We are already doing this by storing all bootstraps and LightClientUpdates for the weak subjectivity period (~4 months), but I agree that it is better to make this new requirement optional.

Yes, but I would say the LightClientUpdates are a better example, as they are accessible by range and of similar total size; see the comment above.

In the last Portal meetup I actually mentioned that I would be in favor of moving the LightClientBootstraps to distributed storage over the network, considering their total size if stored for each epoch (I forgot I was going to make a tracking issue for this): stored for MIN_EPOCHS_FOR_BLOCK_REQUESTS (~4 months), they amount to ~850 MB, compared to ~3.5 MB for LightClientUpdates.

pipermerriam commented 1 month ago

Implications of this are: