ethereum / portal-network-specs

Official repository for specifications for the Portal Network

Dealing with Recent Headers #336

Open kdeme opened 2 months ago

kdeme commented 2 months ago

Recent BlockHeaders / Ephemeral BlockHeaders

EL BlockHeaders that are still in the current period cannot have a proof against historical_summaries as they are not part of the used state.block_roots for the latest HistoricalSummary.

These can only be injected into the network with their proof once a period transition has occurred.

Before this, they could get injected without a proof, as is currently done.
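A minimal sketch of that timing constraint, using the consensus-specs constant SLOTS_PER_HISTORICAL_ROOT and glossing over the offset of the first post-Capella entry in historical_summaries:

```python
SLOTS_PER_HISTORICAL_ROOT = 8192  # consensus-specs constant; one "period" here

def period(slot: int) -> int:
    return slot // SLOTS_PER_HISTORICAL_ROOT

def provable_against_historical_summaries(header_slot: int, current_slot: int) -> bool:
    # A HistoricalSummary committing to a slot's block root is only appended
    # once that slot's 8192-slot period has completed, so headers in the
    # current period cannot yet carry such a proof.
    return period(header_slot) < period(current_slot)
```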

PR https://github.com/ethereum/portal-network-specs/pull/292 was/is a simple take on how to verify these.

However, the actual scenarios of how these headers get stored will probably be different.

A Portal node that is running the Portal beacon network and has a beacon light client synced will also have access to these headers as they are part of the LightClientHeader since Capella (albeit in a different serialization): https://github.com/ethereum/consensus-specs/blob/7cacee6ad64483357a7332be6a11784de1242428/specs/capella/light-client/sync-protocol.md?plain=1#L52

Currently these recent / ephemeral (proof-less) BlockHeaders fall under the same content key as headers with a proof. It has been raised in the past to move the BlockHeader without a proof into a separate type in the history network. I think that is a good idea as they are conceptually different from headers with proof:

The effect of this is that:

All this will simply require different storage and access features.

Some example scenarios:

Portal node that is beacon LC synced:

Portal node that is not yet beacon LC synced:

Client with no Portal beacon network running (e.g. full node with Portal integrated)

Effect of changing to a new content type

The None option in the current Union becomes invalid. However, removing the None would make all current data invalid. So if we want to clean this up properly, we need a migration path.
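For context, the header content currently wraps the proof in a union with a None arm, roughly as in the first definition below (paraphrased from the history network spec; exact names may differ), and a separate proof-less type could hypothetically look like the second one, written in the SSZ-container notation the Portal specs use:

```python
# Roughly the current shape (paraphrased):
BlockHeaderProof = Union[None, AccumulatorProof]  # plus any other proof arms the spec defines
BlockHeaderWithProof = Container(
    header: ByteList[2048],  # RLP-encoded EL header
    proof: BlockHeaderProof
)

# Hypothetical separate content type for recent/ephemeral headers, with no
# proof field at all (name is illustrative only):
EphemeralBlockHeader = Container(
    header: ByteList[2048]   # RLP-encoded EL header
)
```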

Storing and accessing the data

Storage would be different from the current content databases as it requires pruning and dealing with re-orgs.

It will thus more likely end up in a separate table / persistent cache but this is up to the implementation.

Access could work as it does now, i.e. neighborhood-based lookups, but with the option for nodes to store more than their radius, so that nodes could also try requesting these headers from any node.

Or, we could make this explicit and say that each node MUST store all of them. Additionally, to "backfill" faster we could add a range request to this version (similar to what we do now in the beacon network for LightClientUpdates).

pipermerriam commented 2 months ago

Looks like storing all 8192 of the most recent headers is about 5 MB.
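As a back-of-the-envelope check, assuming an average RLP-encoded header size of roughly 600 bytes (an assumed figure, not taken from this thread):

```python
AVG_HEADER_BYTES = 600   # assumed average size of an RLP-encoded post-merge header
WINDOW = 8192            # number of recent headers kept
print(f"{WINDOW * AVG_HEADER_BYTES / 1_000_000:.1f} MB")  # ~4.9 MB
```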

pipermerriam commented 2 months ago

Seems we could support range queries by having a separate content key that:

In a network situation where clients are expected to store most or all of the recent headers, this could be used to quickly acquire all of the most recent headers.

pipermerriam commented 2 months ago

One way to do the content-id for recent headers might be to have the derived content-id be based on the block number. It takes 13 bits for 8192 values, so we could have a static 13 bits based on block_height % 8192, so that blocks at certain heights always have the same most significant 13 bits, with the remaining bits taken from sha256(block_hash).

If my intuition is correct, this would result in two blocks that are close to each other in height also being close to each other in the network, making it somewhat easy to navigate the network to grab sequential blocks from the recent set.
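A rough sketch of this content-id derivation (illustrative only; the function name and exact bit layout are not from any spec):

```python
import hashlib

PREFIX_BITS = 13    # 2**13 == 8192, the size of the recent-header window
TOTAL_BITS = 256    # content ids are 256-bit values

def recent_header_content_id(block_height: int, block_hash: bytes) -> int:
    # Most significant 13 bits: fixed per height bucket (block_height % 8192),
    # so blocks at a given height always land in the same region of the keyspace.
    prefix = (block_height % 8192) << (TOTAL_BITS - PREFIX_BITS)
    # Remaining 243 bits: taken from sha256(block_hash).
    suffix_mask = (1 << (TOTAL_BITS - PREFIX_BITS)) - 1
    suffix = int.from_bytes(hashlib.sha256(block_hash).digest(), "big") & suffix_mask
    return prefix | suffix
```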

pipermerriam commented 2 months ago

If we said that all nodes had to store all headers, it would introduce the first real baseline "requirement" for nodes in our network, meaning that they'd be required to both store ~5 MB of data and continually acquire new headers as they enter the network.

My initial gut says that I don't like introducing this as a requirement and that maybe I'd rather it be optional. Here are some ideas.

The use cases I'm thinking we want to support are:

ogenev commented 2 months ago

A Portal node that is running the Portal beacon network and has a beacon light client synced will also have access to these headers as they are part of the LightClientHeader since Capella (albeit in a different serialization): https://github.com/ethereum/consensus-specs/blob/7cacee6ad64483357a7332be6a11784de1242428/specs/capella/light-client/sync-protocol.md?plain=1#L52

Are we going to store the most recent 8192 ExecutionPayloadHeaders in the db and provide those on request within the current period? I'm not sure exactly what the difference is between an ExecutionPayloadHeader and an EL header, but are we going to miss some important data fields if we provide only ExecutionPayloadHeaders for the last ~27 hours?

My initial gut says that I don't like introducing this as a requirement and that maybe I'd rather it be optional.

We are already doing this by storing all bootstraps and LightClientUpdates for the weak subjectivity period (~4 months), but I agree that it is better to make this new requirement optional.

pipermerriam commented 2 months ago

We are already doing this by storing all bootstraps and LightClientUpdates for the weak subjectivity period (~4 months), but I agree that it is better to make this new requirement optional.

Is this opt-in? Is it assumed that you can request any of these from any node on the network? Are nodes on the network expected to sync the last 27 hours of these when they come online? I think this is what I mean by making it optional.

It's very different for a client to choose to store the last 27 hours of these vs. a client being expected to have the last 27 hours of these, with functionality in the network based on the assumption that they can be requested from any node on the network.

ogenev commented 2 months ago

Is this opt-in? Is it assumed that you can request any of these from any node on the network?

For LightClientBootstraps, we currently expect all clients to store all bootstraps. The idea is to make this content available and provide all trusted block roots in our network for the last ~4 months. The user then can choose any trusted block root as a starting point to sync the light client.

Regarding LightClientUpdates, we push them into the network at the end of every period (~27 hours) and expect every node to store them; this is a requirement for the light client sync protocol to jump from the trusted block root to the current period and start following the chain.

So I think everything depends on what kind of flexibility we want to provide for the end user for choosing their starting point to sync the light client.

acolytec3 commented 2 months ago

Use some scheme like https://github.com/ethereum/portal-network-specs/issues/336#issuecomment-2358486268 to give recent headers a predictable address in the network.

Doesn't this idea introduce a roving hot spot in the network (on the assumption that recent chain history is the most popular), centered around the nodes "nearest" to the current 13 most significant bits of the contentIDs? I get conceptually why it's convenient for retrieval purposes, but it feels like it could end up being a DDoS vector once we get an uptick in the "very light" clients you mentioned that regularly drop in and out and end up hammering the same set of nodes looking for the head of the chain.

pipermerriam commented 2 months ago

Good catch on the hot spots. Possibly we could eliminate the hotspot by expecting all nodes to store the latest 256 headers, and then stripe the rest around the network. Grabbing 256 headers in one request from a close neighbor should be pretty trivial in terms of bandwidth costs.

Still needs to be decided if the striping approach actually fixes a real problem and is necessary...

kdeme commented 2 months ago

If we decide that nodes SHOULD store the 8192 recent headers, then I think the simple solution currently used for the LightClientUpdate type in the Portal beacon network could also work here.

Currently in Beacon network

One LightClientUpdate is 26813 bytes. A request currently allows a range of 128, so ~3.5 MB over one request is possible.

But not all nodes will have the full 4 month range, it depends on when it was bootstrapped.

Instead of neighborhood requests, random requests are done -> some might fail, but that should be fine. Random requests mean that the ContentId is practically not used (we are not using the DHT's main feature here).

The content key is based on start_period + amount.
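For reference, that range-based content key looks roughly like this (paraphrased from the Portal beacon network spec, in its SSZ-container notation; the exact type name may differ):

```python
LightClientUpdatesByRange = Container(
    start_period: uint64,
    count: uint64   # the "amount" mentioned above
)
```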

Storage and pruning:

Within Fluffy this is stored with the period as index, as it only needs to be retrieved by period. Pruning is easy as we can just delete anything with a period older than x.

Apply the same to history recent headers

This does not mean that every node that comes online needs to store all this data immediately. As long as "ephemeral" nodes are not a massive majority, this should be fine I think.

(Even in the case of localized storage/access this could/would still be an issue, but to a lesser extent)

The idea of @pipermerriam to add the number of stored headers to ping.custom_data would also help here.

When a node stores all ~5 MB of recent headers, it could keep them in a specialized table so that it can easily retrieve them both by block number and by block hash (e.g. primary key on hash, index on number, or similar).
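A minimal sketch of such a table, assuming SQLite and an RLP-encoded header blob; the table and column names are illustrative only:

```python
import sqlite3

db = sqlite3.connect("recent_headers.db")
db.execute(
    """CREATE TABLE IF NOT EXISTS recent_headers (
           block_hash   BLOB PRIMARY KEY,   -- lookup by hash
           block_number INTEGER NOT NULL,   -- lookup / pruning by number
           header_rlp   BLOB NOT NULL
       )"""
)
db.execute("CREATE INDEX IF NOT EXISTS idx_recent_number ON recent_headers(block_number)")

def prune(oldest_kept: int) -> None:
    # Drop everything older than the oldest block number we want to keep.
    db.execute("DELETE FROM recent_headers WHERE block_number < ?", (oldest_kept,))
    db.commit()
```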

The protocol could then provide two different content keys, but they access the same data on the nodes. And the content keys could/would support ranges.
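Hypothetically, those two content keys could look something like the following, in the same SSZ-container notation; names and fields are illustrative, not a spec proposal:

```python
EphemeralHeaderByHash = Container(block_hash: Bytes32)
EphemeralHeadersByRange = Container(start_block_number: uint64, count: uint64)
```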

Pruning could work by dropping block numbers older than x. (There is a slight discrepancy between block number and slot, as not every slot necessarily has a block, so perhaps it is a little more complex, but then again it doesn't need to be exactly the last 8192 slots, I think.)

Now, if for some reason this is not sufficient or turns out to be too flawed, causing issues retrieving this data, then yes, we will have to resort to something more complicated in terms of content-id derivation and localized access to the data (as mentioned in the comments above). I'm questioning whether this will be needed, however.

kdeme commented 2 months ago

Are we going to store the most recent 8192 ExecutionPayloadHeaders in the db and provide those on request within the current period?

I think all the necessary fields to convert to an EL BlockHeader are there. What we store in the end probably does not matter that much, but considering it is the EL history network, an EL BlockHeader perhaps makes more sense.

We are already doing this by storing all bootstraps and LightClientUpdates for the weak subjectivity period (~4 months), but I agree that it is better to make this new requirement optional.

Yes, but I would say the LightClientUpdates are a better example, as they are accessible by range and of similar total size; see the comment above.

In the last Portal meetup I actually mentioned that I would be in favor of moving the LightClientBootstraps to distributed storage over the network, considering their total size if stored for each epoch (I forgot I was going to make a tracking issue for this): stored for MIN_EPOCHS_FOR_BLOCK_REQUESTS (~4 months), they amount to ~850 MB, compared to ~3.5 MB for LightClientUpdates.

pipermerriam commented 1 month ago

Implications of this are: