ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal

BlocksByRange under WS sync #2116

Open djrtwo opened 3 years ago

djrtwo commented 3 years ago

As brought up by @mbaxter, it is unclear at this moment how to handle BlocksByRange requests when a node has synced from a weak subjectivity state.

A few questions:

  1. What minimum block/epoch ranges (wrt the current epoch) are all nodes expected to serve as a baseline?
  2. How does a node know the range a peer will serve?
  3. If a BlocksByRange request is made to a peer and all or some subset of slots in that range are not in custody of the peer, what should be the response?

Question 1

What minimum block ranges (wrt the current slot) are all nodes expected to serve as a baseline?

The spec currently says:

> Clients MUST keep a record of signed blocks seen since the start of the weak subjectivity period and MUST support serving requests of blocks up to their own `head_block_root`

"weak subjectivity period" is poorly defined in this context. I would suggest that that upper limit of the weak subjectivity period (~5 months at 100% safety decay) is used as a constant in this spec to define the minimum epoch range that a node is expected to backfill and keep around for serving. (The following is adopted from the weak-subjectivity guide):

```python
MIN_EPOCHS_FOR_BLOCK_REQUESTS = (
    MIN_VALIDATOR_WITHDRAWABILITY_DELAY
    + MAX_SAFETY_DECAY * CHURN_LIMIT_QUOTIENT // (2 * 100)
)
```

Where `MAX_SAFETY_DECAY = 100` and thus `MIN_EPOCHS_FOR_BLOCK_REQUESTS = 33024` (~5 months).
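As a quick sanity check, the constant can be evaluated with the mainnet values from the phase 0 spec:

```python
# Mainnet values from the phase 0 spec.
MIN_VALIDATOR_WITHDRAWABILITY_DELAY = 256  # epochs
CHURN_LIMIT_QUOTIENT = 2**16
MAX_SAFETY_DECAY = 100  # percent

MIN_EPOCHS_FOR_BLOCK_REQUESTS = (
    MIN_VALIDATOR_WITHDRAWABILITY_DELAY
    + MAX_SAFETY_DECAY * CHURN_LIMIT_QUOTIENT // (2 * 100)
)
assert MIN_EPOCHS_FOR_BLOCK_REQUESTS == 33024

# Rough wall-clock equivalent: 32 slots per epoch, 12 seconds per slot.
days = MIN_EPOCHS_FOR_BLOCK_REQUESTS * 32 * 12 / 86400  # ~147 days, roughly 5 months
```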

This could be considered overkill for small validator sets or more aggressive safety decays, but using the maximum here does not put an overly large block storage requirement on honest nodes and provides a higher guarantee in baseline quality of service without trying to integrate dynamic WS period lengths into the assumptions in the networking protocol.

Question 2

How does a node know the range a peer will serve?

By default, we can assume that all nodes will serve MIN_EPOCHS_FOR_BLOCK_REQUESTS worth of blocks from the current epoch. If not, that is grounds for de-scoring. But, many nodes might choose to serve beyond this range (maximally back to genesis).
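A minimal sketch of that baseline check, assuming the `MIN_EPOCHS_FOR_BLOCK_REQUESTS` constant proposed above (the exact de-scoring policy is left to clients):

```python
MIN_EPOCHS_FOR_BLOCK_REQUESTS = 33024  # proposed constant (~5 months)
GENESIS_EPOCH = 0

def minimum_serve_epoch(current_epoch: int) -> int:
    """Earliest epoch every honest node is expected to serve blocks from."""
    return max(GENESIS_EPOCH, current_epoch - MIN_EPOCHS_FOR_BLOCK_REQUESTS)

def within_baseline(request_epoch: int, current_epoch: int) -> bool:
    """True if any honest peer must be able to answer a request at this epoch."""
    return request_epoch >= minimum_serve_epoch(current_epoch)
```

A peer failing to serve a request for which `within_baseline` is true would then be a candidate for de-scoring; requests outside the baseline fall back to whatever range the peer advertises.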

There are two obvious places to publish this info: Status or MetaData. Status generally carries info about a node's head state, used to decide whether the node is on a chain you care about for syncing, while MetaData is for more slowly changing information about the node in general.

I can see the argument for using each.

Due to the expected slowly changing nature of the piece of data, I have a slight preference for putting it in MetaData but would like to hear the opinions of others.

For nodes that are still back-filling blocks from a WS state or are block syncing from genesis, it might be worthwhile to signify this in MetaData as well. We can use FAR_FUTURE_EPOCH for the earliest block serve epoch to signify this.
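A sketch of how the sentinel could work, assuming a hypothetical `earliest_available_epoch` field in MetaData (the field name is illustrative, not from the spec; a real implementation would use the SSZ container):

```python
FAR_FUTURE_EPOCH = 2**64 - 1  # sentinel value from the phase 0 spec

class MetaData:
    """Simplified stand-in for the MetaData container, extended with a
    hypothetical earliest_available_epoch field."""
    def __init__(self, seq_number: int, earliest_available_epoch: int):
        self.seq_number = seq_number
        self.earliest_available_epoch = earliest_available_epoch

def is_backfilling(metadata: MetaData) -> bool:
    # FAR_FUTURE_EPOCH signals "still back-filling, serving no historical blocks yet".
    return metadata.earliest_available_epoch == FAR_FUTURE_EPOCH

def can_serve(metadata: MetaData, epoch: int) -> bool:
    return not is_backfilling(metadata) and epoch >= metadata.earliest_available_epoch
```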

Question 3

If a BlocksByRange request is made to a peer and all or some subset of slots in that range are not in the custody of the peer, what should be the response?

@mbaxter noted the following options available:

  1. Return an error
  2. Return an empty response
  3. Return any sequence of blocks in the range that are available

Both the 2nd and 3rd options provide misleading information about the slot range in question to the requester.

I lean towards (1) -- return an error.
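A responder-side sketch of option (1), assuming a hypothetical `RESOURCE_UNAVAILABLE` error code (the code value 3 is illustrative, chosen after the existing InvalidRequest and ServerError codes):

```python
RESOURCE_UNAVAILABLE = 3  # hypothetical response code, not in the spec

def handle_blocks_by_range(start_slot, count, earliest_available_slot, store):
    """Error out if any requested slot precedes this node's custody,
    otherwise return whichever blocks in the range exist (dict keyed by slot)."""
    if start_slot < earliest_available_slot:
        return ("error", RESOURCE_UNAVAILABLE)
    blocks = [store[s] for s in range(start_slot, start_slot + count) if s in store]
    return ("ok", blocks)
```

This keeps the meaning of an `ok` response with gaps unambiguous: the missing slots are genuinely empty as far as the responder knows.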

mbaxter commented 3 years ago

Question 2

> Due to the expected slowly changing nature of the piece of data, I have a slight preference for putting it in MetaData but would like to hear the opinions of others.

Once a node is fully synced, the earliest block served likely won't change. But it could be useful to broadcast updates on the earliest block served as the node syncs. We could then use this more granularly to decide which nodes are capable of serving us the blocks we need. Whereas if we have a special value to indicate a node is syncing (FAR_FUTURE_EPOCH), we would have to consider any nodes in the process of syncing incapable of serving. Or else we would have to try requests against syncing nodes expecting that they might fail.

mbaxter commented 3 years ago

Question 3

I lean towards returning an error as well, as this is the most explicit option. Would it make sense to add a new standard error response code for this?

AgeManning commented 3 years ago

Another point we may want to consider is the possibility that a node wants to sync from genesis or an earlier state (I haven't thought about how likely this is going to be).

But in this case, when a node starts it will want to find peers that can serve it blocks it needs. It will be inefficient to have to find random peers, get status or metadata and then find out they don't support the block range and repeat. This is why I suggested putting the value in the ENR also, so we can search for peers that support the ranges we need.

Potentially we won't have many peers doing this and it might not be worth putting it in. Just raising this as something to consider.

arnetheduck commented 3 years ago

> upper limit of the weak subjectivity period

:+1:, though we should probably extend it "a little" to ensure that even when we're at the max, we can still sync when considering network delays etc - in the past, we've considered "double" or "a few epochs" to be "a little" and there are arguments for either. it's also good that this is a constant, which makes it easier to reason about upper limits for the storage that eth2 "requires" - there's already an upper bound on block size.

another thing is that we should redefine this to follow the slot rather than the head block since the range request in general works with slots.

finally, depending on the outcome of the finalized-vs-wsp discussion, if we decide wsp states are not finalized, we need to ensure that clients can download blocks from min(finalized, wsp), else they can't replay blocks to reach a head that points out the correct checkpoint. It's still possible to put an upper bound on storage in this case, but it becomes more difficult (essentially, the upper bound is now defined by the decay which causes validators to be ejected)

> Once a node is fully synced, the earliest block served likely won't change.

why is this? generally, a node will want to prune information regardless of whether it's starting up or has been running uninterrupted for a long time, so the earliest block served will change during the course of a connection. in fact, the expectation is that the highest quality nodes will be running uninterrupted and will not want to reconnect their "trusted" connections or anything like that (unlike us devs, who are running and restarting all the time), and if they are to maintain bounds on disk usage etc, they need to prune continuously - to avoid long pruning delays, it makes sense to simply prune on every epoch.

metadata vs status

these two RPC requests are somewhat ambiguous in how they're supposed to be used as "often" vs "sometimes" doesn't really have a clear definition, and because they are both tiny, it doesn't really matter that much - either a client has to get them once on connection, or it has to poll for them - if it's polling, the gain from having two different polling frequencies is mostly negligible while adding complexity and confusion.

Since sync-related data is already in status, it seems more natural to include it there - then we can reframe the raison d'être of the messages to be essentially sync and gossip, which gives them a better reason to exist separately rather than being combined into one.

> putting the value in the ENR also

one thing to consider is that once things get optional, they will tend towards the minimum "allowed" and the utility of putting them in ENR and similar will go down. what we're basically saying here is that if you want to run an archive node, you may want to announce it, but "most" nodes will support the minimum and that's it, as defined by the constant.

errors

we've avoided error codes so far because they can easily be gamed, misinterpreted or simply wrong due to changing conditions - it turns out that regardless of what error code the client gives you, you need to treat them as dishonest until they're proven honest. let's say we introduce an error code, and the peer responds that it won't answer your request - does that mean that the range has blocks or not? you can't trust the one peer, so you need to find out by downloading later blocks whose parent field will tell you. i.e. in all these cases, your behaviour as a downloading client is the same, regardless of whether they give you an empty range or an error code.

In this case, when MIN_EPOCHS_FOR_BLOCK_REQUESTS exists and I have a clock, I can compute what horizon I can expect from the client, and penalize them if they don't answer my requests correctly - the situation is equivalent to not responding with blocks in any other range really (in the middle of sync for example) - and the logic for penalizing them is exactly the same: I ask for a range, they give me empty blocks, then I find out (from another peer?) that blocks existed in this range - I can now proceed to penalize the peer that did not follow the protocol. A similar situation exists when the peer announces they have more blocks through status - if they later respond with an empty range that's proven to be non-empty, they're faulty and should be scored accordingly.
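The scoring flow described above could be sketched like this (the penalty value and data shapes are illustrative):

```python
PENALTY = -10  # illustrative score adjustment

def audit_empty_claim(claimed_empty, proven_block_slots, scores, peer):
    """If any slot a peer claimed was empty is later proven (e.g. via another
    peer's parent-root links) to contain a block, penalize that peer.
    Returns True if a penalty was applied."""
    if any(slot in claimed_empty for slot in proven_block_slots):
        scores[peer] = scores.get(peer, 0) + PENALTY
        return True
    return False
```

The same audit applies whether the faulty answer was an empty range inside the `MIN_EPOCHS_FOR_BLOCK_REQUESTS` horizon or inside a wider range the peer advertised via status.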

ajsutton commented 3 years ago

> let's say we introduce an error code, and the peer responds that it won't answer your request - does that mean that the range has blocks or not? you can't trust the one peer so you need to find out by downloading later blocks whose parent field will tell you. ie in all these cases, your behaviour as a downloading client is the same, regardless if they give you an empty range or an error code

I don't think this is true. If the client responds that it doesn't have blocks for that range, you know it's useless to you (since you still need to sync that range), so you can disconnect it immediately and find a peer that is useful. If it returns an empty range, you have to compare parent roots with other responses and other nodes to see if it's lying. Regardless of what happens, you can never be sure whether it just switched forks at an inopportune time or is malicious, so you wind up down-scoring it and have to go through that process a couple of times before it gets disconnected. Worse, you also have to suspect that the slots really were empty and other peers were lying, so they get down-scored as well.

The lack of clear information about whether a node is claiming a slot is empty or if it just didn't return the block for some other reason makes ETH2 sync massively more complex than it needs to be. At least at the moment, if you get an empty response it means the node claims it has no blocks in that range at all. We'd lose that if we don't have a specific error code for this case.

Basically it's extremely helpful to have a clear indication of what the node is claiming, even if we don't always know whether to believe the claim or not.
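The distinction could be sketched on the requester side roughly as follows (the `sync` interface is a hypothetical stand-in for a client's sync manager):

```python
def handle_range_response(response, peer, sync):
    """Requester-side sketch: an explicit 'range not served' error lets us
    drop the peer immediately, while an empty response forces cross-checking
    parent roots against other peers before we can act on it."""
    status, payload = response
    if status == "error":
        sync.disconnect(peer)            # peer admits it cannot serve this range
    elif payload == []:
        sync.schedule_cross_check(peer)  # peer claims the slots are empty: verify
    else:
        sync.import_blocks(payload)
```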

djrtwo commented 3 years ago

Thanks for the feedback. I'm working on a PR and will ping the participants here for input.