Closed mkalinin closed 1 year ago
How does this handle the case where a block body in the middle of the request is not found? e.g.
request: [A, B, C]
available: [A, C]
How does this handle the case where a block body in the middle of the request is not found? e.g.
request: [A, B, C] available: [A, C]
It stops at A
and returns [A]
Nice call on just focusing on pruning of TXs rather than the full payload. Allows to piggyback much more on existing RLP structures rather than requiring both sides to do a translation.
This looks good in general to me
It stops at A and returns [A]
this would depend, no? why not return [A, C]
?
Array of block hashes.
This deviates from the CL protocol a bit: we generally avoid using hashes for historical blocks for several reasons, but in particular because it does not exploit the "linearity" of a finalized block history - what we do instead is request by slot / block number - hashes are (comparatively) expensive to look up: for a range of blocks (ie 1024 blocks), this causes a lot of unnecessary seeking - what we want to do instead is serve data from flat storage in linear order which is significantly more efficient (also in EL:s where clients "freeze" storage after a certain amount of time has passed).
There are two ways to go about that:
The latter has been seen as preferable from a backwards compatibility point of view, because everything else in the EL goes by block number.
The general idea of this proposal is to piggy-back on already existing logic in EL to make this change almost free in terms of implementation.
this would depend, no? why not return [A, C]?
This is because there is no straightforward way to map payload bodies in the response on hashes in the request. It would either need to add block_hash
to the BlockBody
structure or CL would need to compute transactions_root
from received data. I am wondering how CL should handle this case for say BeaconBlocksByRoot
request? If it should respond with [A, C]
then, yes, we would need EL to reflect this logic.
the CL uses data from the ExecutionHeader to translate slots to block numbers
Using block_hash
is important because of the piggy-backing on the existing logic. What if CL grabs a span of blocks from its database, then parses block hashes from it, and send a request to EL? CL's complexity here (not sure how big it is) is decoding and then encoding each block to parse the hash and incorporate payload's body, but this seems to be unavoidable.
The general idea of this proposal is to piggy-back on already existing logic in EL to make this change almost free in terms of implementation.
I would investigate whether this is really the case with existing client implementations, as finding blocks by block number should be well supported already - a requirement to keep a by-hash index for this request for an indefinite future is quite heavy, and removing a request from a spec is a lot harder than adding a new one.
transactions_root from received data
Yes, this would be the case, or perhaps have an encoding for "no block found" if we keep a 1:1 response length. It's also a question of trust I guess, whether the CL blindly should forward whatever the EL gives it or always verify the hash regardless.
BeaconBlocksByRoot
This request is special in that it only supports unfinalized blocks and therefore the number of blocks requested tends to be small and efficiency is less of a concern.
It's used only when you receive an attestation or block that refers to a root you have not observed which is .. rare - in particular, we don't know the slot or block number of the block when serving this request. In particular, we made BlocksByRange
the "default" sync mechanism because this aligns well with a world in which chains finalize.
What if CL grabs a span of blocks from its database, then parses block hashes from it, and send a request to EL?
It's certainly possible, but it introduces a lot of work and structural inefficiency both for the EL and the CL that will be hard to rectify.
finding blocks by block number
Finding blocks by block number doesn't cover all the possible cases, it only works when the query is being made over canonical chain.
verify the hash regardless
In general, we assume the communication to be trustful. But I think verifying the hash might be a sanity check that is nice to have.
introduces a lot of work and structural inefficiency both for the EL and the CL that will be hard to rectify
I don't see the complexity on EL side as it already has to handle GetBlockBodies
request which is a very similar method to the proposed one. What additional complexity on CL does a block hashes request have comparing to a request by block numbers? Considering that decoding/encoding is anyway required for incorporating payload bodies into beacon blocks.
I don't see the complexity on EL side
I mean computational work, not implementation work: doing hundreds of hash lookups over millions of blocks is extremely slow compared to a linear scan over a range of blocks stored consecutively - practically, it's the difference between supporting HDD:s over SSD:s by and large.
Finding blocks by block number doesn't cover all the possible cases, it only works when the query is being made over canonical chain.
yes - this is a good point - there are two ways to solve this while keeping a by-slot request:
I mean computational work, not implementation work: doing hundreds of hash lookups over millions of blocks is extremely slow compared to a linear scan over a range of blocks stored consecutively - practically, it's the difference between supporting HDD:s over SSD:s by and large.
I see. This kind of work will anyway exist in EL -- GetBlockBodies
is staying in ETH protocol and is heavily used by peers in the network.
there are two ways to solve this while keeping a by-slot request
A complexity of a by-slot request is in mapping a slot number on a block height. A parameter set of BeaconBlocksByRange
request can't be mapped on a supposed set of engine_getBlockBodies
because of skipped slots. If CL is responsible in slot -> height translation it will literally be an iteration over beacon blocks in order to parse block numbers and then passing them in the request to EL. If the translation is done this way then there is no difference in translating slot numbers directly to block hashes.
Making EL responsible for slot->block number translation is much more invasive and isn't considered for inclusion into the minimal viable merge spec. I would say if this proposal isn't beneficial for CL and proposed engine_getBlockBodies
isn't expected to be utilized then we should rather forget about this proposal and get back to the question of pruning payloads after the merge. Even though there is a notion of slots on EL (for EVM purposes) it will likely be complicated to apply it to the block store and make respective index that would allow for advanced by-slot search.
This kind of work will anyway exist in EL
Right, but it doesn't have to, in a post-finalization world - it's very common to download blocks epoch by epoch, or really 1024 consecutive blocks at a time, in CL:s and I expect that trend to continue.
A complexity of a by-slot request i
Sorry, my bad - I mean by-block-height - we have the block height on the CL side anyway.
Specification
Structures
BlockBodyV1
transactions
:Array of DATA
- Array of transaction objects, each object is a byte list (DATA
) representingTransactionType || TransactionPayload
orLegacyTransaction
as defined in EIP-2718Core
engine_getBlockBodiesV1
Request
engine_getBlockBodiesV1
Array of DATA
, 32 Bytes - Array of block hashes.Response
Array of BlockBody
- Array ofBlockBody
objects.Specification
BlockBody
objects with the corresponding hashes respecting the order of block hashes in the input array.Rationale
Allows to replace
ExecutionPayload
s withExecutionPayloadHeader
s when persisting blocks on CL side and request transaction lists from EL client to serve beacon blocks to either user or remote peer. Max size ofExecutionPayloadHeader
is 532 bytes, this results in 667MB increase of the space occupied by beacon blocks during 6 months period comparing to what we have today before the Merge. The increase of the space required to store blocks with full payload objects may be up to 2.5TB considering payload max size of 2MB for the same period of time.This proposal attempts to reduce implementation complexity on EL side as semantics of this method maps on the semantics of existing
GetBlockBodies
message in ETH protocol.The limit on the size of the input array isn't required considering trustful relationship between the layers which is gonna be secured by authentication of Engine API communication channel. Network requests in CL clients are currently limited by
MAX_REQUEST_BLOCKS == 1024
. It's not expected that the size of the input array exceeds this limit in a normal case, though, a sanity limit that is greater thanMAX_REQUEST_BLOCKS
parameter value may also be considered.cc @djrtwo @arnetheduck