Open dapplion opened 1 year ago
Initial thoughts are that DB growth over time should only be affected by finalized blocks, so modification will be limited to the `blockArchive`; the overhead of converting blocks in the hot block db would likely be too high.
The affected code paths are:

- `beacon-node/src/api/impl/beacon/blocks/index.ts`: `getBlockHeaders`
- `beacon-node/src/chain/chain.ts`: `getCanonicalBlockAtSlot`, `getBlockByRoot`
- `beacon-node/src/network/reqresp/handlers/beaconBlocksByRange.ts`: `onBlocksOrBlobSidecarsByRange`
- `beacon-node/src/network/reqresp/handlers/beaconBlocksByRoot.ts`: `onBeaconBlocksByRoot`
- `beacon-node/src/chain/archiver/archiveBlocks.ts`: `migrateBlocksFromHotToColdDb`
- `beacon-node/src/sync/backfill/backfill.ts`: `backfillSync.sync`, `backfillSync.fastBackfillDb`, `backfillSync.syncBlockByRoot`, `backfillSync.syncRange`, `backfillSync.extractPreviousFinOrWsCheckpoint`
Will attempt to use the same db bucket for both blinded and full archived blocks. For users with an existing database it will be important to distinguish between the two types when serialized, so that deserialization works correctly. `try/catch`ing the ssz deserialization would be very slow, so @dapplion suggested a great idea: use a bit flag within the container offset. All serialized blocks start with the offset `0x00000064`, so using the first bit to distinguish between the two seems like it will work really well. I suggest using `0xff000064` for blinded blocks and the standard `0x00000064` for full blocks, as that is how they are stored now.
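As a minimal sketch of the flag idea (helper names here are hypothetical, not from the Lodestar codebase; this assumes the leading offset is a little-endian u32, so the "first bit" of the value `0xff000064` lives in serialized byte index 3):

```typescript
// Illustrative sketch only. A serialized SignedBeaconBlock begins with a
// 4-byte little-endian offset whose value is always 0x00000064, so the most
// significant byte of that u32 (serialized byte index 3) is always 0x00 and
// is free to carry a blinded/full marker.
function readFirstOffset(data: Uint8Array): number {
  return new DataView(data.buffer, data.byteOffset).getUint32(0, true);
}

function isBlindedBytes(data: Uint8Array): boolean {
  // byte 3 is the most significant byte of the little-endian u32 offset
  return data[3] === 0xff;
}

function setBlindedFlag(data: Uint8Array, blinded: boolean): void {
  // swap the marker in place; clearing it restores valid ssz bytes
  data[3] = blinded ? 0xff : 0x00;
}
```

Because the marker byte is otherwise always zero, clearing it before deserialization recovers the exact original ssz bytes.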
This will also allow a single db `Repository` to accommodate both. The type will need to be updated to `Repository<Slot, allForks.FullOrBlindedSignedBeaconBlock>` to allow for the change, but the member function types can be narrowed to `allForks.SignedBeaconBlock` so all other existing code works as expected and only `allForks.SignedBeaconBlock`s are passed to and from the `Repository`.
When pulling blocks out of the db, a simple bit check will allow for fast type determination. Swapping the bit back to `0x00` for blinded blocks will ensure that ssz deserialization works correctly. The check can happen in `decodeValue` before running `getSignedBlockTypeFromBytes` to pull the correct ssz type for deserialization. `getSignedBlockTypeFromBytes` can be updated to accept an `isBlinded` flag and pull the correct container from `config.getBlindedForkTypes` or `config.getForkTypes`. The block archive also calls `getSignedBlockTypeFromBytes`, but `false` can be passed for the parameter to always pull the full container for the hot db. `decodeValue` is a synchronous method, so pulling the full block from the execution engine will need to happen in the repository methods that get blocks.
Once blinded blocks are serialized, the first bit will be swapped to `0xff` before storing in the database. This can happen within `encodeValue` after running `this.config.getBlindedForkTypes(value.message.slot).SignedBeaconBlock.serialize(value)`. The binary put methods do not call `encodeValue`, so conversion should probably happen in the put methods. This will also match the deserialization side for consistency.
SignedBeaconBlock to SignedBlindedBeaconBlock Conversion

A private method `blindedFromFullBlock` can be added to the `BlockArchiveRepository` so `put`, `putBinary`, `batchPut`, and `batchPutBinary` have a common API for conversion. The `transactionsRoot` will be calculated and the container shape will be converted so it serializes correctly. See "Potential Issues" below about `putBinary` and `batchPutBinary`.
SignedBlindedBeaconBlock to SignedBeaconBlock Conversion

Pulling the `ExecutionPayload` from the execution engine will be required to get the transaction list for recreation of the full block.
The db is created outside of the `BeaconNode` class and passed in as a parameter to `BeaconNode.init`. The `executionEngine` is created inside of `BeaconNode.init` during creation of the new `BeaconChain`. The execution engine is dependent on `metrics` and an `AbortController`, which are also both created within `BeaconNode.init`, so passing the execution engine into the `Db` constructor is infeasible without a substantial refactor. One possible suggestion is to add a `setExecutionEngine` method to the `BlockArchiveRepository` that is called after initialization, within the `BeaconNode` constructor.
Once the execution engine is available within the `BlockArchiveRepository` it will be relatively simple to pull the full `ExecutionPayload` via `executionEngine.getPayloadBodiesByHash`, passing in `block.message.body.executionPayloadHeader.blockHash`, and reassemble the block.

A private method `fullBlockFromMaybeBlinded` can be added to the `BlockArchiveRepository` so `get` and `valueStream` have a common API for reassembly. `getBinary` will need to be updated to `decodeValue` the block first, pass it to `fullBlockFromMaybeBlinded`, and then re-serialize it, which is not ideal (see notes below). The `getSlot*` methods pull the slot via binary data, so they should not be affected (the slot is stored at the same offset for both types). All other getter methods internally call `get`, `valueStream`, or `getBinary`.
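A rough sketch of the reassembly step, using hypothetical simplified object shapes standing in for the real `allForks` types and the execution engine response:

```typescript
// Hypothetical simplified shapes; the real types live in @lodestar/types and
// carry many more fields. Only the transactions round trip is shown.
interface BlindedBody {
  executionPayloadHeader: {blockHash: string; feeRecipient: string};
}
interface FullBody {
  executionPayload: {blockHash: string; feeRecipient: string; transactions: string[]};
}
// Stand-in for one entry of an executionEngine.getPayloadBodiesByHash response
interface PayloadBody {
  transactions: string[];
}

function fullBodyFromBlinded(body: BlindedBody, payloadBody: PayloadBody): FullBody {
  const {executionPayloadHeader} = body;
  // Re-attach the transaction list fetched from the execution engine; the
  // other header fields carry over unchanged (a real conversion would also
  // handle withdrawals and drop the *_root fields).
  return {
    executionPayload: {
      blockHash: executionPayloadHeader.blockHash,
      feeRecipient: executionPayloadHeader.feeRecipient,
      transactions: payloadBody.transactions,
    },
  };
}
```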
Potential Issues

Conversion for the binary methods will require deserialization and re-serialization, which is not ideal from a performance perspective. PR #5573 just modified the code to avoid this. That PR updated `beacon-node/src/sync/backfill/backfill.ts`, and the `batchPutBinary` may want to be put back to `batchPut`, as the deserialized blocks are available in the calling context.
`migrateBlocksFromHotToColdDb` uses `block.getBinary` and `blockArchive.batchPutBinary` and may want to be swapped to `block.get` and `blockArchive.batchPut`.
`BackFillSync.syncBlockByRoot` uses `blockArchive.putBinary` and may want to be converted to `blockArchive.put`, as the deserialized block is available in `syncBlockByRoot`.
`Repository.entriesStream` is used by `onBlocksOrBlobSidecarsByRange` in reqresp, and there is a call to `this.decodeValue.bind(this)` which should, but may not, polymorphically call the correct `decodeValue`. It will need to be double checked that the correct method is called so the deserialization does not throw.
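The dispatch concern can be checked with a small model (hypothetical classes, not the real `Repository`): `bind(this)` fixes the receiver, not the method lookup, so at call time the subclass override should still be chosen.

```typescript
// Minimal model of the Repository inheritance. Binding `this` on a prototype
// method still resolves `this.decodeValue` through the instance, so the
// subclass override is the one that runs inside entries().
class BaseRepo {
  decodeValue(data: Uint8Array): Uint8Array {
    return data;
  }
  entries(values: Uint8Array[]): Uint8Array[] {
    return values.map(this.decodeValue.bind(this));
  }
}

class ArchiveRepo extends BaseRepo {
  decodeValue(data: Uint8Array): Uint8Array {
    const out = data.slice();
    out[3] = 0x00; // clear the hypothetical flag byte before deserialization
    return out;
  }
}
```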
To recap a bit:
Use the same format / strategy for the archive and hot DB. Use the first byte of the payload as a version byte. This allows making the migration optional, or not doing it at all.

- `0x00`: Full block
- `0x01`: Blinded block

After this change blocks must always be inserted as blinded. In the import flow, we can compute the execution header from the struct value, which has cached hashing, then merge those bytes with the serialized payload and persist.
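The version-byte scheme could be sketched as follows (function names are illustrative, not from the codebase):

```typescript
const FULL = 0x00;
const BLINDED = 0x01;

// Prefix the serialized ssz bytes with a one-byte version tag before the db put
function tagBlockBytes(serialized: Uint8Array, isBlinded: boolean): Uint8Array {
  const out = new Uint8Array(serialized.length + 1);
  out[0] = isBlinded ? BLINDED : FULL;
  out.set(serialized, 1);
  return out;
}

// Strip the tag on read; the remainder is valid ssz for the tagged type
function untagBlockBytes(stored: Uint8Array): {isBlinded: boolean; serialized: Uint8Array} {
  return {isBlinded: stored[0] === BLINDED, serialized: stored.subarray(1)};
}
```

Unlike the earlier bit-flag idea, this never mutates the ssz bytes themselves, at the cost of one extra byte per stored block.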
For API and ReqResp requests:
For Regen replay:
Should there be a cli flag to turn the feature on?
> Should there be a cli flag to turn the feature on?

If easy to implement, it's good to have in case there are issues in the future.
We can tell if a serialized execution payload is blinded or not by looking at the extra_data offset value, so there is no need for prefixes in the DB.
| offset | ExecutionPayloadHeader | ExecutionPayload |
|---|---|---|
| 0 | fixed fields (size N - 4) | |
| x ∈ [0, N-4] | extra_data: N+64 (offset) | extra_data: N+8 (offset) |
| N | transactions_root (data) | transactions: offset |
| N + 4 | -- | withdrawals: offset |
| N + 8 | -- | [extra_data] |
| N + 32 | withdrawals_root (data) | |
| N + 64 | [extra_data] | |
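Under the layout in the table, the check could look like this (a sketch only; `extraDataFieldPos` stands for the position x of the extra_data offset field and `fixedSize` for N, both of which a real implementation would derive from the fork's ssz types):

```typescript
// Read the little-endian u32 extra_data offset and compare it against where
// extra_data would start in each layout: N + 64 for a header (blinded),
// N + 8 for a full payload.
function isBlindedPayloadBytes(
  data: Uint8Array,
  extraDataFieldPos: number,
  fixedSize: number
): boolean {
  const extraDataOffset = new DataView(data.buffer, data.byteOffset).getUint32(
    extraDataFieldPos,
    true
  );
  return extraDataOffset === fixedSize + 64;
}
```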
@matthewkeil I've done a sketch of how this feature could be implemented, can you take a look if this approach makes sense to you? https://github.com/ChainSafe/lodestar/compare/dapplion/dedup-payloads?expand=1
> We can tell if a serialized execution payload is blinded or not by looking at the extra_data offset value, so there is no need for prefixes in the DB.

Awesome! I will get this implemented when I switch back to this task. Should be this sprint.
> @matthewkeil I've done a sketch of how this feature could be implemented, can you take a look if this approach makes sense to you? https://github.com/ChainSafe/lodestar/compare/dapplion/dedup-payloads?expand=1

Yep, looks good @dapplion!! It's very similar to how I was doing it on my work branch: https://github.com/ChainSafe/lodestar/compare/unstable...mkeil/dedup-beacon-block?expand=1

I will read through your changes carefully and make sure I limit the changes to just what you recommended. I found there were some places where the types do not line up when moving to FullOrBlindedBeaconBlock for the two Repositories, but those changes were pretty minimal. I'll message you when I start on this work again and will let you know if I have any questions as I go.
Note to self: Make sure that #5923 still works correctly during PR process
@dapplion here are the perf results from doing the splicing with and without deserializing the block first. The test file to check methodology is here: https://github.com/ChainSafe/lodestar/blob/mkeil/dedup-beacon-block-2/packages/beacon-node/test/perf/util/fullOrBlindedBlock.test.ts
```
fullOrBlindedBlock
  BlindedOrFull to full
    phase0
      ✔ phase0 to full - deserialize first      16947.43 ops/s    59.00600 us/op  -    9119 runs  0.606 s
      ✔ phase0 to full - convert serialized      2985075 ops/s    335.0000 ns/op  - 1989539 runs   1.01 s
    altair
      ✔ altair to full - deserialize first      10005.90 ops/s    99.94100 us/op  -    3021 runs  0.410 s
      ✔ altair to full - convert serialized      3076923 ops/s    325.0000 ns/op  - 1226301 runs  0.606 s
    bellatrix
      ✔ bellatrix to full - deserialize first   6555.443 ops/s    152.5450 us/op  -    9250 runs   1.57 s
      ✔ bellatrix to full - convert serialized   2450980 ops/s    408.0000 ns/op  - 1043460 runs  0.606 s
    capella
      ✔ capella to full - deserialize first     6236.319 ops/s    160.3510 us/op  -    3144 runs  0.678 s
      ✔ capella to full - convert serialized     2469136 ops/s    405.0000 ns/op  - 1035528 runs  0.606 s
  BlindedOrFull to blinded
    phase0
      ✔ phase0 to blinded - deserialize first   17687.53 ops/s    56.53700 us/op  -    6073 runs  0.404 s
      ✔ phase0 to blinded - convert serialized   9523810 ops/s    105.0000 ns/op  - 2525364 runs  0.505 s
    altair
      ✔ altair to blinded - deserialize first   9639.483 ops/s    103.7400 us/op  -    8749 runs   1.01 s
      ✔ altair to blinded - convert serialized   9708738 ops/s    103.0000 ns/op  - 5107628 runs   1.01 s
    bellatrix
      ✔ bellatrix to blinded - deserialize first  96.84429 ops/s  10.32585 ms/op  -      82 runs   1.35 s
      ✔ bellatrix to blinded - convert serialized 98.84780 ops/s  10.11656 ms/op  -      53 runs   1.04 s
    capella
      ✔ capella to blinded - deserialize first    47.96520 ops/s  20.84845 ms/op  -      21 runs  0.949 s
      ✔ capella to blinded - convert serialized   47.11033 ops/s  21.22677 ms/op  -      36 runs   1.27 s
```
Problem description
Since the merge, both the execution node and the Lodestar beacon node persist the block's execution payload into their DBs.
At an average block size of 100 KB, that's about 720 MB/day or 263 GB/year of redundant data we don't really need to store. See https://ycharts.com/indicators/ethereum_average_block_size. According to metrics, current Lodestar DB growth averaged over the last 30 days on a mainnet node without validators is 666 MB/day.
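The daily figure follows directly from mainnet's 12-second slot time:

```typescript
// Rough storage arithmetic behind the ~720 MB/day estimate (assumes a block
// in every slot and an average payload size of ~100 kB).
const SECONDS_PER_SLOT = 12;
const slotsPerDay = (24 * 60 * 60) / SECONDS_PER_SLOT; // 7200 slots/day
const avgBlockBytes = 100_000; // ~100 kB
const bytesPerDay = slotsPerDay * avgBlockBytes; // 720,000,000 ≈ 720 MB
const bytesPerYear = bytesPerDay * 365; // ≈ 263 GB
```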
Solution description
Instead, Lodestar should persist blinded blocks in its DB and retrieve the payloads from the execution node on demand to comply with:
All of these operations are not very time sensitive, so the added latency is not a deal breaker.
Additional context
No response