ChainSafe / lodestar

🌟 TypeScript Implementation of Ethereum Consensus
https://lodestar.chainsafe.io
Apache License 2.0

Reducing CachedBeaconState size #5857

Open dapplion opened 1 year ago

dapplion commented 1 year ago

Problem description

Lodestar is relatively inefficient when representing sparse data, such as a persistent merkle tree:

| State | VC len | SSZ | TreeDU | CachedState |
|---|---|---|---|---|
| mainnet_1880255 | 216,541 | 32 MB | 229 MB | 292 MB |
| mainnet_2620447 | 263,370 | 39 MB | 271 MB | 356 MB |
| mainnet_6963647 | 852,585 | 121 MB | 837 MB | 1073 MB (x8.8 SSZ) |
| forecast | 1,400,000 | 200 MB | - | 1700 MB |
| forecast | 1,700,000 | 240 MB | - | 2100 MB |
| forecast | 2,300,000 | 330 MB | - | 2900 MB |

Summary of state size by index count

Why is the CachedState so heavy?

The memory cost of basic building blocks of CachedBeaconState using @chainsafe/persistent-merkle-tree is

Validator                   - 625.8 bytes / instance
BranchNodeStruct(Validator) - 765.1 bytes / instance
LeafNode                    - 103.1 bytes / instance
BranchNode                  - 131.4 bytes / instance
blst.PublicKey(affine)      - 82.4 bytes / instance

While these numbers are not great, a lot of work has already been done to reduce them from a naive initial design, e.g. using a pseudo fixed-size Uint8Array and the HashObject representation.
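For context, the HashObject idea stores a 32-byte hash as eight 32-bit integer fields on a plain object, rather than as a separate heap-allocated Uint8Array per node. The field names below follow the published HashObject interface; the pack/unpack helpers are an illustrative sketch, not the library's code:

```typescript
// Sketch of the HashObject representation used by the persistent-merkle-tree
// stack: eight signed 32-bit fields on a plain object instead of one
// heap-allocated Uint8Array per 32-byte hash (helpers are illustrative).
interface HashObject {
  h0: number; h1: number; h2: number; h3: number;
  h4: number; h5: number; h6: number; h7: number;
}

function byteArrayToHashObject(bytes: Uint8Array): HashObject {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    h0: view.getInt32(0, true), h1: view.getInt32(4, true),
    h2: view.getInt32(8, true), h3: view.getInt32(12, true),
    h4: view.getInt32(16, true), h5: view.getInt32(20, true),
    h6: view.getInt32(24, true), h7: view.getInt32(28, true),
  };
}

function hashObjectToByteArray(ho: HashObject): Uint8Array {
  const bytes = new Uint8Array(32);
  const view = new DataView(bytes.buffer);
  view.setInt32(0, ho.h0, true); view.setInt32(4, ho.h1, true);
  view.setInt32(8, ho.h2, true); view.setInt32(12, ho.h3, true);
  view.setInt32(16, ho.h4, true); view.setInt32(20, ho.h5, true);
  view.setInt32(24, ho.h6, true); view.setInt32(28, ho.h7, true);
  return bytes;
}
```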

At large index counts, those numbers add up fast. Using mainnet_6963647 with 852,585 indexes (note that the branch node count of a tree with N leaves is ~N):

validators data:  765 * 852585     = 652 MB
validators cache: 131 * 852585     = 111 MB
balances data:    103 * 852585 / 4 = 21  MB
balances cache:   131 * 852585 / 4 = 28  MB
inactivity data:  103 * 852585 / 4 = 21  MB
inactivity cache: 131 * 852585 / 4 = 28  MB
TOTAL ------------------------------ 861 MB (837 MB measured)
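The arithmetic above can be checked directly from the per-instance costs listed earlier. MB here is 10^6 bytes, and the /4 factors come from each 32-byte LeafNode packing four uint64 values; the quoted 861 MB total sums the pre-rounded per-row figures, hence the small discrepancy:

```typescript
// Reproduce the BeaconState tree-memory estimate for mainnet_6963647
// from the per-instance byte costs measured above.
const N = 852_585; // validator count
const BYTES = {
  branchNodeStruct: 765, // BranchNodeStruct(Validator), one per validator
  branchNode: 131,       // internal BranchNode of the hashing cache
  leafNode: 103,         // one LeafNode packs four uint64s (32 bytes)
};
const MB = 1e6;

const validatorsData = BYTES.branchNodeStruct * N;
const validatorsCache = BYTES.branchNode * N;
// balances and inactivity_scores are uint64 lists: 4 values per 32-byte leaf
const packedData = (BYTES.leafNode * N) / 4;
const packedCache = (BYTES.branchNode * N) / 4;

const total = validatorsData + validatorsCache + 2 * (packedData + packedCache);
console.log((total / MB).toFixed(0), "MB"); // ≈ 864 MB (837 MB measured)
```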

Breakdown of main contributors to BeaconState Tree memory

blst.Pubkeys:      82 * 852585     = 70  MB
PubkeyMap:        157 * 852585     = 133 MB
TOTAL ------------------------------ 203 MB (236 MB measured, 1073 - 837)

Breakdown of main contributors to BeaconStateCache memory

Note that the BeaconStateCache memory cost is paid once per application, and most of the validators array is shared between all states in memory.

Can we do better?

Our main problem is how to represent short binary data (hashes, pubkeys) efficiently in JavaScript. The only way to approach a ~1x ratio of data represented to memory used is to pack multiple items into a large ArrayBuffer, so that the per-instance overhead is amortized.
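As a sketch of the amortization idea (class and method names here are illustrative, not Lodestar code): instead of one heap-allocated Uint8Array per 32-byte hash, all items live in one backing buffer and reads return zero-copy views:

```typescript
// Minimal sketch of packing fixed-size 32-byte items into one backing buffer
// so the per-item JS object overhead is amortized (illustrative names).
class FlatHashStore {
  private readonly data: Uint8Array;
  constructor(readonly capacity: number) {
    this.data = new Uint8Array(capacity * 32);
  }
  set(index: number, hash: Uint8Array): void {
    if (hash.length !== 32) throw new Error("hash must be 32 bytes");
    this.data.set(hash, index * 32);
  }
  /** Returns a view into the backing buffer: zero-copy, ~1x memory. */
  get(index: number): Uint8Array {
    return this.data.subarray(index * 32, (index + 1) * 32);
  }
}
```

Note that `get` returns a view into the shared buffer, so callers must copy before mutating if they need isolation.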

The BeaconState has the nice property where not much data is changed at once.

Properties

| Beacon state field | Type | bytes/idx | Per-block changes | Per-epoch changes |
|---|---|---|---|---|
| block_roots, state_roots | Vector[Root, 8192] | - | W 1 | - |
| eth1_data_votes | List[Eth1Data, 2048] | - | R all, W 1 | - |
| validators | List[Validator, 1M] | 121 | W (16:2000) | W (0:1M) |
| balances | List[Gwei, 1M] | 8 | W (512:~528) | W 1M |
| randao_mixes | Vector[Bytes32, 65536] | - | W 1 | W 1 |
| slashings | Vector[Gwei, 8192] | - | W (0:1) | R 1M, W 1 |
| epoch_participation | List[uint8, 1M] | 2 | W 31250 | R 1M, W 1M |
| inactivity_scores | List[uint64, 1M] | 8 | - | R 1M, W (0:1M) |
| historical_summaries | List[HistoricalSummary, ∞] | - | - | W 1 |

Block processing

With 1M indexes

processWithdrawals           R (avg 16 max 16384) validators, W 16 balances
processProposerSlashing      W (0:16) validators
processAttesterSlashing      W (0:~500) validators
processAttestations          W 31250 participation bytes
processDeposit               W (0:~700) validators
processVoluntaryExit         W (0:16) validators
processBlsToExecutionChange  W (0:16) validators
processSyncAggregate         W 512 balances
+ hash

Min expected size diff: 35 KB
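The 35 KB figure is consistent with the write counts above (a plausible accounting, assuming ~1M/32 attesters per slot):

```typescript
// Minimum expected per-block state diff with 1M indexes (illustrative).
const attesters = 1_000_000 / 32;       // validators attesting per slot
const participation = attesters * 1;    // processAttestations: 1 byte per flag
const balances = (16 + 512) * 8;        // withdrawals + sync aggregate, uint64 each
const totalKB = (participation + balances) / 1000;
console.log(totalKB.toFixed(1), "KB");  // ≈ 35.5 KB
```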

Epoch processing

processJustificationAndFinalization  R 1M participation, active_balance
processInactivityUpdates             R 1M participation + W (0:1M) inactivity 
processRewardsAndPenalties           R 1M participation, active_balance, inactivity + W 1M balances
processRegistryUpdates               R 1M validators + W (0:20) validators
processSlashings                     R 1M validators + W (0:1M) validators
processEth1DataReset                 -
processEffectiveBalanceUpdates       R 1M balances + W (0:1M) validators
processSlashingsReset                -
processRandaoMixesReset              -
processHistoricalSummariesUpdate     -
processParticipationFlagUpdates      -
processSyncCommitteeUpdates          shuffling
+ hash
+ shufflings

Min expected size diff: 9 MB
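One plausible breakdown of the 9 MB minimum: the full balances rewrite in processRewardsAndPenalties plus the 1-byte-per-index participation rotation:

```typescript
// Minimum expected per-epoch state diff with 1M indexes (illustrative).
const N = 1_000_000;
const balances = N * 8;       // processRewardsAndPenalties writes every uint64 balance
const participation = N * 1;  // epoch_participation rotated, 1 byte per index
const totalMB = (balances + participation) / 1e6;
console.log(totalMB, "MB");   // 9 MB
```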

State as flat memory

Both the state data and the hashing cache can be represented as flat memory. Prysm and Lighthouse have been using this strategy since genesis. However, a naive implementation duplicates all content between forks, limiting the total number of states that can be held in memory at once.

But states can be represented as base layer + diff layers

                    /<------[d4] <------[d6]
[b0] <-[d1] <-[d2] <-[d3] <-------[d5]    ^ state_b
                                   ^ state_a

Then states can be represented as the sum of:

state_a = compact([b0, d1, d2, d3, d5])
state_b = compact([b0, d1, d2, d4, d6])
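A minimal sketch of this base-plus-diff scheme, with the state as a flat byte array and each diff as a sparse offset-to-value map (layout and names are illustrative):

```typescript
// Sketch of state reconstruction from a base layer plus diff layers.
// A diff records only the byte offsets that changed (illustrative).
type Diff = Map<number, number>; // byteOffset -> newByteValue

function compact(base: Uint8Array, diffs: Diff[]): Uint8Array {
  const state = base.slice(); // copy; the base stays shared between forks
  for (const diff of diffs) {
    for (const [offset, value] of diff) state[offset] = value;
  }
  return state;
}

// Two fork branches share [b0, d1, d2] and diverge afterwards:
//   state_a = compact(b0, [d1, d2, d3, d5])
//   state_b = compact(b0, [d1, d2, d4, d6])
```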

Advantages

Drawbacks

philknows commented 1 week ago

This was determined to be a large time investment and maintenance overhead, and may only yield memory improvements, with unknown side effects on the CPU. Likely there will be some tradeoffs that may not be worth the risk.

twoeths commented 1 week ago

As stated here, Bun is >5x more efficient than NodeJS in terms of memory allocation; we should give it a try.

dapplion commented 5 days ago

47 bytes per 32-byte array, that's really good! Does Lodestar actually run with Bun?

twoeths commented 5 days ago

Bun does not support napi for now (https://github.com/oven-sh/bun/issues/158), so we cannot run the beacon node with it @dapplion

We can only try with persistent-merkle-tree and benchmark the result there.
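For reference, a rough probe of per-allocation overhead that runs under both `node` and `bun` (approximate, since heap deltas depend on GC timing; `process.memoryUsage` exists in both runtimes):

```typescript
// Rough bytes-per-instance probe for 32-byte Uint8Arrays, runnable under
// both node and bun. Figures are approximate: no GC pass is forced.
function bytesPerInstance(count: number): number {
  const keep: Uint8Array[] = new Array(count);
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < count; i++) keep[i] = new Uint8Array(32);
  const after = process.memoryUsage().heapUsed;
  // keep[] stays referenced so allocations are not collected mid-measurement
  return (after - before) / keep.length;
}

console.log(bytesPerInstance(100_000).toFixed(1), "bytes / 32-byte array");
```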