Open dapplion opened 1 year ago
This was determined to be a large time investment and maintenance overhead, and it may only yield memory improvements without known side effects on the CPU. There will likely be some tradeoffs that may not be worth the risk.
As stated here, Bun is more than 5x as efficient as NodeJS in terms of memory allocation; we should give it a try.
47 bytes / 32 byte array that's really good! Does Lodestar actually run with bun?
Bun does not support napi for now https://github.com/oven-sh/bun/issues/158 so we cannot run beacon node with it @dapplion
we can only try with persistent-merkle-tree
and benchmark the result there
Problem description
Lodestar is relatively inefficient when representing sparse data, such as a persistent merkle tree:
Summary of state size by index count
Why is the CachedState so heavy?
The memory cost of the basic building blocks of CachedBeaconState using @chainsafe/persistent-merkle-tree is:
At large index counts, those numbers add up fast. Using mainnet_6963647 with 852585 indexes (note that a tree of N leaves has roughly N branch nodes)
Breakdown of main contributors to BeaconState Tree memory
Breakdown of main contributors to BeaconStateCache memory
Note that the BeaconStateCache memory cost is paid once per application, and most of the validators array is shared between all states in memory.
Can we do better?
Our main problem is how to represent short binary data (hashes, pubkeys) efficiently in JavaScript. The only way to get a ratio of ~1x (data represented / memory used) is to pack multiple items into one large ArrayBuffer, so that the per-instance overhead is amortized across all of them.
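To make the amortization idea concrete, here is a minimal sketch of packing 32-byte items into a single buffer. `PackedHashStore` and its methods are hypothetical names for illustration only, not part of Lodestar or @chainsafe/persistent-merkle-tree, and the fixed capacity is an assumption made to keep the example short.

```typescript
// Hypothetical fixed-capacity store; names are illustrative, not Lodestar APIs.
const ITEM_SIZE = 32; // bytes per hash

class PackedHashStore {
  private readonly buffer: Uint8Array;
  private readonly capacity: number;
  private length = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
    // One allocation backs every item, so object overhead is paid once.
    this.buffer = new Uint8Array(capacity * ITEM_SIZE);
  }

  /** Copies a 32-byte item into the store and returns its index. */
  push(item: Uint8Array): number {
    if (item.length !== ITEM_SIZE) throw new Error(`expected ${ITEM_SIZE} bytes`);
    if (this.length >= this.capacity) throw new Error("store is full");
    this.buffer.set(item, this.length * ITEM_SIZE);
    return this.length++;
  }

  /** Returns a view (no copy of the bytes) over the item at `index`. */
  get(index: number): Uint8Array {
    if (index < 0 || index >= this.length) throw new RangeError("index out of bounds");
    return this.buffer.subarray(index * ITEM_SIZE, (index + 1) * ITEM_SIZE);
  }

  /** Total bytes held by the backing buffer. */
  get byteLength(): number {
    return this.buffer.byteLength;
  }
}

// Usage: 1000 hashes occupy exactly 32000 bytes of backing storage.
const store = new PackedHashStore(1000);
const idx = store.push(new Uint8Array(ITEM_SIZE).fill(7));
console.log(idx, store.get(idx)[0], store.byteLength);
```

Each `get` call still creates a short-lived view object, so the saving applies to data at rest rather than to data being actively read.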
The BeaconState has the nice property where not much data is changed at once.
Properties
Block processing
With 1M indexes
Min expected size diff: 35 KB
Epoch processing
Min expected size diff: 9 MB
State as flat memory
Both the state data and the hashing cache can be represented as flat memory. Prysm and Lighthouse have been using this strategy since genesis. However, a naive implementation duplicates all content between forks, limiting the total number of states that can be held in memory at once.
But states can be represented as base layer + diff layers
Then states can be represented as the sum of:
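As a rough illustration of the base layer + diff layers idea, the sketch below stores one flat base buffer and lets each derived state record only the chunks it changed. `BaseLayer`, `DiffLayer`, and the 32-byte chunk granularity are assumptions made for this example; they do not describe an existing Lodestar design.

```typescript
// Hypothetical layered store; none of these names exist in Lodestar.
const CHUNK_SIZE = 32;

class BaseLayer {
  readonly data: Uint8Array;

  constructor(chunkCount: number) {
    // Flat memory shared by every state derived from this base.
    this.data = new Uint8Array(chunkCount * CHUNK_SIZE);
  }

  read(index: number): Uint8Array {
    return this.data.subarray(index * CHUNK_SIZE, (index + 1) * CHUNK_SIZE);
  }
}

class DiffLayer {
  // Only changed chunks are stored. A Map has per-entry overhead,
  // which is acceptable here because diffs are assumed to be small.
  private readonly changed = new Map<number, Uint8Array>();
  private readonly parent: BaseLayer | DiffLayer;

  constructor(parent: BaseLayer | DiffLayer) {
    this.parent = parent;
  }

  write(index: number, chunk: Uint8Array): void {
    if (chunk.length !== CHUNK_SIZE) throw new Error(`expected ${CHUNK_SIZE} bytes`);
    this.changed.set(index, chunk.slice()); // copy so callers cannot mutate the layer
  }

  read(index: number): Uint8Array {
    // Fall through to the parent when this layer did not touch the chunk.
    return this.changed.get(index) ?? this.parent.read(index);
  }

  /** Bytes of chunk data owned by this layer alone. */
  get diffBytes(): number {
    return this.changed.size * CHUNK_SIZE;
  }
}

// Usage: two forks share one base and each stores only its own changes.
const base = new BaseLayer(1024);
const forkA = new DiffLayer(base);
const forkB = new DiffLayer(base);
forkA.write(1, new Uint8Array(CHUNK_SIZE).fill(0xaa));
forkB.write(2, new Uint8Array(CHUNK_SIZE).fill(0xbb));
console.log(forkA.read(1)[0], forkB.read(1)[0], forkA.diffBytes);
```

In this sketch reads get slower as the chain of diff layers grows, so a real design would likely need to merge old layers into the base periodically.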
Advantages
Drawbacks