ethereum / portal-network-specs

Official repository for specifications for the Portal Network
303 stars 82 forks source link

Schemes for storing the state #194

Open pipermerriam opened 1 year ago

pipermerriam commented 1 year ago

Related to #193

At present, I believe we have two "known" schemes for handling state.

  1. "flat" - contiguous order preserving mapping of the state trie onto the DHT such that individual nodes store and maintain a slice of the trie. This gives O(1) access to accessing the state and efficiency gains to storage since it eliminates large amount of intermediate nodes in the accompanying proof. In order to serve requests for the most recent state roots, a node will either need to keep a copy for each state root or employ some kind of de-duplication scheme like storing the most recent state and a series of reverse diffs to allow rewinding the latest state to one of the recent historical state roots. This approach doesn't easily scale to reach arbitrarily deep into the state. The further away from the latest root you get into the history, the further you have to rewind giving you an O(N) computational issue. This approach is good at serving recent state. This approach fails when accessing sparse contract storage trying to access values that aren't in the trie. Because of the sparse nature of contract storage under the hexary MPT, exclusion proofs become problematic.

  2. "naive" - store every trie node. This scheme is "simple" and comes with a number of inherent inneficiencies. Primarily, state access has a O(log(log(n)) lookup cost due to having to do trie traversal. This scheme is good at archival state. This scheme is good at both account access and contract storage access for both existing values and missing values that require exclusion proofs. This approach has a subtle problem with ongoing "gossip" of state data due to individual trie nodes not being firmly anchored to any individual state root, and thus, we likely don't want to store trie nodes with accompanying proofs because a proof towards a very old state root may not be as compelling as a proof against a recent state root. This scheme also has some complexity around garbage collection (assuming we want garbage collection which we may not).

These two schemes when combined make up the best solution thus far for how to do state within the Portal Network under the hexary MPT.

There is a 3rd scheme that I believe we should explore, which is some version of what Erigon did to reduce archival state by roughly 10x. My understanding of their scheme is that they store the flattened series of events that modify accounts and contract storage with the relevant indexes such that they can look things up based on block height. My brain thinks there's a way to map this scheme onto the portal network, but I don't have it figured out. I don't think we have to figure this out to achieve archive state in portal, but there's a chance that it reduces overall storage requirements by 10x if we can figure it out.

perama-v commented 1 year ago

Here are some statistics on the shape of the Merkle Patricia Trie, from a sample following the chain tip on mainnet late 2022 with a custom geth tracer for about 10k blocks: https://hackmd.io/@jsign/geth-mpt-analysis

That state tries being 8 or 9 deep is surprising. As each new depth requires possession of 15 sibling nodes. Just additional info for the discussion, no take home immediately obvious to me.

perama-v commented 1 year ago

Good resource for understanding a flat representation: Besu bonsai trees https://www.youtube.com/watch?v=JNCgRUpN1p4

pipermerriam commented 5 months ago

This document is really good for understanding the Erigon format: https://github.com/ledgerwatch/erigon/blob/main/docs/programmers_guide/db_walkthrough.MD