ethereum / portal-network-specs

Official repository for specifications for the Portal Network

Archival state overlay #193

Open perama-v opened 1 year ago

perama-v commented 1 year ago

Description

The portal network does not provide archive node functionality. Could it?

Motivation

Exploration of new user types for the Portal network.

Currently a portal network node could be of use to someone wanting full-node functionality post-EIP4444 (history expiry). That includes access to historical transactions and receipts. Yet this is an incomplete picture of what a transaction actually did.

At present there are popular websites that provide transaction data. Block explorers and individual transaction explorers such as: https://ethtx.info/mainnet/0x1737d3cb7407b4fbf17e152e2d95cabca691e5fdcc8ae67a48c1a11d4809fe78/

What if, instead of visiting such websites, a user could have a similar interface that sources data from their Portal node? Deep introspection of individual transactions has proven to be a popular use case. It provides a critical function that the community requires: what happened / what went wrong. As this functionality is highly desired, it could be a major reason people run Portal nodes.

Preliminary research

Providing archive access likely involves an inefficient representation of data as compared to a normal archive node. Yet this is what the portal network is designed for: to share such a load among many peers.

https://perama-v.github.io/ethereum/protocol/archive

The Portal network already has block header accumulators. So blocks can be packaged along with the state they require for tracing and distributed in the network trustlessly. Any transaction within a block can then be traced locally by replaying the block.

carver commented 1 year ago

I love the use case of running transaction traces!

I'm not sure that storing the state witness for every block is the best option we have for enabling that. It sounds like it means storing duplicates of the same state for every block in which it gets accessed. That might be a lot of duplicate data.

Another option is to "simply" extend the state network to be a full archive of all state from genesis.

perama-v commented 1 year ago

I like to think about a theoretical transaction with index 100 in some historical block N:

---
title: Tracing transaction index 100 in block N requires preceding intra-block transactions.
---

flowchart RL
C --> B --> A
subgraph A[block N - 1]
    direction BT
    subgraph aB1[Header]
        direction RL
        asr[State root]
    end
    subgraph aT1[Transaction 0]
        direction TB        
    end
    subgraph aTx[...]
        direction TB
    end
    subgraph aT22[Transaction 22]
        direction TB
        at99con1s1[update contract 1, storage slot 1]

    end
    subgraph aT23[Transaction 23]
        direction TB
        at99con1s2[update contract 1, storage slot 2]
    end
    aT23 --> aT22 --> aTx --> aT1 --> asr
end

subgraph B[block N]
    direction BT
    subgraph B1[Header]
        direction RL
        bsr[State root]
    end
    subgraph T1[Transaction 0]
        direction TB
        t0con1s1[update contract 1, storage slot 1]
    end
    subgraph Tx[...]
        direction TB
    end
    subgraph T99[Transaction 99]
        direction TB
        t99con1s1[update contract 1, storage slot 1]        
    end
    subgraph T100[Transaction 100]
        direction TB
        t100con1s1[read contract 1, storage slot 1]
    end
    T100 --> T99 --> Tx --> T1 --> bsr
end

subgraph C[block N + 1]
    subgraph C1[Header]
        direction RL
        csr[State root]
    end
end

subgraph P[proof containing every state block N accesses]
    direction TB
    s1[contract 1, storage slot 1 value, before transaction 0]    
end
P -.- B

Transaction 100 could read from a slot that was updated by any preceding transaction in that block. So every transaction (0-99) must be replayed. If you do not provide the state proof for every part of state that is accessed in that block, then you cannot be sure that the value is correct.

The state proof effectively is the database for tracing that block. It does not need to have storage slot values that were not accessed in that block. So when constructing the proof you replay a block, look for any state that was actually touched by the block and construct the proof containing all those values.

For example, contract 1 slot 2 is updated in block N - 1. So it will be part of the state root of block N, but it does not need to be in the block state proof for that block because it is not accessed.
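To make the replay requirement concrete, here is a minimal sketch (Python, purely illustrative; `AccessedStateProof`, `execute_tx`, and the field layout are assumptions, not part of any spec) of how an ASP acts as the complete database for tracing one block:

```python
# Illustrative sketch only: an Accessed State Proof (ASP) as the sole state
# source for replaying a single block. Names and shapes are assumptions.

class AccessedStateProof:
    """Holds only the (address, slot) -> value pairs the block actually touched,
    each value assumed to be proven against a state root in a block header."""

    def __init__(self, proven_slots):
        self.proven_slots = proven_slots  # dict[(address, slot)] = value

    def read(self, address, slot):
        key = (address, slot)
        if key not in self.proven_slots:
            # By construction a correct replay never reaches this branch:
            # anything missing from the proof was never accessed by the block.
            raise LookupError("slot not in proof; block never accessed it")
        return self.proven_slots[key]


def trace_transaction(block_txs, target_index, proof, execute_tx):
    """Replay transactions 0..target_index so intra-block writes (e.g. tx 99
    writing a slot that tx 100 reads) are reflected in the trace."""
    intra_block_writes = {}  # writes made by earlier transactions in this block

    def read_state(address, slot):
        key = (address, slot)
        if key in intra_block_writes:
            return intra_block_writes[key]
        return proof.read(address, slot)

    trace = None
    for index, tx in enumerate(block_txs[: target_index + 1]):
        writes, trace = execute_tx(tx, read_state)  # hypothetical EVM hook
        intra_block_writes.update(writes)
    return trace
```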

pipermerriam commented 1 year ago

I really like going directly for archive. It fits really well into the things our protocols do best, aka, sharing the load and getting more powerful when more machines join the network, and it could serve as a good adoption mechanism to engage the community to provide the necessary storage to house it.

perama-v commented 1 year ago

I started making a basic tool to build a state proof and test the idea. Nodes have some convenient methods that can be leveraged.

The two of interest are

(WIP)

https://github.com/perama-v/archors

pipermerriam commented 1 year ago

I'm inclined to go for storing all archival state over storing transaction traces, because the archival state can be used to produce transaction traces, but is also useful beyond that for other historical EVM research.

We have an existing approach to archival state, which would be to do full storage of all historical trie data. This approach is functional, in that it allows for accessing arbitrary historical state roots, but it suffers from the inefficiencies that go with having to do merkle trie navigation to access state. I think that this is our "default" starting point for network design since we know it will work.

I also suspect that there's alternate schemes for doing this that could greatly reduce overall storage requirements and reduce the overhead for individual state lookups, but that's an open area of research that would need some :brain: work.

ScottyPoi commented 1 year ago

I really like going directly for archive. It fits really well into the things our protocols do best, aka, sharing the load and getting more powerful when more machines join the network, and it could serve as a good adoption mechanism to engage the community to provide the necessary storage to house it.

Me too -- I had assumed that full archive state was the goal, and initially approached research in that way. That goal still feels essential.

I'm still speculating about a lot of stuff, but the disc space saved by storing merkle tries feels overwhelmingly worth the work to rebuild a proof if asked.

I think if we're clever about the way we use the concept of "radius", nodes could be flexible not just about the range of account addresses they cover, but also about the depth of state history they are willing/able to serve.

I wonder also if there is a way for nodes with overlapping radius to self organize, so that the history of accounts in their shared radius is also distributed.

Patricia tries can be weird -- but generally two tries with the same number of leaves will be smaller than one trie with twice as many leaves. In this context, that would mean it is easier on disc space to expand a radius across time than to expand a radius to include more addresses.

perama-v commented 1 year ago

A summary so far

Terminology

"Tracer Configurations": The set of 4byteTracer, callTracer, diffMode and prestateTracer.

"Basic Archive Traces" (BATs): For any historical block, a Portal node can respond to:

"Counterfactual Archive Traces" (CATs): For any historical block, a Portal node can respond to:

"Full Node History" (FNH): For any historical block, a Portal node can respond to:

"Full Node State" (FNS): For any block less than 256 blocks old, a Portal node can respond to:

"Archive Node State" (ANS): FNS, but for any historical block.

"Accessed State Proof" (ASP): A proof against the block state root that includes every piece of state that a block accesses during execution.

What the Portal Network could target

Currently specified overlay networks and their functionality:

Functionality not yet incorporated into an overlay network:

How to deliver traces

Transactions do not store post-state hashes. Therefore all tracing solutions require the execution of all preceding transactions using values proven against the block state root.

BATs and CATs via ANS: Broad scope, small size but high latency

Basic Archive Traces (BATs) and Counterfactual Archive Traces (CATs) via Archive Node State (ANS).

Architecture: Extend the state overlay to blocks more than 256 blocks old. This provides Full Node State (FNS) back to genesis.

To serve BAT/CAT methods a user sends the request to their localhost node:

  1. Node obtains the block from the History Overlay Network (FNH)
  2. Node starts executing transactions (0, 1, ... 99)
  3. For each transaction the node calls the state overlay network (ANS) when state is required:
     a. make a request: mainly eth_getBalance / eth_getCode / eth_getStorageAt
     b. continue to run the transaction
     c. if untouched state is required, go to a)
  4. Stop when transaction of interest has been executed
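A minimal sketch of this loop (Python, illustrative only; `history_net`, `state_net`, and `evm` are hypothetical interfaces standing in for whatever client APIs end up existing):

```python
# Sketch of BAT/CAT serving via Archive Node State (ANS): state is fetched
# from the state overlay on demand, one network round trip per untouched item.

def trace_via_ans(block_hash, target_tx_index, history_net, state_net, evm):
    block = history_net.get_block(block_hash)        # 1. History overlay (FNH)

    def fetch_state(address, slot):
        # 3a-c. called by the EVM whenever untouched state is needed,
        # reading state as of the end of the parent block
        return state_net.get_storage_at(address, slot, block=block.number - 1)

    trace = None
    for index, tx in enumerate(block.transactions):  # 2. execute 0, 1, ... 99
        trace = evm.execute(tx, state_reader=fetch_state)
        if index == target_tx_index:                 # 4. stop at tx of interest
            break
    return trace
```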

Challenges:

Benefits:

Implied storage cost:

BATs (no CATs) via ASPs: Narrow scope, large size but low latency

Basic Archive Traces (BATs) via Accessed State Proofs (ASPs) that contain only values that are required to execute a single block in isolation.

Architecture: Construct proofs for blocks once they are created. Each proof contains state that the block needed when it was created. Any state not accessed by the block is not included in the proof. Proofs are distributed on a new overlay network and are requested for local tracing.

To serve BAT methods a user sends the request to their localhost node:

  1. Node obtains the block from the History Overlay Network (FNH)
  2. Node obtains the proof (ASP) from the appropriate (new) overlay network.
  3. Node starts executing transactions (0, 1, ... 99)
  4. For each transaction the node reads from the state proof when state is required.
  5. Stop when transaction of interest has been executed
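The same trace served from an ASP, again as an illustrative sketch with hypothetical `history_net`, `asp_net`, and `evm` interfaces; only two network fetches are needed before everything runs at local EVM speed:

```python
# Sketch of BAT serving via Accessed State Proofs (ASP): two fetches, then
# replay entirely against the locally held proof.

def trace_via_asp(block_hash, target_tx_index, history_net, asp_net, evm):
    block = history_net.get_block(block_hash)   # 1. History overlay (FNH)
    proof = asp_net.get_proof(block_hash)       # 2. one ASP covering the block
    trace = None
    for index, tx in enumerate(block.transactions):        # 3. execute 0..n
        trace = evm.execute(tx, state_reader=proof.read)   # 4. reads hit the proof
        if index == target_tx_index:                       # 5. stop at target
            break
    return trace
```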

Challenges:

Benefits:

Implied storage cost:

Summary

If "what happened" / Basic Archive Traces are desired they can be achieved by:

The BATs/CATs via ANS is the favoured approach.

Current research questions: Think about optimisations or clever strategies to make the implementation fast and efficient

pipermerriam commented 1 year ago

First, this analysis seems solid and I appreciate you writing it out and getting it organized. I believe your analysis is all correct.

The main thing I'll say is that these solutions aren't mutually exclusive and that they can/could be built as layers.

If we build ANS (Archive Node State), then ASP (Accessed State Proofs) is simply a layer on top of ANS. In theory, we can build both. The only limits are how much capacity the network is able to acquire through people running nodes. At the current stage of this project, we only have the luxury of building one and thus, if we focus on one of these, it'll be an ANS solution. After that is live and working, the race is on for people to build layers on top of it that make it faster and easier to use.

perama-v commented 1 year ago

ASP extensions and optimisation notes.

Previously a historical state format was introduced via Accessed State Proofs. This post expands on the idea to mitigate some previously identified issues.

This continues to be a research discussion, not intended as a formal spec direction change.

Post-verkle-vision

First a note on the potential for a sudden drop in database requirements for a state model predicated on proofs.

Pre-verkle, we use Merkle proofs. The data storage burden is heavy, but we provide all historical state.

Post-verkle, every stored Merkle proof can be replaced in situ with a verkle proof, and nodes suddenly get a 90-95% reduction in storage for the proof part of the database.

Enable ANS and CATs via ASP by adding new ASI: Feasible but slow

Archive Node State (ANS) and Counterfactual Archive Traces (CATs) via ASP (Accessed State Proofs) plus new ASI (Accessed State Index)

"Accessed State Index" (ASI): A mapping of state keys to block numbers that they were historically accessed in. E.g,. storage slot key -> array of block numbers.

This index allows discovery of which proofs (ASPs), and therefore which proof-holding peers, have the state data for a given slot.

Accessed State Proofs (ASPs) give access to historical state that was used in that block but neglect all other state. This is a problem because you cannot answer arbitrary state questions (eth_call / eth_getStorageAt) at that block.

A new index can be introduced that stores when each slot was accessed.

  1. Start executing (eth_getStorageAt / eth_call) at an old block height (e.g., block 15_000_000)
  2. Encounter some slot that was not accessed during the block execution
  3. Check the slot access history to get a list of blocks that touched that slot
  4. For all blocks prior to the block of interest, find the newest block
  5. Get the slot value from that block.
  6. Continue execution. Go to 2.

This index (slot access history) means that arbitrary traces are possible. They require two network requests per novel slot (one to get slot history, one to get slot value).
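As a sketch (Python, illustrative; `asi_net.last_access_before` and `state_net.slot_value_at` are hypothetical calls), the two-request resolution looks roughly like this:

```python
# Two-request lookup for an arbitrary historical slot via the ASI.

def resolve_historical_slot(slot_key, block_number, asi_net, state_net):
    # Request 1: the newest block before block_number that touched this slot.
    touched_at = asi_net.last_access_before(slot_key, block_number)
    if touched_at is None:
        # Assumption of this sketch: a slot never accessed before this height
        # still holds its default (zero) value.
        return 0
    # Request 2: read the value from the ASP of that block.
    return state_net.slot_value_at(slot_key, touched_at)
```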

An overlay network distributes the slot access history index.

  1. User node has an ID.
  2. Storage slot key (0x1122...3344) has a distance. If distance is within a radius, the node keeps the index data.
  3. The index data is a record of block heights (1_234_000, 1_345_000, 1_567_000, 1_930_000, ...)
  4. The node can answer questions like: "When was this slot key (0x1122...3344) last accessed, prior to a certain block (1_500_000)"
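A sketch of that radius rule and of how an in-radius node answers the question (illustrative numbers; XOR distance as in the existing overlays):

```python
# Keep an index record only when the key's content id is within the node's radius.

def within_radius(node_id: int, content_id: int, radius: int) -> bool:
    return (node_id ^ content_id) <= radius  # XOR distance metric

# Example record for slot key 0x1122...3344 held by an in-radius node:
access_history = [1_234_000, 1_345_000, 1_567_000, 1_930_000]

def last_access_before(history, block_number):
    prior = [b for b in history if b < block_number]
    return max(prior) if prior else None

assert last_access_before(access_history, 1_500_000) == 1_345_000
```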

Or, as a network message that is passed toward nodes with IDs within the radius of 0x1122...:

curl -s -H "Content-Type: application/json" -d '{"method":"portal_stateSlotAccessPriorTo","params":["0x1122...3344", "1500000"],"id":1,"jsonrpc":"2.0"}' http://localhost:8545

Result (where 0x1485e8 is 1345000):

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": "0x1485e8" 
}

That response can then be used with eth_getStorageAt (which takes the contract address, the storage slot key, and a block number):

curl -s -H "Content-Type: application/json" -d '{"method":"eth_getStorageAt","params":["<address of contract 1>", "0x1122...3344", "0x1485e8"],"id":1,"jsonrpc":"2.0"}' http://localhost:8545

Result (where the slot value was 1 when it was last accessed prior to the block in question, and it has not changed since):

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": "0x0000...0001" 
}

ASI format

A node would be responsible for storing a subset of the Accessed State Index, that is, only some state keys. For each key it records a binary accessed/not-accessed flag for every block. In the simple case this can be represented as an uncompressed bitmap of one bit per block, which for each key would be 17_000_000 / 32 ≈ 530K 32-bit integers ≈ 2MB. From this you could look at how many state keys there are on mainnet and calculate the worst-case upper bound for this index.
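A sketch of one key's bitmap (illustrative only; a real index would almost certainly be compressed, since access patterns are sparse):

```python
# One Accessed State Index entry as an uncompressed bitmap: bit i set means
# the slot was accessed in block i. ~17M blocks -> a little over 2 MB per key.

class SlotAccessBitmap:
    def __init__(self, chain_height: int):
        self.bits = bytearray((chain_height + 7) // 8)

    def mark_accessed(self, block_number: int) -> None:
        self.bits[block_number // 8] |= 1 << (block_number % 8)

    def last_access_before(self, block_number: int):
        # Walk backwards from block_number - 1 until a set bit is found.
        for b in range(block_number - 1, -1, -1):
            if self.bits[b // 8] & (1 << (b % 8)):
                return b
        return None
```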

Speed: BATs O(1), CATs & ANS O(n)

Fast tracing (BATs)

debug_trace* - nodes store all the data necessary for specific blocks, so tracing is fast once an appropriate node is found.

Fast simple arbitrary state (CAT/ANS)

eth_getStorageAt - Getting individual historical arbitrary slots is quick: Two network requests.

simple debug_traceCall / eth_call: Basic tasks like getting a token balance or an ENS resolver value at a specific height are fast.

Slow complex arbitrary state (CAT/ANS)

complex debug_traceCall / eth_call: Tracing counterfactual transactions is slower. Each new state requires two lookups so the time taken is O(n). This is sequential, so a trace involving 100 state lookups would require 200 network requests.

Database side

Suppose a node has 5% of the total database size. Then it will have ~1-in-20 blocks locally.

Leaf deduplication

Deduplication of common leaves: store code by code hash to avoid storing unchanged contract code multiple times when it is read (accessed) in different blocks.

Proof subtree deduplication

A statistical analysis revealed that state trees were about ~8 levels deep. This is 8 levels, each with 15 siblings. Blocks have multiple state accesses, so some of these siblings would lead to leaves, but many would not.

There are likely to be large subtrees that overlap. These would be sets of sibling nodes that were unchanged between two of the node's proofs for blocks that are sequential but non-contiguous (e.g., 15_000_000 and 15_000_020). These sets of siblings and the subtrees they form could be stored efficiently by searching the trees for such patterns.

A simple example: where a set of 15 siblings is unchanged between two sequential proofs, replace the 15 nodes with a hash of those nodes and store them only once.
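A sketch of that deduplication (illustrative; the hash function and serialization are placeholders):

```python
# Store each set of 15 sibling nodes once, keyed by its hash; proofs then
# reference the 32-byte digest instead of repeating the full nodes.

import hashlib

class SiblingSetStore:
    def __init__(self):
        self.sets = {}  # digest -> serialized sibling nodes

    def put(self, siblings: list[bytes]) -> bytes:
        digest = hashlib.sha256(b"".join(siblings)).digest()
        self.sets.setdefault(digest, siblings)  # stored once, shared by many proofs
        return digest

    def get(self, digest: bytes) -> list[bytes]:
        return self.sets[digest]
```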

Conclusion

An additional Accessed State Index (ASI) allows a node that has the block state proof for block N to answer state queries for slots accessed in that block - for all subsequent blocks until that slot is accessed again x blocks later. That is, it can respond to state queries for blocks N through N + x, coordinated by a distributed index.

This allows for fast (~O(1)) access to archive state in the simple case, and O(n) access for complex counterfactual state queries.

perama-v commented 1 year ago

This post explores the DoS potential of the ASP (Accessed State Proof) model for archive node state (ANS).

First, a review of which part of the state the block proofs provide. Proofs can only be constructed with access to the full state trie, so they are not suited to being the mechanism for the chain tip, where a new block can touch any part of state.

One design is to leave the state overlay, which provides Full Node State (FNS) for blocks up to 256 deep, as is. A separate Archive State overlay can then be responsible for providing state for blocks older than 256. This gives nodes time to produce and check proofs and to construct an accumulator that seals valid proofs for rapid verification from then on.

---
title: Blocks older than 256 become the domain of the Archive State Overlay
---

flowchart TB
C -- Update state--> B -- after 256 blocks--> A
subgraph A[Archive Node State ANS via Accessed State Proofs ASP]
    direction BT
    subgraph aACC[Accumulator]
        direction TB
        asr[State root]
        abh[Block hash]
        aph[hash of accessed state proof ASP]
    end
    subgraph aASP[Accessed State Proofs ASPs]
        direction TB  
        Proof
        ap1bh[Block hash]      
    end
    subgraph aASI[Accessed state index ASI]
        direction TB
        aasiacc[When each slot was accessed]
    end
end

subgraph B[Full Node State FNS via distributed trie design]
    direction BT
    accounts
    storage
    contracts
end

C[Bridge injects new blocks created by EL/CL]

N1[state overlay nodes store particular accounts, keys, contracts]
N2[archive state overlay nodes store proofs for particular blocks]
N3[archive state overlay nodes store appearances for particular state keys]
N4[Proofs are constructed by tracing a block, aggregating values, calling eth_getProof ]

N1 -.- B
N4 -.- A
A -.- N2 & N3

Nodes constructing proofs trace recent blocks. They could use either the State and History overlays to do this, or they could use a bridge connected to a non-Portal node that implements eth_getProof.

Bad-but-valid ASPs

Invalid proofs are expensive to create but easy to check. Once an ASP is deemed valid, it may still be "bad" with respect to the block it is paired with, by containing:

The block must be executed to differentiate the bad-but-valid proofs from the ideal proof: one that only contains the requisite state.

Colocation of data

A block header is required to know that the ASP root is correct.

A block header (gas price, block number) and a block body (e.g., list of transactions) are required to execute the block and check that a valid ASP truly contains all the required state and nothing more.

Thus, on the state overlay network, validity of a proof requires access to the history network (they are not completely isolated). This can be addressed through an accumulator of block states (below), but in practice the state data is reliant on the history data anyway: a request for a transaction trace involves both the state and history networks. So separation goals are somewhat artificial in this regard.

It would be ideal if the data requested of nodes also happened to be colocated on the same nodes. This would minimise network lookups (the most sensitive part of the Portal Network).

For example, the content id of ASPs could be designed to match that of block bodies, so nodes by default have the majority of the data for a single block at hand.

Here is an example system that achieves this goal, where ASP content key is defined by the block hash.

| Name   | Network       | Key             | ID          | Node responsible | Size   |
|--------|---------------|-----------------|-------------|------------------|--------|
| Header | History       | 00 + block_hash | 0xabc...123 | A "~0xabc"       | small  |
| Body   | History       | 01 + block_hash | 0xdef...789 | B "~0xdef"       | medium |
| ASP    | Archive State | 01 + block_hash | 0xdef...789 | B "~0xdef"       | large  |

Any ASP that enters the state network will be directed to node B. That node makes a single request to get the block header. At this point the ASP can be checked, and the block can then be replayed to verify that the proof contains the minimum-viable state.
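A sketch of why the shared key shape produces colocation (using `eth_utils.keccak` as a stand-in for the content id function; the hash value is a placeholder):

```python
# Both the block body (History network) and the ASP (Archive State network)
# derive their content id from 0x01 + block_hash, so they land on the same
# neighbourhood of nodes.

from eth_utils import keccak  # assumed available for illustration

def content_id(selector: bytes, block_hash: bytes) -> bytes:
    return keccak(selector + block_hash)

block_hash = bytes.fromhex("de" * 32)        # placeholder block hash
body_id = content_id(b"\x01", block_hash)    # History: block body
asp_id = content_id(b"\x01", block_hash)     # Archive State: ASP
assert body_id == asp_id                     # node B ends up holding both
```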

State accumulator design

Following the verification that an ASP is valid and ideal (not insufficient/bloated) it could be canonicalized in an accumulator to enable the state network to decouple from the history network. That is, once the state at the chain tip is checked (requires history network), the state network could become self sufficient.

An accumulator containing:

By embedding the hash of the ASP, a new node entering the network gains the following capacity: to receive an ASP for an old block and to check that the ASP is correct, that the proof was used to execute the block, and that it was shown to contain the necessary information. In this way, the node may participate in the State overlay without compromising on guarantees about what data it is storing.

Include block hash in ASP content definition

To satisfy the optimisation "nodes that check proofs should already have most data locally", the following content design may be useful.

Main property: define ASPs by block hash, rather than by state root.

accessed_state_proof = TBD # Merkle proof in SSZ format.
block_hash = Bytes32 

# Keep the block hash with the proof so that its key may be readily derived
asp_content = Container(
    block_hash: block_hash,
    proof: accessed_state_proof
)
asp_selector = 0x01 # Matches the selector in the History network.
asp_content_key = asp_selector + block_hash
asp_content_id = keccak(asp_content_key) 

If instead the ASP content was defined by the state root, then when a new ASP enters the network it will be directed to a node that does not have block bodies.

All new ASPs for a particular block will be directed to the same nodes, which will check the proofs and, if correct and ideal, will extend the state accumulator (defined below). They then send the latest accumulator to the rest of the network.

State accumulator definition

The accumulator records block header data and a hash of the ASP.

accessed_state_proof_hash = keccak(accessed_state_proof)

# An individual record for a historical state root.
state_root_record = Container(
    state_root: Bytes32, # from block header
    block_hash: Bytes32, # from block header    
    accessed_state_proof_hash: accessed_state_proof_hash    
)

With accumulator structure and algorithm following the same protocol as the header accumulator in the History network.

EPOCH_SIZE = 8192 # blocks
MAX_HISTORICAL_EPOCHS = 131072  # 2**17

# The records of the state roots from within a single epoch
state_epoch_accumulator = List[state_root_record, max_length=EPOCH_SIZE]

state_master_accumulator = Container(
    historical_epochs: List[Bytes32, max_length=MAX_HISTORICAL_EPOCHS],
    current_epoch: state_epoch_accumulator,
)

state_epoch_hash = hash_tree_root(state_epoch_accumulator)

state_accumulator_content = SSZ.serialize(state_epoch_accumulator)
state_accumulator_selector = 0x00
state_accumulator_content_key = state_accumulator_selector + state_epoch_hash
state_accumulator_content_id = keccak(state_accumulator_content_key)
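A sketch of how a node could use this accumulator to accept an old ASP without re-executing the block (plain dataclasses stand in for the SSZ containers above; `keccak` from eth_utils is a stand-in for the proof hash):

```python
# Check a received ASP against the sealed accumulator record for its block.

from dataclasses import dataclass
from eth_utils import keccak  # assumed available

@dataclass
class StateRootRecord:
    state_root: bytes
    block_hash: bytes
    accessed_state_proof_hash: bytes

def asp_matches_accumulator(asp_bytes, block_hash, epoch_records):
    record = next((r for r in epoch_records if r.block_hash == block_hash), None)
    if record is None:
        return False  # block not covered by this epoch accumulator
    # The ASP was executed and checked before being sealed, so a hash match
    # is sufficient for the receiving node.
    return keccak(asp_bytes) == record.accessed_state_proof_hash
```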

DoS potential

Spam Proofs

With the above design, a malicious node can produce bad proofs at the chain tip. They construct proofs that have valid state, but insufficient/bloated values with respect to the block.

Defense involves replaying that block using the proof data. Then disseminating an accumulator so that in the future this process does not have to be repeated.

An attacker could produce bloated/insufficient proofs at the front of the accumulator. To decide which accumulator value is correct, a node would have to replay the block. A bloated proof can be pruned down to an optimal proof. An insufficient proof can be rejected.

A flood of insufficient proofs represents a spam attack that is cheap to generate and costly to verify. A node will have to execute the block to reject each one. If the time to check all the spam proofs for a given block exceeds the block time (12s), then the node will not keep up with proof production.

Blocks are full at 30M gas, which is calibrated to limit state growth. EVM engines are capable of much higher throughput, approximately 10-100 blocks/second; suppose a worst case of 1 block/second. An EVM could then process ~10 blocks within the 12-second slot. Hence, one mitigation is a rule that if a node is presented with more than 10 proofs for a single block height, it can block the offending peer.

Importantly, executing the block is EVM-speed because the node has the data locally.
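A sketch of the per-height rate limit (the threshold of 10 follows from the ~10 blocks/s worst-case replay estimate above; the structure is purely illustrative):

```python
# Drop peers that submit more candidate proofs per block height than a node
# can afford to replay within one slot.

from collections import defaultdict

MAX_PROOFS_PER_HEIGHT = 10

class ProofRateLimiter:
    def __init__(self):
        self.counts = defaultdict(int)  # (peer_id, block_number) -> proofs seen

    def accept(self, peer_id, block_number) -> bool:
        self.counts[(peer_id, block_number)] += 1
        if self.counts[(peer_id, block_number)] > MAX_PROOFS_PER_HEIGHT:
            return False  # block/penalize the offending peer
        return True
```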

Huge network transfers

Some Accessed State Proofs may touch so much state that they are very large - perhaps >100MB at a guess. This depends on the content of the blocks being produced, which can vary for a myriad of reasons.

Fundamentally, if such a block is to be replayed, this state must be available. If the data model is ASP, then a request for the state of a particular block could invoke a ~100MB transfer.

Alternatively, if the state is distributed (ANS via FNS, without accessed state proofs), then the data will have to be retrieved from many peers, and such a block is likely very difficult to trace at all: executing a small part of the block, then fetching state, then more execution and more fetching, likely hundreds of times sequentially, with each request being for a proof of a single piece of state. In practice, this results in the same order of data transfer (~100MB), but spread over a long period.

If it is the intention to be able to trace all blocks, then getting everything required for a block in a single request seems good - and for state-heavy blocks, this could mean the difference between succeeding or not-succeeding with the trace in a timely manner.

Summary