Design: Optimistic Sovereign Rollups

Background

Currently, Sovereign only supports full nodes. Although we have support for zk-proof generation, our node implementation always re-executes all transactions starting from rollup genesis. As we move toward production implementations, we'll need support for faster syncing modes.

Desired Properties

Efficiency

Sync time should not depend on cumulative gas used on the rollup.
- In other words, the sync should be constant-time per block
Sync bandwidth requirements should be less than syncing a full node. In other words, nodes should not download every transaction during sync (unless the underlying DA layer requires it).

Correctness

The following assumptions may be required to ensure that a node ends up at the correct state:

The node has access to a correct view of the DA layer's consensus (i.e. it's header chain)
(ORUs only) All invalid state transition claims are challenged within some period (say, 24 hrs)
(Non-succinct DA layers only) The node has (untrusted) access to the DA chain's history

The following additional assumptions may be introduced if necessary to speed up sync in the "happy case":

Sovereign Labs (or another centralized entity) can operate untrusted but relatively available (95%+ percent uptime) centralized infrastructure to aid light clients
Most rollup nodes are connected to at least one honest peer via the p2p network.

The following assumptions are insecure and may not be relied on:

All nodes are connected to an honest peer via the rollup p2p network.
Rollup nodes know which of their peers are honest/malicious
Sovereign Labs (or any other centralized entity) is able to run trusted infrastructure and/or provide honest peer connections

Proposed Solution

Basic Design: Attestations and Challenges

Sovereign Optimistic Rollups should separate their data into two namespaces, a "syncing" namespace and a "transaction (tx)" namespace. Rollups must process all of the data from both namespaces. As in the existing Sovereign design, blobs can be posted to the tx namespace in any order - so the latest rollup state can only be computed after a DA layer block is accepted.

In addition to "sequencers", we introduce a new role "attesters". The job of an attester is to inform light clients that a state transition has occurred by posting a special Attestation blob to the "syncing" namespace. Like sequencers, attesters must register by staking tokens.

/// An attestation that a particular DA layer block transitioned the rollup state to some value
struct Attestation<StateProof> {
  initial_state_root: [u8;32],
  /// The hash of the block in which the transition occurred
  da_block_hash: [u8;32]
  /// The alleged post-state root
  post_state_root: [u8;32],
  /// A proof that the attester was bonded as of `initial_state_root`. For rollups using the `jmt`, this will be a `jmt::SparseMerkleProof`
  proof_of_bond: StateProof,
}

If an attestation is incorrect, any bonded party may challenge the attester by posting a Challenge blob. This blob is just a serialized zk-proof showing that the true result of the state transition was different from the one claimed by the attester.

struct Challenge(&'a [u8]);

pub struct StateTransition<ChainValidityCondition> {
    /// The state of the rollup before the transition
    pub initial_state_root: [u8; 32],
    /// The state of the rollup after the transition
    pub final_state_root: [u8; 32],
    /// Any additional validity condition for the state transition which needs
    /// to be checked by the client of the zkVM circuit.
    pub validity_condition: ChainValidityCondition,
}

pub struct ChallengeContents{ 
  challenger_address: [u8;32],
  state_transition: StateTransition<ChainValidityCondition>
}

pub struct ChainValidityCondition {
    pub prev_hash: [u8; 32],
    pub block_hash: [u8; 32],
}

pub fn do_challenge<Vm: Zkvm>(challenge: Challenge, attestation: Attestation) -> Result<()> {
   let challenge_contents: ChallengeContents = Vm::verify_proof_and_extract_output(challenge.0)?;
   let proof_output = challenge_contents.state_transition;
   ensure!(proof_output.initial_state_root == attestation.initial_state_root);
   ensure!(proof_output.final_state_root != attestation.post_state_root);
   ensure!(proof_output.validity_condition.check().is_ok());
   ensure!(proof_output.validity_condition.block_hash == attestation.da_block_hash);
   Ok(())
}

Sync Strategy

With that infrastructure in place, we can outline a simple sync protocol.

Assume that clients start from some known good state. This can be the genesis state, a more recent "checkpoint" which is widely known to correspond to a valid rollup state, or a state root that the client has already synced. The state identifier can be expressed as a pair of hashes (rollup_state_root, and da_block_hash).


/// Uniquely identifies the pre-state for some state transition,
#[derive(Hash, PartialEq, Eq)]
struct PreStateIdentifier { 
   pre_state_root: [u8;32],
   da_block_hash: [u8;32]
}

struct PotentialStateTree {
  first_unverified_slot_num: u64,
  /// A data structure tracking claimed state transitions for each DA block. Entries are stored as a mapping from
  /// a da_block to a mapping between rollup pre-state roots and state transitions.
  current_state: PreStateIdentifier,
  potential_next_states: Vec<Rc<PotentialState>>,
  /// A mapping from a pre-state identifier to the parent state itself
  potential_state_parent_map: HashMap<PreStateIdentifier, rc::Weak<ParentState>>
}

type ParentState = PotentialState;
type PreStateRoot = [u8;32];
type DaBlockHash = [u8;32];

struct PotentialState {
  attestation: Attestation,
  next_state: MaybeVerifiedStateTransition
}

impl PotentialState {
  fn new(attestation: Attestation) -> Self 
    Self { attestation, next_state: Default::default() } 
  } 
}

/// A potentially verified state transition.
pub enum MaybeVerifiedStateTransition {
   Unverified(Vec<Rc<PotentialState>>) // Default::default() returns Unverified(vec![])
   Verified(StateTransition<ChainValidityCondition>)
}

fn get_pre_state_id(attestation: Attestation) -> PreStateIdentifier { 
   PreStateIdentifier { 
     pre_state_root: attestation.initial_state_root,
     da_block_hash: get_header_by_hash(attestation.da_block_hash).prev_hash
   } 
}

The syncing procedure is as follows (assuming the DA header chain is already downloaded and accessible through a function header(num: u64) -> BlockHeader, and the last processed DA layer block had number current_block_num - 1):

Create a PotentialState tree initialized with the known rollup state root and current_da_hash = Header(current_block_num - 1).
(Start loop) Download the syncing namespace data for current_block_num.
For each potentially valid attestation from the syncing namespace data, make a PotentialState::new() from the attestation. If the attestation’s pre-state is the current state, add it to the potential_next_states vector. Otherwise, look up its pre-state in the potential_state_parent_map and try to add the new potential state as a next_state for that parent. If this succeeds, potential_state_parent_map.insert(potenial_state.post_state, Rc::weak(potential_state)).
1. An attestation is potentially valid if its proof_of_bond verifies against its initial_state_root and it's PreStateIdentifier is either the current_state or is present in the potential_state map.
2. Two attestations are considered to be duplicates of one another if they are both valid and they have the same PreStateIdentifier, and final_state_root (Note that the proof_of_bond may be different for two duplicate attestations!). If duplicate attestations occur, ignore all except the first one. TODO: Add a hashset for deduplication in MaybeVerifiedStateTransition.
For each challenge in the syncing namespace data…
1. If the challenge’s pre-state is the current state and the challenge is valid, update the current state to the challenge’s post state. std::mem::take the potential_next_states vector. For each entry in the vector, whose post-state doesn’t match the new state, remove it’s post-state from the potential_state_parent_map. Recursively remove each of the children of those states from the map
2. If the challenge’s pre-state is in the potential_state_parent_map and the challenge is valid, update the parent_state.next_state field to be Verified, and remove each of the old potential next_states if applicable
If the current_block is at least 24 hours newer than the block from the current_state_id, update the current_state_id to its child state.
(GOTO start loop) increment current_block_number.

Handling Incentives

The structure proposed above mostly ignores incentives. In this section, we'll flesh out the details.

2-Phase Unbonding for Attesters

To prevent spam and allow for quick sync, we need light clients to be able to throw out attestations from un-bonded entities. This is a problem, because by definition light clients are trailing behind the chain tip - so they can’t possibly know whether an attester is currently bonded. We solve this problem by (1) using a merkle proof to guarantee that an attester was bonded at the time of the state transition they’re attesting to and (2) introducing an unbonding period so that we can guarantee that by the time an attester has finished unbonding, all light clients are aware that they’ve started unbonding.

We can do that by introducing a new attester-registry module. That module will look very similar to the sequencer registry (only bonded attesters can post blobs), but, unlike the sequencer module, it will need a two-phase un-bonding workflow. When an attester initially bonds, their stake is locked in the contract and their address is added to the bonded_attesters map in state.

Define some ROLLUP_FINALITY_PERIOD corresponding to the amount of time it takes for a light client to be confident that an attested state transition won’t be challenged. Define ROLLUP_FINALITY_SLOTSto be the maximum number of DA layer slots which could occur in that period.

Suppose that an attester was bonded as of slot N. To unbond, they have to submit a begin_unbonding transaction - imagine that this transaction is included in slot N+1. During that state transition, their address will be removed from the bonded_attesters list and placed in a special unbonding_attesters map. Attesters in this map are no longer able to make attestations (i.e. light clients will ignore them), but are also unable to withdraw their funds.

Next (no sooner than block N+2) an attestation to the transition from slot N+1 will be posted on chain by some other attester. Since the rollup is already reading all of the data from the “syncing” namespace, it will see that attestation during execution of block N+2 . At that time, the rollup will update its state to allow the attester to withdraw starting any time from slot N+2+ROLLUP_FINALITY_SLOTS.

Rewarding/Slashing Attestations

As mentioned above, the rollup must process the “syncing” namespace. For any attestation in the namespace corresponding to an invalid state transition, the rollup should slash the attester immediately. Similarly, the first valid attestation to any state transition should be rewarded. The exact reward formula is left to individual rollup designers.

Rewarding Challengers

Individuals who successfully challenge an attestation should be rewarded with (some portion of) the attesters bond. The remaining portion should be burned.

Bonding Challengers

In this design, there is tension between two goals - we want to keep the set of challengers as large as possible, but we also want to be able to slash malicious challengers to prevent DOS attacks. We balance these competing goals with the following strategy - we require light clients to process all challenges (even if the challenger is not bonded), but we only reward challengers (and slash attesters) if the challenger is bonded.

Using this strategy, we can create a new invariant: for each challenge, either (1) the outcome of a challenge will never have to be proven in circuit (because the challenge doesn't result in a state change) or (2) the challenger will be slashed if the challenge is unsuccessful.

The DOS Vector

If SNARKS are so cheap to verify, why is challenging a potential DOS vector? In native execution, it isn't! But in-circuit, recursive verification can be quite expensive. So, the DOS vector is that a malicious challenger could post hundreds or thousands of bogus challenges in block X, and then post a false attestation to a state transition in block X+1. If these challenges resulted in state changes, an honest party would have to prove the outcome of each of these challenges in order to prove the true state transition, which would require a huge number of recursive evaluations.

To prevent such an attack, we just need to ensure that the such challenges can never result in state changes. If that is the case, then we won't need to recursively evaluate them in order to prove some other state transition. But, since native evaluation is cheap, we still expect light clients to process these challenges during sync.

The Honest Challenger Workflow

This rule gives us a huge gain in capital efficiency (challengers don't have to be bonded during normal execution), while still incentivizing challenges from honest parties in case of an invalid attestation. The workflow looks like this:

Block N: a malicious attester lies about the state transition from block N-5. An honest full node immediately notices the invalid attestation. (Recall that light clients have only finalized the state transition for block N-K for some relatively large value K - maybe a few hundred blocks).
Block N+1: the honest full node submits a transaction which bonds himself as a challenger
Block N+2: the honest full node submits a challenge on-chain. Light clients immediately process the challenge and prune the invalid transition from their lists of potential states. Since the honest full node is bonded, it gets rewarded and the malicious attester gets slashed as part of the state transition during block N+2.
Block N+3: another malicious attester posts a lie about transition N+2, so a challenge is needed. The challenger is forced to compute a recursive verification of the previous challenge, but this cost will be paid out of the new attester's bond.

What Happens if a Challenger is Malicious

If the challenger is malicious, we have two possible cases:

Case 1: The challenger does not bond

Block N: an attestation for the transition during block N-5 is posted on-chain.
Block N+2: a malicious full node submits an invalid challenge on-chain. Since the challenge is malicious, it doesn't bond itself first. Light clients process the challenge and see that the proof is forged, so they don't update their view of potential states.
Block N+3: a challenge depending on the outcome of slot N+2 is needed. Since the malicious full node was not bonded, its challenge cannot cause a state change, so we can skip recursive verification

Case 2: The challenger bonds

Block N: an attestation for the transition during block N-5 is posted on-chain.
Block N+1: a malicious full node submits a transaction which bonds himself as a challenger
Block N+2: the malicious full node submits an invalid challenge on-chain. Light clients process the challenge and see that the proof is forged, so they don't update their view of potential states. Full nodes see that the challenge is invalid and slash the challenger's bond.
Block N+3: another malicious attester posts a lie about transition N+2, so a challenge is needed. The challenger is forced to compute a recursive verification of the previous challenge, but this cost will be paid out of the new attester's bond.

Sovereign-Labs / sovereign-sdk

Design: Optimistic Light Clients #419