Closed — vbuterin closed this issue 5 years ago
closed via #476
Reviving this until we port this proposal and explanation somewhere else
The above proposal produces a highly predictable amount of traffic, which is a nice property, but one advantage of the previous approach (based on older ideas of Vitalik's, presented here) was that the required traffic adapts to the rate of change in the active validator set. If the rate of change is low, the skips can be much longer than 9 days.
In the last few days, I've been looking for new ways to improve the previous scheme and here I'll outline a new algorithm that appears to be very efficient.
Before we start, let's make some observations:
A light client is interested only in the latest finalized state. Everything else in the state of the beacon chain or the shards can be authenticated through Merkle proofs obtained from a LES-like protocol. Thus, if a light client knows the active validator set, it can just listen for attestations and block proposals coming from known validators. With every "vote" seen on the network, the light client's probabilistic assurance that a certain state is finalized will increase. When the network finalizes a new state, the light client will do the same. Furthermore, the client may be immediately presented with a set of signatures from a super-majority of the known validators that directly authenticates the latest finalized state (more on this later). This would be an instant sync procedure.
On the other hand, if the light client doesn't know the active validator set, it cannot make any judgments regarding the observed traffic. Thus, the problem of syncing with the network after being offline for a while boils down to syncing the changes in the active validator set.
This comes from the economic reasoning used in Casper. Whatever the protocol changes are, we can just adapt the slashing conditions to give assurance to the light client that if a super-majority of the validators have signed something, this is either a finalized consensus or a lot of purchasing power is being destroyed.
I'll use this argument to make the claim that if the light client is presented with a plausible history about how the validator set changed, it's enough to authenticate the end result that is signed by a super-majority of the validators and the history itself may be transmitted or compressed in arbitrary ways. Thus, the syncing of the validator set changes is greatly simplified if the consensus objects somehow include signatures of all validators authenticating the validator set's contents at a particular time.
A key technical insight for this new proposal is that the BLS signature aggregation allows us not only to have multiple signers, but also to have multiple signed messages without increasing the signature size (the cost is only increased signing time; the verification time stays the same).
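To make the size property concrete, here is a toy model of multi-message aggregation. This is emphatically not real BLS (real BLS uses pairings on BLS12-381); a linear scheme over integers mod a toy prime stands in for the group, and in this toy the "public key" is simply the secret, so only the structural point survives: the aggregate stays one element no matter how many messages are signed.

```python
# Toy model of multi-message signature aggregation. NOT real BLS: a linear
# scheme over integers mod p stands in for the pairing group, so only the
# size/structure property is illustrated.
import hashlib

P = (1 << 127) - 1  # toy prime modulus (illustrative parameter)

def h(msg: bytes) -> int:
    """Hash a message to a 'group element'."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % P

def sign(secret: int, msg: bytes) -> int:
    return (secret * h(msg)) % P  # stand-in for sk * H(m)

def aggregate(sigs):
    return sum(sigs) % P  # one element, regardless of how many messages

def verify_aggregate(agg, pubkey_message_pairs) -> bool:
    # stand-in for the pairing product check e(agg, g) == prod e(H(m_i), pk_i);
    # in this toy the public key equals the secret.
    return agg == sum(pk * h(m) for pk, m in pubkey_message_pairs) % P
```

Two validators can sign different messages (say, a block root and a validator-set root), and the aggregate remains a single element that verifies against the list of (key, message) pairs.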
This proposal is a high-level description of how the light client sync would work. If there is interest, I can work out the concrete details, or I can leave this to the EF research team. In Nimbus, we hope to build proof-of-concept implementations of all viable light client proposals in the near future in order to compare their practical results.
Using the key insight above, change the signatures in all block proposals and all attestations to also include a signature over the Merkle root hash of the validator set in the latest finalized state.

The goal here is to accumulate signatures authenticating the contents of the validator set. If the light client knows a certain finalized validator set `VS_prev`, it can be presented with a sync proof that the set changed to `VS_next`. The proof includes the following information:
1. A minimal list of exits (a list of indices).
2. A minimal list of new deposits (a list of `<pub key, index>` pairs).
3. A BLS signature that includes at least a super-majority of the known validators at `VS_prev`, signing `VS_next`.
4. A bitmask (or a list of signature counts per validator, if aggregating just one signature per validator proves to be difficult). Please note that this list will also refer to the new validators specified in (2).
5. A minimal list of transient validators whose signatures got aggregated in (3). A transient validator is one who made a deposit and an exit between `VS_prev` and `VS_next`. This list is optional and will be required only if it proves hard to produce an aggregated signature from the minimal set of required validators (more on this below).
6. A list of other data that ended up in the BLS signature (this should effectively encode a tree; @AlexeyAkhunov's encoding scheme comes to mind). The client doesn't need to authenticate this data, but it's needed in order to verify the aggregated signature. There is a trade-off here: we can optimize the traffic needed for light clients by storing the validator set signatures separately in the attestations or block proposals, which will increase the network traffic a bit for everyone else.
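The items above can be sketched as a container plus the delta application the client performs. All names here are illustrative, not from any spec, and the insertion of deposits is simplified (a real implementation would honor the deposit indices):

```python
# Hypothetical container for the sync proof described above; field names are
# illustrative. Item numbers in the comments refer to the list in the text.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SyncProof:
    exits: List[int]                      # (1) indices leaving VS_prev
    deposits: List[Tuple[bytes, int]]     # (2) <pub key, index> pairs joining
    aggregate_signature: bytes            # (3) BLS aggregate signing VS_next
    participation_bitmask: bytes          # (4) who is in the aggregate
    transient_validators: List[Tuple[bytes, int]] = field(default_factory=list)  # (5) optional
    extra_signed_data: List[bytes] = field(default_factory=list)                 # (6) other aggregated messages

def apply_delta(vs_prev: List[bytes], proof: SyncProof) -> List[bytes]:
    """Derive the claimed VS_next from VS_prev plus the proof's delta lists.
    Simplified: appends new deposits instead of placing them by index."""
    exiting = set(proof.exits)
    vs_next = [pk for i, pk in enumerate(vs_prev) if i not in exiting]
    for pubkey, _index in proof.deposits:
        vs_next.append(pubkey)
    return vs_next
```

The client then authenticates `vs_next` against the aggregate signature; the delta lists themselves need no authentication, per observation (3).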
Please note that the distance between `VS_prev` and `VS_next` can be quite significant. Such a proof needs to be generated roughly when 1/3 of the validators from `VS_prev` are no longer active. Furthermore, the same proof can be presented to multiple clients as long as their known validator sets are roughly the same. This suggests a design where the light client servers will maintain in persistent storage a set of proofs generated at particular times (let's say at `T1`, `T2`, `T3`, `T4`, ...) and if a light client is at `TC` (`T1 < TC < T2`), the server will just have to serve the precomputed proofs at `T2`, `T3`, `T4` and so on.
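The serving logic is a simple cut on the sorted proof times. A minimal sketch (function name is illustrative):

```python
# Sketch: given the client's last-known time T_C and the server's stored,
# sorted proof times T_1 < T_2 < ..., serve every proof generated after T_C.
from bisect import bisect_right
from typing import List

def proofs_to_serve(stored_proof_times: List[int], client_time: int) -> List[int]:
    """Return the precomputed proof times the client still needs, in order."""
    start = bisect_right(stored_proof_times, client_time)
    return stored_proof_times[start:]
```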
Once the client has synced the validator set, it can use the methods described in observation (1) to arrive at a new finalized state. To get the "instant sync" scheme working, we can decide that the proposals and attestations will include signatures over the root hash of the entire finalized state. This will slightly increase the size of the sync proof above because the validator set will be authenticated with a merkle proof now.
Generating a small sync proof is a relatively tough computational problem. Once a proof is generated, it's easy to verify that it has the desired properties, so operators of light client servers may decide to somehow share the burden of generating it or to outsource the work altogether. Here are some important considerations:
1) The more individual signatures you know, the more freedom you have to aggregate them into minimal configurations. The proof generator will therefore try to monitor the whole network and keep a history of all observed individual signatures.
2) The generated sync proof at `T2` must be such that every single finalized state between `T1` and `T2` has a validator set for which the signature at `T2` represents a super-majority. This suggests a design where the proof generator keeps track of the best possible `T_N`, `T_N+1`, `T_N+2`, ... over time, and these intervals may shift as circumstances change. Some heuristic will be applied to determine when the solution being found is "good enough".
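The property in (2) is easy to state as a predicate the generator must satisfy when choosing its signer set. A minimal sketch, modelling validator sets as sets of indices (illustrative, balance-weighting omitted):

```python
# Sketch of property (2): the signer set chosen for the proof at T2 must be a
# super-majority (>= 2/3) of *every* finalized validator set between T1 and T2.
from typing import Iterable, Set

def covers_all_sets(signers: Set[int], finalized_sets: Iterable[Set[int]]) -> bool:
    """True iff signers form at least 2/3 of each finalized validator set."""
    return all(3 * len(signers & vs) >= 2 * len(vs) for vs in finalized_sets)
```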
I imagine that such proof-generating servers will be run by a few operators, and the light-client server operators will simply fetch proofs from them.
I need to put more time into producing concrete numbers here, and the practical cost of the approach will obviously depend on the actual rate of change in the validator set, for which we need real-world statistics. The hand-wavy version of the analysis is that the sync proofs will be smaller objects than the 9-day proofs suggested above, the skip interval should be much longer than 9 days, and most of the time it will be possible to use the "instant sync" procedure (e.g. if the light client connects to the network once every couple of weeks).
The core of this protocol was spec'd out in the repo. Do we want to leave this open for rationale/analysis?
See for background: https://github.com/ethereum/eth2.0-specs/issues/403
Protocol changes
- Peg delayed entries/exits to `(n - n % EPOCH_LENGTH - 1) + EXIT_ENTRY_DELAY` (so they are always pegged to an epoch boundary).
- Add to the `BeaconState` a field `latest_active_index_roots: List[uint64]`, a list of `merkle_tree(get_active_validator_indices(state, block.slot))` for the last 8192 epoch boundaries, maintained similarly to the randao roots (if the indices are not a power of two, we zero-pad the list up to a power of two to make it Merkleizable).
- Add to the `BeaconState` a field `latest_active_validator_set_lengths: List[uint64]`, a list of `len(get_active_validator_indices(state, block.slot))` for the last 8192 epochs, maintained similarly to the randao roots.
- For the `seed` used in `get_shuffling`, use `seed = sha3(get_randao_mix(slot), get_active_index_root(slot))`, where `get_active_index_root` is defined similarly to `get_randao_mix`.
- Remove `validator_set_delta_hash_chain`.
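The state maintenance described here can be sketched as a ring buffer, mirroring how the randao mixes are stored. Names and the padding helper are illustrative, not spec code:

```python
# Sketch: maintain latest_active_index_roots as a ring buffer over the last
# 8192 epochs, like the randao mixes. Illustrative, not spec code.
LATEST_ACTIVE_INDEX_ROOTS_LENGTH = 8192

def next_power_of_two(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(indices: list) -> list:
    """Zero-pad the index list to a power of two so it is Merkleizable."""
    return indices + [0] * (next_power_of_two(len(indices)) - len(indices))

def update_active_index_root(roots: list, epoch: int, root: bytes) -> None:
    """Overwrite the slot for this epoch, wrapping around the buffer."""
    roots[epoch % LATEST_ACTIVE_INDEX_ROOTS_LENGTH] = root
```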
Version 1b: instead of `merkle_tree`, use `tree_hash_root`. The tree hash root has hashed-in metadata about the length of the list, so `latest_active_validator_set_lengths` becomes superfluous and can be removed.

Light client procedure
Suppose that a light client already authenticated the chain up to some `KNOWN_BLOCK` that it considers final. As per the requirements of persistent committees, this means that the light client has the information that it needs to predict persistent committees up to ~9 days in the future (see https://github.com/ethereum/eth2.0-specs/issues/375). The light client will ask for a `LightClientSkipProof` object for a future block up to ~9 days in the future, which contains the following objects:

- `header` of the proposed block
- `validator_index_proofs` (Merkle branches + leaves)
- `validator_record_proofs` (Merkle branches + leaves)
- `validator_balance_proofs` (Merkle branches + leaves)
- `participation_bitfield`, 16 bytes long
- `aggregate_signature`
The light client privately chooses a `shard` in `[0...SHARD_COUNT-1]`. The light client will use the randao mixes, active index roots and active validator set lengths that it can derive from the state root of `KNOWN_BLOCK` to derive enough information to compute:

- `prev_committee_compute_slot = header.slot - header.slot % RESHUFFLE_PERIOD - RESHUFFLE_PERIOD` and `post_committee_compute_slot = prev_committee_compute_slot + RESHUFFLE_PERIOD`
- `prev_committee = split(get_shuffling(seed, validators, prev_committee_compute_slot), SHARD_COUNT)[shard]`
- `post_committee = split(get_shuffling(seed, validators, post_committee_compute_slot), SHARD_COUNT)[shard]`
- `committee = [index for index in prev_committee if hash(get_randao_mix_at_epoch(post_committee_compute_slot) + bytes8(index)) % RESHUFFLE_PERIOD > header.slot % RESHUFFLE_PERIOD] + [index for index in post_committee if hash(get_randao_mix_at_epoch(post_committee_compute_slot) + bytes8(index)) % RESHUFFLE_PERIOD < header.slot % RESHUFFLE_PERIOD]` (this is the actual committee for the shard)
- `focus_committee = committee[:128]`
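The committee rule above hands validators over from the previous committee to the new one at pseudorandom points within the reshuffle period, so membership churns gradually rather than all at once. A runnable sketch, with a plain SHA-256 call standing in for the spec's `get_randao_mix_at_epoch`/`bytes8` helpers and an illustrative `RESHUFFLE_PERIOD`:

```python
# Sketch of the gradual-handoff committee rule: each index flips from the
# previous committee to the new one at a pseudorandom offset within the
# reshuffle period. The hash inputs are simplified stand-ins for the spec's.
import hashlib

RESHUFFLE_PERIOD = 256  # illustrative constant

def flip_point(randao_mix: bytes, index: int) -> int:
    """Pseudorandom offset in [0, RESHUFFLE_PERIOD) at which `index` flips."""
    digest = hashlib.sha256(randao_mix + index.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % RESHUFFLE_PERIOD

def current_committee(prev_committee, post_committee, randao_mix, slot):
    """Members of prev that haven't flipped yet, plus members of post that have."""
    offset = slot % RESHUFFLE_PERIOD
    return ([i for i in prev_committee if flip_point(randao_mix, i) > offset] +
            [i for i in post_committee if flip_point(randao_mix, i) < offset])
```

At offset 0 nothing from the new committee has flipped in yet; by the end of the period essentially everything has.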
Number-theoretic shuffles would let us compute all of this quickly.

For each `i in range(128)`, the client will check that:

- `validator_index_proofs` are valid Merkle branches for each `index` in `focus_committee`, and prove `committee_indices = [active_validator_indices[index] for index in focus_committee]`
- `validator_record_proofs` are valid Merkle branches for each index in `committee_indices` and prove the validator records for each validator
- `validator_balance_proofs` are valid Merkle branches for each index in `committee_indices` and prove the validator balances for each validator

It will also check the validity of the proofs of possession. Finally, it will compute a `group_pubkey` from the subset of the validators specified in the `participation_bitfield` plus each pubkey in the proofs of possession, and check `BLSVerify(pubkey=group_pubkey, message=hash_tree_root(header), signature=aggregate_signature, domain=PERSISTENT_COMMITTEE_DOMAIN)`. It also checks that the 1 bits in the `participation_bitfield` correspond to at least 2/3 of the total balance of the 128 validators selected.

If all checks pass, `header` becomes the new `KNOWN_BLOCK`.
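The participation check can be sketched directly from the bitfield and the proven balances (helper name is illustrative):

```python
# Sketch of the final check: the 1 bits of the 16-byte participation_bitfield
# must cover at least 2/3 of the total balance of the 128 selected validators.
from typing import List

def has_supermajority(bitfield: bytes, balances: List[int]) -> bool:
    assert len(bitfield) == 16 and len(balances) == 128
    participating = 0
    for i, balance in enumerate(balances):
        if (bitfield[i // 8] >> (i % 8)) & 1:  # bit i set => validator i signed
            participating += balance
    return 3 * participating >= 2 * sum(balances)
```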
Explanation
The idea is simple: use the client's existing knowledge of `KNOWN_BLOCK` to compute the committee that will be valid in some block up to 9 days in the future, then ask the network to provide Merkle branches proving the indices, and then the pubkeys and the balances, of that committee. This gives the client the information it needs to compute and verify an aggregate signature.

Light clients can skip ahead as many blocks at a time as they please, either one block at a time or the maximum of ~9 days at a time. Note that if a light client is skipping less than 9 days at a time, it may not need new Merkle branches, as the committee is the same as before; in that case the branches can be omitted, and only a header, bitfield and aggregate signature need to be transmitted.
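The outer sync loop implied by this is short. A sketch, where `get_skip_proof` and `verify_skip_proof` are hypothetical network and verification hooks, and the 6-second slot time is an assumption used only to size the skip:

```python
# Sketch of the skip-sync loop: repeatedly request a proof for a block up to
# ~9 days ahead of KNOWN_BLOCK until the head is reached. The hooks and the
# slot-time constant are illustrative assumptions.
MAX_SKIP_SLOTS = 9 * 24 * 60 * 60 // 6  # ~9 days of assumed 6-second slots

def sync(known_block, head_slot, get_skip_proof, verify_skip_proof):
    while known_block.slot < head_slot:
        target = min(known_block.slot + MAX_SKIP_SLOTS, head_slot)
        proof = get_skip_proof(target)
        if not verify_skip_proof(known_block, proof):
            raise ValueError("invalid skip proof")
        known_block = proof.header  # the verified header is the new KNOWN_BLOCK
    return known_block
```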
Rationale
Complexity analysis
So 250 to 900 kB in total per ~9 days. Compare Bitcoin block headers: 80 bytes per 10 minutes ~= 100 kB per 9 days, or eth1 block headers: 500 bytes per 15 seconds ~= 26 MB per 9 days.
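The two comparison figures check out as plain arithmetic:

```python
# Back-of-the-envelope check of the header-chain comparison above.
NINE_DAYS_SECONDS = 9 * 24 * 3600

btc_headers = NINE_DAYS_SECONDS // 600 * 80    # 80 bytes per 10-minute block
eth1_headers = NINE_DAYS_SECONDS // 15 * 500   # 500 bytes per 15-second block

assert btc_headers == 103_680        # ~100 kB per 9 days
assert eth1_headers == 25_920_000    # ~26 MB per 9 days
```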
Note also that proofs can be compressed via a SNARK or STARK.