Closed — vbuterin closed this issue 5 years ago
closed via #476
Reviving this until we port this proposal and explanation somewhere else
The above proposal produces a highly predictable amount of traffic, which is a nice property, but one advantage of the previous approach (based on older ideas of Vitalik's, presented here) was that the required traffic adapts to the rate of change in the active validator set. If the rate of change is low, the skips can be much longer than 9 days.
In the last few days, I've been looking for new ways to improve the previous scheme and here I'll outline a new algorithm that appears to be very efficient.
Before we start, let's make some observations:
A light client is interested only in the latest finalized state. Everything else in the state of the beacon chain or the shards can be authenticated through Merkle proofs obtained from a LES-like protocol. Thus, if a light client knows the active validator set, it can just listen for attestations and block proposals coming from known validators. With every "vote" seen on the network, the light client's probabilistic assurance that a certain state is finalized will increase. When the network finalizes a new state, the light client will do the same. Furthermore, the client may be immediately presented with a set of signatures from a super-majority of the known validators that directly authenticates the latest finalized state (more on this later). This would be an instant sync procedure.
On the other hand, if the light client doesn't know the active validator set, it cannot make any judgments regarding the observed traffic. Thus, the problem of syncing with the network after being offline for a while boils down to syncing the changes in the active validator set.
This comes from the economic reasoning used in Casper. Whatever the protocol changes are, we can just adapt the slashing conditions to give assurance to the light client that if a super-majority of the validators have signed something, this is either a finalized consensus or a lot of purchasing power is being destroyed.
I'll use this argument to make the claim that if the light client is presented with a plausible history about how the validator set changed, it's enough to authenticate the end result that is signed by a super-majority of the validators and the history itself may be transmitted or compressed in arbitrary ways. Thus, the syncing of the validator set changes is greatly simplified if the consensus objects somehow include signatures of all validators authenticating the validator set's contents at a particular time.
A key technical insight for this new proposal is that the BLS signature aggregation allows us not only to have multiple signers, but also to have multiple signed messages without increasing the signature size (the cost is only increased signing time; the verification time stays the same).
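To make the size property concrete, here is a toy model of multi-message aggregation. This is emphatically not real BLS (real BLS uses pairings on BLS12-381); a linear scheme over integers mod a toy prime stands in for the group, and in this toy the "public key" is simply the secret, so only the structural point survives: the aggregate stays one element no matter how many messages are signed.

```python
# Toy model of multi-message signature aggregation. NOT real BLS: a linear
# scheme over integers mod p stands in for the pairing group, so only the
# size/structure property is illustrated.
import hashlib

P = (1 << 127) - 1  # toy prime modulus (illustrative parameter)

def h(msg: bytes) -> int:
    """Hash a message to a 'group element'."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % P

def sign(secret: int, msg: bytes) -> int:
    return (secret * h(msg)) % P  # stand-in for sk * H(m)

def aggregate(sigs):
    return sum(sigs) % P  # one element, regardless of how many messages

def verify_aggregate(agg, pubkey_message_pairs) -> bool:
    # stand-in for the pairing product check e(agg, g) == prod e(H(m_i), pk_i);
    # in this toy the public key equals the secret.
    return agg == sum(pk * h(m) for pk, m in pubkey_message_pairs) % P
```

Two validators can sign different messages (say, a block root and a validator-set root), and the aggregate remains a single element that verifies against the list of (key, message) pairs.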
This proposal is a high-level description of how the light client sync would work. If there is interest, I can work out the concrete details, or I can leave this to the EF research team. In Nimbus, we hope to build proof-of-concept implementations of all viable light client proposals in the near future in order to compare their practical results.
Using the key insight above, change the signatures in all block proposals and all attestations to also include a signature over the Merkle root hash of the validator set in the latest finalized state.

The goal here is to accumulate signatures authenticating the contents of the validator set. If the light client knows a certain finalized validator set `VS_prev`, it can be presented with a sync proof that the set changed to `VS_next`. The proof includes the following information:
1. A minimal list of exits (a list of indices).
2. A minimal list of new deposits (a list of `<pub key, index>` pairs).
3. A BLS signature that includes at least a super-majority of the known validators at `VS_prev`, signing `VS_next`.
4. A bitmask (or a list of signature counts per validator, if aggregating just one signature per validator proves to be difficult). Please note that this list will also refer to the new validators specified in (2).
5. A minimal list of transient validators whose signatures got aggregated in (3). A transient validator is one who made a deposit and an exit between `VS_prev` and `VS_next`. This list is optional and will be required only if it proves hard to produce an aggregated signature from the minimal set of required validators (more on this below).
6. A list of other data that ended up in the BLS signature (this should effectively encode a tree; @AlexeyAkhunov's encoding scheme comes to mind). The client doesn't need to authenticate this data, but it's needed in order to verify the aggregated signature. There is a trade-off here: we can optimize the traffic needed for light clients by storing the validator set signatures separately in the attestations or block proposals, which will increase the network traffic a bit for everyone else.
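The items above can be sketched as a container plus the delta application the client performs. All names here are illustrative, not from any spec, and the insertion of deposits is simplified (a real implementation would honor the deposit indices):

```python
# Hypothetical container for the sync proof described above; field names are
# illustrative. Item numbers in the comments refer to the list in the text.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SyncProof:
    exits: List[int]                      # (1) indices leaving VS_prev
    deposits: List[Tuple[bytes, int]]     # (2) <pub key, index> pairs joining
    aggregate_signature: bytes            # (3) BLS aggregate signing VS_next
    participation_bitmask: bytes          # (4) who is in the aggregate
    transient_validators: List[Tuple[bytes, int]] = field(default_factory=list)  # (5) optional
    extra_signed_data: List[bytes] = field(default_factory=list)                 # (6) other aggregated messages

def apply_delta(vs_prev: List[bytes], proof: SyncProof) -> List[bytes]:
    """Derive the claimed VS_next from VS_prev plus the proof's delta lists.
    Simplified: appends new deposits instead of placing them by index."""
    exiting = set(proof.exits)
    vs_next = [pk for i, pk in enumerate(vs_prev) if i not in exiting]
    for pubkey, _index in proof.deposits:
        vs_next.append(pubkey)
    return vs_next
```

The client then authenticates `vs_next` against the aggregate signature; the delta lists themselves need no authentication, per observation (3).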
Please note that the distance between `VS_prev` and `VS_next` can be quite significant. Such a proof needs to be generated roughly when 1/3 of the validators from `VS_prev` are no longer active. Furthermore, the same proof can be presented to multiple clients as long as their known validator sets are roughly the same. This suggests a design where the light client servers will maintain in persistent storage a set of proofs generated at particular times (let's say at `T1`, `T2`, `T3`, `T4`, ...) and if a light client is at `TC` (`T1 < TC < T2`), the server will just have to serve the precomputed proofs at `T2`, `T3`, `T4` and so on.
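The serving logic is a simple cut on the sorted proof times. A minimal sketch (function name is illustrative):

```python
# Sketch: given the client's last-known time T_C and the server's stored,
# sorted proof times T_1 < T_2 < ..., serve every proof generated after T_C.
from bisect import bisect_right
from typing import List

def proofs_to_serve(stored_proof_times: List[int], client_time: int) -> List[int]:
    """Return the precomputed proof times the client still needs, in order."""
    start = bisect_right(stored_proof_times, client_time)
    return stored_proof_times[start:]
```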
Once the client has synced the validator set, it can use the methods described in observation (1) to arrive at a new finalized state. To get the "instant sync" scheme working, we can decide that the proposals and attestations will include signatures over the root hash of the entire finalized state. This will slightly increase the size of the sync proof above because the validator set will be authenticated with a merkle proof now.
Generating a small sync proof is a relatively tough computational problem. Once a proof is generated, it's easy to verify that it has the desired properties, so operators of light client servers may decide to somehow share the burden of generating it or to outsource the work altogether. Here are some important considerations:
1) The more individual signatures you know, the more freedom you have to aggregate them into minimal configurations. The proof generator will therefore try to monitor the whole network and keep a history of all observed individual signatures.
2) The generated sync proof at `T2` must be such that every single finalized state between `T1` and `T2` has a validator set for which the signature at `T2` represents a super-majority. This suggests a design where the proof generator keeps track of the best possible `T_N`, `T_N+1`, `T_N+2`, ... over time, and these intervals may shift as circumstances change. Some heuristic will be applied to determine when the solution being found is "good enough".
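The property in (2) is easy to state as a predicate the generator must satisfy when choosing its signer set. A minimal sketch, modelling validator sets as sets of indices (illustrative, balance-weighting omitted):

```python
# Sketch of property (2): the signer set chosen for the proof at T2 must be a
# super-majority (>= 2/3) of *every* finalized validator set between T1 and T2.
from typing import Iterable, Set

def covers_all_sets(signers: Set[int], finalized_sets: Iterable[Set[int]]) -> bool:
    """True iff signers form at least 2/3 of each finalized validator set."""
    return all(3 * len(signers & vs) >= 2 * len(vs) for vs in finalized_sets)
```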
I imagine that such proof-generating servers will be run by a few operators, and the light-client server operators will simply fetch proofs from them.
I need to put more time into producing concrete numbers here, and the practical cost of the approach will obviously depend on the actual rate of change in the validator set, for which we need real-world statistics. The hand-wavy version of the analysis is that the sync proofs will be smaller objects than the 9-day proofs suggested above, the skip interval should be much longer than 9 days, and most of the time it will be possible to use the "instant sync" procedure (e.g. if the light client connects to the network once every couple of weeks).
The core of this protocol was spec'd out in the repo. Do we want to leave this open for rationale/analysis?
See for background: https://github.com/ethereum/eth2.0-specs/issues/403
Protocol changes
- Peg delayed entries/exits to `(n - n % EPOCH_LENGTH - 1) + EXIT_ENTRY_DELAY` (so they are always pegged to an epoch boundary).
- Add to the `BeaconState` a field `latest_active_index_roots: List[uint64]`, a list of `merkle_tree(get_active_validator_indices(state, block.slot))` for the last 8192 epoch boundaries, maintained similarly to the randao roots (if the indices are not a power of two, we zero-pad the list up to a power of two to make it Merkleizable).
- Add to the `BeaconState` a field `latest_active_validator_set_lengths: List[uint64]`, a list of `len(get_active_validator_indices(state, block.slot))` for the last 8192 epochs, maintained similarly to the randao roots.
- For the `seed` used in `get_shuffling`, use `seed = sha3(get_randao_mix(slot), get_active_index_root(slot))`, where `get_active_index_root` is defined similarly to `get_randao_mix`.
- Remove `validator_set_delta_hash_chain`.
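The state maintenance described here can be sketched as a ring buffer, mirroring how the randao mixes are stored. Names and the padding helper are illustrative, not spec code:

```python
# Sketch: maintain latest_active_index_roots as a ring buffer over the last
# 8192 epochs, like the randao mixes. Illustrative, not spec code.
LATEST_ACTIVE_INDEX_ROOTS_LENGTH = 8192

def next_power_of_two(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(indices: list) -> list:
    """Zero-pad the index list to a power of two so it is Merkleizable."""
    return indices + [0] * (next_power_of_two(len(indices)) - len(indices))

def update_active_index_root(roots: list, epoch: int, root: bytes) -> None:
    """Overwrite the slot for this epoch, wrapping around the buffer."""
    roots[epoch % LATEST_ACTIVE_INDEX_ROOTS_LENGTH] = root
```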
Version 1b: instead of `merkle_tree`, use `tree_hash_root`. The tree hash root has hashed-in metadata about the length of the list, so `latest_active_validator_set_lengths` becomes superfluous and can be removed.

Light client procedure
Suppose that a light client already authenticated the chain up to some `KNOWN_BLOCK` that it considers final. As per the requirements of persistent committees, this means that the light client has the information that it needs to predict persistent committees up to ~9 days in the future (see https://github.com/ethereum/eth2.0-specs/issues/375). The light client will ask for a `LightClientSkipProof` object for a future block up to ~9 days in the future, which contains the following objects:

- `header` of the proposed block
- `validator_index_proofs` (Merkle branches + leaves)
- `validator_record_proofs` (Merkle branches + leaves)
- `validator_balance_proofs` (Merkle branches + leaves)
- `participation_bitfield`, 16 bytes long
- `aggregate_signature`
The light client privately chooses a `shard` in `[0...SHARD_COUNT-1]`. The light client will use the randao mixes, active index roots and active validator set lengths that it can derive from the state root of `KNOWN_BLOCK` to derive enough information to compute:

- `prev_committee_compute_slot = header.slot - header.slot % RESHUFFLE_PERIOD - RESHUFFLE_PERIOD` and `post_committee_compute_slot = prev_committee_compute_slot + RESHUFFLE_PERIOD`
- `prev_committee = split(get_shuffling(seed, validators, prev_committee_compute_slot), SHARD_COUNT)[shard]`
- `post_committee = split(get_shuffling(seed, validators, post_committee_compute_slot), SHARD_COUNT)[shard]`
- `committee = [index for index in prev_committee if hash(get_randao_mix_at_epoch(post_committee_compute_slot) + bytes8(index)) % RESHUFFLE_PERIOD > header.slot % RESHUFFLE_PERIOD] + [index for index in post_committee if hash(get_randao_mix_at_epoch(post_committee_compute_slot) + bytes8(index)) % RESHUFFLE_PERIOD < header.slot % RESHUFFLE_PERIOD]` (this is the actual committee for the shard)
- `focus_committee = committee[:128]`
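The committee rule above hands validators over from the previous committee to the new one at pseudorandom points within the reshuffle period, so membership churns gradually rather than all at once. A runnable sketch, with a plain SHA-256 call standing in for the spec's `get_randao_mix_at_epoch`/`bytes8` helpers and an illustrative `RESHUFFLE_PERIOD`:

```python
# Sketch of the gradual-handoff committee rule: each index flips from the
# previous committee to the new one at a pseudorandom offset within the
# reshuffle period. The hash inputs are simplified stand-ins for the spec's.
import hashlib

RESHUFFLE_PERIOD = 256  # illustrative constant

def flip_point(randao_mix: bytes, index: int) -> int:
    """Pseudorandom offset in [0, RESHUFFLE_PERIOD) at which `index` flips."""
    digest = hashlib.sha256(randao_mix + index.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % RESHUFFLE_PERIOD

def current_committee(prev_committee, post_committee, randao_mix, slot):
    """Members of prev that haven't flipped yet, plus members of post that have."""
    offset = slot % RESHUFFLE_PERIOD
    return ([i for i in prev_committee if flip_point(randao_mix, i) > offset] +
            [i for i in post_committee if flip_point(randao_mix, i) < offset])
```

At offset 0 nothing from the new committee has flipped in yet; by the end of the period essentially everything has.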
Number-theoretic shuffles would let us compute all of this quickly.

For each `i in range(128)`, the client will check that:

- `validator_index_proofs` are valid Merkle branches for each `index` in `focus_committee`, and prove `committee_indices = [active_validator_indices[index] for index in focus_committee]`
- `validator_record_proofs` are valid Merkle branches for each index in `committee_indices` and prove the validator records for each validator
- `validator_balance_proofs` are valid Merkle branches for each index in `committee_indices` and prove the validator balances for each validator

It will also check the validity of the proofs of possession. Finally, it will compute a `group_pubkey` from the subset of the validators specified in the `participation_bitfield` plus each pubkey in the proofs of possession, and check `BLSVerify(pubkey=group_pubkey, message=hash_tree_root(header), signature=aggregate_signature, domain=PERSISTENT_COMMITTEE_DOMAIN)`. It also checks that the 1 bits in the `participation_bitfield` correspond to at least 2/3 of the total balance of the 128 validators selected.

If all checks pass, `header` becomes the new `KNOWN_BLOCK`.
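The participation check can be sketched directly from the bitfield and the proven balances (helper name is illustrative):

```python
# Sketch of the final check: the 1 bits of the 16-byte participation_bitfield
# must cover at least 2/3 of the total balance of the 128 selected validators.
from typing import List

def has_supermajority(bitfield: bytes, balances: List[int]) -> bool:
    assert len(bitfield) == 16 and len(balances) == 128
    participating = 0
    for i, balance in enumerate(balances):
        if (bitfield[i // 8] >> (i % 8)) & 1:  # bit i set => validator i signed
            participating += balance
    return 3 * participating >= 2 * sum(balances)
```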
Explanation
The idea is simple: use the client's existing knowledge of `KNOWN_BLOCK` to compute the committee that will be valid in some block up to 9 days in the future, then ask the network to provide Merkle branches proving the indices, and then the pubkeys and the balances, of that committee. This gives the client the information it needs to compute and verify an aggregate signature.

Light clients can skip ahead as many blocks at a time as they please, either one block at a time or the maximum of ~9 days at a time. Note that if a light client is skipping less than 9 days at a time, it may not need new Merkle branches, as the committee is the same as before; in that case the branches can be omitted, and only a header, bitfield and aggregate signature need to be transmitted.
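The outer sync loop implied by this is short. A sketch, where `get_skip_proof` and `verify_skip_proof` are hypothetical network and verification hooks, and the 6-second slot time is an assumption used only to size the skip:

```python
# Sketch of the skip-sync loop: repeatedly request a proof for a block up to
# ~9 days ahead of KNOWN_BLOCK until the head is reached. The hooks and the
# slot-time constant are illustrative assumptions.
MAX_SKIP_SLOTS = 9 * 24 * 60 * 60 // 6  # ~9 days of assumed 6-second slots

def sync(known_block, head_slot, get_skip_proof, verify_skip_proof):
    while known_block.slot < head_slot:
        target = min(known_block.slot + MAX_SKIP_SLOTS, head_slot)
        proof = get_skip_proof(target)
        if not verify_skip_proof(known_block, proof):
            raise ValueError("invalid skip proof")
        known_block = proof.header  # the verified header is the new KNOWN_BLOCK
    return known_block
```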
Rationale
Complexity analysis
So 250 to 900 kB in total per ~9 days. Compare Bitcoin block headers: 80 bytes per 10 minutes ~= 100 kB per 9 days, or eth1 block headers: 500 bytes per 15 seconds ~= 26 MB per 9 days.
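The two comparison figures check out as plain arithmetic:

```python
# Back-of-the-envelope check of the header-chain comparison above.
NINE_DAYS_SECONDS = 9 * 24 * 3600

btc_headers = NINE_DAYS_SECONDS // 600 * 80    # 80 bytes per 10-minute block
eth1_headers = NINE_DAYS_SECONDS // 15 * 500   # 500 bytes per 15-second block

assert btc_headers == 103_680        # ~100 kB per 9 days
assert eth1_headers == 25_920_000    # ~26 MB per 9 days
```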
Note also that proofs can be compressed via a SNARK or STARK.