ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal

Concrete proposals for what data gets committed to in a crosslink #529

Closed: vbuterin closed this issue 5 years ago

vbuterin commented 5 years ago

Currently, the AttestationData structure asks crosslink committee members to sign a shard_block_hash, the hash of a shard block at some recent slot (perhaps the epoch start slot of the epoch during which the crosslink is made, though this has not yet been specified). There is also a custody bit, which is the first bit of the hash of something, which has also not yet been specified. This post will attempt to come up with concrete proposals to tackle these issues.

First, some facts to keep in mind:

The simplest approach would be to have shard_block_hash point to the shard block at the end of the epoch before the epoch in which the crosslink was included, and make a proof of custody of just the shard block data. Note that because all shard block data is perfectly 2^k sized (16 KB = 2^14 bytes, or 2^9 32-byte chunks), a Merkle tree of Merkle trees of shard data from N blocks is the same as a Merkle tree of the concatenated shard data, as long as we make sure to fill unused leaves in the tree with merkle_root([b'\x00' * 32] * 512) instead of the zero-hash.
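The equivalence claimed above can be checked with a minimal sketch. It assumes SHA-256 as the hash function and a plain binary Merkle tree over 32-byte chunks; the helper names (hash_pair, chunkify, root_of_roots, root_of_concatenation) are made up for illustration and are not taken from the spec.

```python
from hashlib import sha256

CHUNK_SIZE = 32               # bytes per leaf
CHUNKS_PER_BLOCK = 512        # 2^9 chunks
BLOCK_SIZE = CHUNK_SIZE * CHUNKS_PER_BLOCK  # 16 KB = 2^14 bytes


def hash_pair(left, right):
    # Assumption: SHA-256 of the concatenation of the two children.
    return sha256(left + right).digest()


def merkle_root(leaves):
    # Plain binary Merkle tree; the number of leaves must be a power of two.
    assert len(leaves) > 0 and len(leaves) & (len(leaves) - 1) == 0
    layer = list(leaves)
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def chunkify(data):
    # Split data into 32-byte leaves.
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


# Filler for unused block-level leaves: the root of an all-zero 16 KB block.
ZERO_BLOCK_ROOT = merkle_root([b'\x00' * 32] * 512)


def root_of_roots(blocks, slots):
    # Merkle tree whose leaves are the per-block Merkle roots.
    roots = [merkle_root(chunkify(block)) for block in blocks]
    roots += [ZERO_BLOCK_ROOT] * (slots - len(blocks))
    return merkle_root(roots)


def root_of_concatenation(blocks, slots):
    # Merkle tree over the concatenated block data, zero-padded to `slots` blocks.
    data = b''.join(blocks) + b'\x00' * (BLOCK_SIZE * (slots - len(blocks)))
    return merkle_root(chunkify(data))


blocks = [bytes([i]) * BLOCK_SIZE for i in (1, 2, 3)]
assert root_of_roots(blocks, 4) == root_of_concatenation(blocks, 4)
```

Filling the unused leaf with the zero-hash b'\x00' * 32 instead of ZERO_BLOCK_ROOT would break the equality, because the concatenated tree hashes 512 zero chunks at that position rather than a single zero leaf.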

However, this would require a client seeking to gain a guarantee on the chain's integrity to download not just the beacon chain but also the shard chains, which adds a significant amount of data: up to 1/16 the data of all shards in the entire chain! It would also make fraud proof enforcement harder, as the beacon chain would not have access to any state roots in between the crosslinks. A better approach would have shard_block_hash (and therefore the custody bit) include the block bodies and also the headers.

For the proposals, we rename shard_block_hash to custody_commitment_data.

A philosophical note

We can consider the crosslinks included in the beacon chain to be the headers of the "real" shard blocks. So all "real shard block headers" get included into the beacon chain. The shard blocks that appear in the intermediate stages are merely a coordination device that helps the proposal committee come together and agree on what block to propose, in such a way that transaction senders can get assurance within one slot that their block will (likely) get included. From this point of view, we want to be able to fully verify state transitions inside of these "real shard blocks" and fully verify that the coordination game was actually followed, so we should include shard headers in the data to be committed.

Proposal 1: two sub-trees

Let custody_commitment_data = hash(header_root, body_root), where body_root is a Merkle root of all block data, and header_root is a Merkle root of all header data, zero-padded to 16 KB (for skipped slots, block data is fully zero, and header data is some placeholder containing the most recent block header root and state_root). We can add a state transition validity fraud proof by asking for a Merkle branch for the header and a Merkle branch in the corresponding block root in the body data, and checking that the latter does not match the data_root in the former.
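A rough sketch of how the commitment and the mismatch check in proposal 1 could fit together. To keep it short it makes a simplifying assumption that is not in the proposal: each leaf of the header tree is just the data_root that the header claims for its slot, and each leaf of the body tree is the Merkle root of that slot's block data. SHA-256 and all function names are likewise assumptions for illustration.

```python
from hashlib import sha256


def hash_pair(left, right):
    # Assumption: SHA-256 of the concatenation of the two children.
    return sha256(left + right).digest()


def merkle_root(leaves):
    layer = list(leaves)  # length must be a power of two
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def merkle_branch(leaves, index):
    # Sibling hashes on the path from leaf `index` up to the root.
    branch, layer = [], list(leaves)
    while len(layer) > 1:
        branch.append(layer[index ^ 1])
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch


def verify_branch(leaf, branch, index, root):
    node = leaf
    for sibling in branch:
        node = hash_pair(node, sibling) if index % 2 == 0 else hash_pair(sibling, node)
        index //= 2
    return node == root


def custody_commitment_data(header_root, body_root):
    return sha256(header_root + body_root).digest()


def is_data_root_fraud(commitment, header_root, body_root, slot,
                       claimed_data_root, header_branch,
                       actual_body_root, body_branch):
    # Both branches must open the committed roots; fraud is a mismatch between
    # the data_root claimed in the header and the root found in the body tree.
    if custody_commitment_data(header_root, body_root) != commitment:
        return False
    if not verify_branch(claimed_data_root, header_branch, slot, header_root):
        return False
    if not verify_branch(actual_body_root, body_branch, slot, body_root):
        return False
    return claimed_data_root != actual_body_root
```

The proof carries two full branches, one per sub-tree, which is the cost that proposal 2 tries to reduce.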

Proposal 2: interlacing

For each block, have 32 kilobytes of data, where the first 16 kilobytes are the header (zero-padded to 16 KB) and the second 16 kilobytes are the data (for skipped slots, block data is fully zero, and header data is some placeholder containing the most recent block header root and state_root). We can add a state transition validity fraud proof with a Merkle branch for the header and for the body data as in proposal 1, also checking that the latter does not match the data_root in the former. However, the fraud proof will be shorter: the header and body sit beside each other in the tree, so the two Merkle branches share most of their nodes.
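The branch sharing can be illustrated with a small sketch. It works one level above the raw data: each leaf is assumed to be the Merkle root of one 16 KB unit (a padded header or a body), so a header and its body are sibling leaves. SHA-256 and the helper names (interlaced_leaves, shared_branch, verify_pair) are assumptions made for this example.

```python
from hashlib import sha256


def hash_pair(left, right):
    # Assumption: SHA-256 of the concatenation of the two children.
    return sha256(left + right).digest()


def merkle_root(leaves):
    layer = list(leaves)  # length must be a power of two
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def merkle_branch(leaves, index):
    # Sibling hashes on the path from leaf `index` up to the root.
    branch, layer = [], list(leaves)
    while len(layer) > 1:
        branch.append(layer[index ^ 1])
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch


def interlaced_leaves(header_roots, body_roots):
    # Layout: header_0, body_0, header_1, body_1, ...
    leaves = []
    for header_root, body_root in zip(header_roots, body_roots):
        leaves += [header_root, body_root]
    return leaves


def shared_branch(leaves, slot):
    # The header branch minus its first sibling (which is the body itself).
    return merkle_branch(leaves, 2 * slot)[1:]


def verify_pair(header_root, body_root, branch, slot, root):
    # One branch proves both the header and the body of `slot`.
    node, index = hash_pair(header_root, body_root), slot
    for sibling in branch:
        node = hash_pair(node, sibling) if index % 2 == 0 else hash_pair(sibling, node)
        index //= 2
    return node == root


headers = [sha256(b'header' + bytes([i])).digest() for i in range(4)]
bodies = [sha256(b'body' + bytes([i])).digest() for i in range(4)]
leaves = interlaced_leaves(headers, bodies)
# Above the first level, the header and body branches are identical.
assert merkle_branch(leaves, 6)[1:] == merkle_branch(leaves, 7)[1:]
assert verify_pair(headers[3], bodies[3], shared_branch(leaves, 3), 3, merkle_root(leaves))
```

At this level the proof needs a single branch for the pair, rather than one branch in each of the two sub-trees of proposal 1.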

Proposal 3: optimizing interlacing

For each block, have 32 kilobytes of data, where the first 2 kilobytes are the header (zero-padded to 2 KB) and the remaining 30 kilobytes are the data (for skipped slots, block data is fully zero, and header data is some placeholder containing the most recent block header root and state_root). A block header now contains four data roots for the 2+4+8+16 kilobytes of the data respectively. We would add a fraud proof type for each of the four roots and parts of the block data.
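A sketch of the proposal 3 layout, assuming SHA-256, 32-byte chunks and a plain binary Merkle tree (none of which the proposal pins down); the names split_body, data_roots and block_root_from_parts are invented for the example. It also checks one consequence of the 2+4+8+16 split: every segment lines up with a complete subtree of the 32 KB block tree, so the block root can be rebuilt from the header root and the four data roots alone.

```python
from hashlib import sha256

CHUNK_SIZE = 32
HEADER_SIZE = 2 * 1024
SEGMENT_SIZES = [2 * 1024, 4 * 1024, 8 * 1024, 16 * 1024]  # 30 KB in total


def hash_pair(left, right):
    # Assumption: SHA-256 of the concatenation of the two children.
    return sha256(left + right).digest()


def merkle_root(leaves):
    layer = list(leaves)  # length must be a power of two
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def chunkify(data):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def split_body(body):
    # Zero-pad the body to 30 KB and cut it into the 2, 4, 8 and 16 KB segments.
    assert len(body) <= sum(SEGMENT_SIZES)
    body = body.ljust(sum(SEGMENT_SIZES), b'\x00')
    segments, offset = [], 0
    for size in SEGMENT_SIZES:
        segments.append(body[offset:offset + size])
        offset += size
    return segments


def data_roots(body):
    # The four data roots that the header would carry.
    return [merkle_root(chunkify(segment)) for segment in split_body(body)]


def block_root_from_parts(header_root, roots):
    # Fold from the left: ((((header, r2), r4), r8), r16).
    node = header_root
    for root in roots:
        node = hash_pair(node, root)
    return node


header = b'\x01' * 100                # placeholder header contents
body = bytes(range(256)) * 120        # 30 KB of sample data
padded_header = header.ljust(HEADER_SIZE, b'\x00')
full_root = merkle_root(chunkify(padded_header + b''.join(split_body(body))))
header_root = merkle_root(chunkify(padded_header))
assert block_root_from_parts(header_root, data_roots(body)) == full_root
```

With this alignment, a fraud proof for one of the four roots would only need to open the corresponding subtree, not the whole 30 KB body.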

Proposal 3 is more efficient than proposal 2 when we want to add data availability proofs because it does not add the ~80% overhead from hashing many zero bytes, but it also adds some extra complexity. We could mitigate the 80% overhead by simply using the space for other purposes. Possible ways to use the space include:

These ideas would be easier to implement if there were a large contiguous pool of data, which is an advantage of proposal 1 over proposal 2.

djrtwo commented 5 years ago

We went with proposal 1: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/core/1_shard-data-chains.md#compute_crosslink_data_root