ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications

Issues with fork choice and non-genesis anchors #2566

Open michaelsproul opened 3 years ago

michaelsproul commented 3 years ago

The fork choice specification allows an arbitrary block to be provided as the root of the block tree, which it refers to as an anchor. In practice, this is almost always the genesis block or some block that is finalized and has all of the blocks that finalize it as descendants in the tree. Issues arise when providing a single finalized block to get_forkchoice_store, as one may want to do when performing a checkpoint sync.

def get_forkchoice_store(anchor_state: BeaconState, anchor_block: BeaconBlock) -> Store:
    assert anchor_block.state_root == hash_tree_root(anchor_state)
    anchor_root = hash_tree_root(anchor_block)
    anchor_epoch = get_current_epoch(anchor_state)
    justified_checkpoint = Checkpoint(epoch=anchor_epoch, root=anchor_root)
    finalized_checkpoint = Checkpoint(epoch=anchor_epoch, root=anchor_root)
    return Store(
        time=uint64(anchor_state.genesis_time + SECONDS_PER_SLOT * anchor_state.slot),
        genesis_time=anchor_state.genesis_time,
        justified_checkpoint=justified_checkpoint,
        finalized_checkpoint=finalized_checkpoint,
        best_justified_checkpoint=justified_checkpoint,
        blocks={anchor_root: copy(anchor_block)},
        block_states={anchor_root: copy(anchor_state)},
        checkpoint_states={justified_checkpoint: copy(anchor_state)},
    )

The fork choice store is initialized with synthetic justified and finalized checkpoints constructed from the anchor block's root and current epoch, notably not the justified and finalized checkpoints of the anchor state, i.e. anchor_state.current_justified_checkpoint, anchor_state.finalized_checkpoint.

The problem with using these synthetic checkpoints is that they are ahead of the checkpoints of the anchor block's descendants, and they remain ahead until some descendant block finalizes the anchor block. For example, if we initialize the store with the block at slot 3200 from the start of epoch 100, we will set both checkpoints to Checkpoint(epoch=100, root=...). When a block at slot 3201 is applied, it will (likely) have a justified checkpoint with epoch=99 and a finalized checkpoint with epoch=98. The checks in filter_block_tree then prevent this block from becoming the head, even though in a sense it should. It takes several epochs' worth of blocks before a block arrives that is able to update the store's idea of the justified and finalized checkpoints, at which point the head will jump from slot 3200 to the finalization-updating descendant.
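To make the mismatch concrete, here is a minimal sketch of the slot 3200 example. It is a simplification, not spec code: Checkpoint is a stand-in for the spec type with placeholder string roots, SLOTS_PER_EPOCH = 32 is the mainnet preset value, and is_viable is a hypothetical helper that mirrors only the two checkpoint comparisons from filter_block_tree.

```python
from dataclasses import dataclass

SLOTS_PER_EPOCH = 32  # mainnet preset value (assumption for this sketch)
GENESIS_EPOCH = 0


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str  # placeholder standing in for a 32-byte root


def compute_epoch_at_slot(slot: int) -> int:
    return slot // SLOTS_PER_EPOCH


def is_viable(store_justified: Checkpoint, store_finalized: Checkpoint,
              state_justified: Checkpoint, state_finalized: Checkpoint) -> bool:
    # Hypothetical helper: only the two comparisons quoted from filter_block_tree.
    correct_justified = (
        store_justified.epoch == GENESIS_EPOCH
        or state_justified == store_justified
    )
    correct_finalized = (
        store_finalized.epoch == GENESIS_EPOCH
        or state_finalized == store_finalized
    )
    return correct_justified and correct_finalized


# Store initialized from the anchor block at slot 3200: both checkpoints are synthetic.
anchor_epoch = compute_epoch_at_slot(3200)
synthetic = Checkpoint(epoch=anchor_epoch, root="anchor")

# Checkpoints a block at slot 3201 would likely carry in its post-state.
child_justified = Checkpoint(epoch=99, root="j99")
child_finalized = Checkpoint(epoch=98, root="f98")

child_viable = is_viable(synthetic, synthetic, child_justified, child_finalized)
print(anchor_epoch, child_viable)
```

Under these assumptions the anchor epoch is 100 and the child at slot 3201 fails both comparisons, so it is filtered out of the head candidates.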

In Lighthouse I tried changing get_forkchoice_store to use the state's checkpoints, but this quickly uncovered a myriad of violated assumptions: for example, not being able to find the justified checkpoint in the store to start get_head, or being unable to check whether a newly added block is a descendant of the finalized block (because that block is missing).
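A rough illustration of the first failure mode, assuming a store whose block map holds only the anchor (as get_forkchoice_store produces). The roots are placeholder strings and missing_roots is a hypothetical helper, not something from the spec or from Lighthouse.

```python
# The store built from an anchor knows about exactly one block.
anchor_root = "anchor"
blocks = {anchor_root: {"slot": 3200}}

# Roots referenced by the anchor state's own checkpoints (placeholders).
state_justified_root = "j99"
state_finalized_root = "f98"


def missing_roots(known_blocks: dict, roots: list) -> list:
    # Hypothetical helper: which checkpoint roots have no block in the store?
    return [root for root in roots if root not in known_blocks]


# With the state's checkpoints, neither root can be looked up in the store.
missing_with_state = missing_roots(blocks, [state_justified_root, state_finalized_root])

# With the synthetic checkpoints, the only root needed is the anchor itself.
missing_with_synthetic = missing_roots(blocks, [anchor_root])

print(missing_with_state, missing_with_synthetic)
```

Any lookup that starts from the justified root, or walks ancestors back to the finalized root, would hit one of these missing entries.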

I suspect the best solution (particularly in the short term) is to work around this issue and live with fork choice being a bit slow to update the head when initialized from an anchor. Usually during checkpoint sync new blocks will be applied quickly and the head will update.

ajsutton commented 2 years ago

For the record, in Teku we resolve this by considering all blocks viable for head so long as the current justified and finalized epochs are the initial epoch from the anchor. That is, in filter_block_tree, where the spec has:

correct_justified = (
    store.justified_checkpoint.epoch == GENESIS_EPOCH
    or head_state.current_justified_checkpoint == store.justified_checkpoint
)
correct_finalized = (
    store.finalized_checkpoint.epoch == GENESIS_EPOCH
    or head_state.finalized_checkpoint == store.finalized_checkpoint
)

we effectively have:

correct_justified = (
    store.justified_checkpoint.epoch == INITIAL_EPOCH
    or head_state.current_justified_checkpoint == store.justified_checkpoint
)
correct_finalized = (
    store.finalized_checkpoint.epoch == INITIAL_EPOCH
    or head_state.finalized_checkpoint == store.finalized_checkpoint
)

where INITIAL_EPOCH is the epoch the anchor is from. Essentially, the anchor state is the genesis, just not at slot 0, so we need to treat it as such here.
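As a standalone sketch of that modified check (not Teku's actual code, which is Java): Checkpoint is a minimal stand-in with placeholder string roots, and is_viable_for_head is a hypothetical helper that takes the anchor's epoch as initial_epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str  # placeholder standing in for a 32-byte root


def is_viable_for_head(store_justified: Checkpoint, store_finalized: Checkpoint,
                       state_justified: Checkpoint, state_finalized: Checkpoint,
                       initial_epoch: int) -> bool:
    # Hypothetical helper: the anchor epoch plays the role GENESIS_EPOCH has in the spec.
    correct_justified = (
        store_justified.epoch == initial_epoch
        or state_justified == store_justified
    )
    correct_finalized = (
        store_finalized.epoch == initial_epoch
        or state_finalized == store_finalized
    )
    return correct_justified and correct_finalized


INITIAL_EPOCH = 100  # epoch of the anchor in the slot 3200 example
anchor = Checkpoint(INITIAL_EPOCH, "anchor")
old_justified = Checkpoint(99, "j99")
old_finalized = Checkpoint(98, "f98")

# While the store still holds the synthetic checkpoints, every block is viable.
viable_at_anchor = is_viable_for_head(
    anchor, anchor, old_justified, old_finalized, INITIAL_EPOCH)

# Once the store's checkpoints move past the anchor epoch, the usual equality checks apply.
new_justified = Checkpoint(102, "j102")
new_finalized = Checkpoint(101, "f101")
stale_viable = is_viable_for_head(
    new_justified, new_finalized, old_justified, old_finalized, INITIAL_EPOCH)
matching_viable = is_viable_for_head(
    new_justified, new_finalized, new_justified, new_finalized, INITIAL_EPOCH)

print(viable_at_anchor, stale_viable, matching_viable)
```

With initial_epoch set to GENESIS_EPOCH (0), this reduces to the comparisons quoted from the spec, which is the sense in which the anchor is treated as the genesis.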