cometbft / cometbft

CometBFT: A distributed, Byzantine fault-tolerant, deterministic state machine replication engine. A fork and successor to Tendermint Core.
https://docs.cometbft.com
Apache License 2.0
636 stars 460 forks source link

consensus: proposer-based timestamps (PBTS) #1731

Closed melekes closed 6 months ago

melekes commented 10 months ago

Feature Request

Summary

Use proposer-based timestamps instead of calculating the median time of vote timestamps.

Detailed description of previous efforts ### Context This collapsed section describes the previous work on the adoption of proposer-based timestamps (PBTS) for the blocks produced in CometBFT networks and points out relevant aspects to be considered when porting this work to the current code base. Part of the references included in this section are linked in the PBTS project board in the old Tendermint Core repository: https://github.com/orgs/tendermint/projects/10/views/4. ### Rationale The adoption of proposer-based timestamps (PBTS) was originally proposed in 2018 (tendermint/tendermint#2840, tendermint/tendermint#1615) to replace the BFT Time algorithm (tendermint/tendermint#2013) that is currently used to compute the timestamp of new blocks. In short, the [BFT Time algorithm](https://github.com/cometbft/cometbft/blob/main/spec/consensus/bft-time.md) sets as timestamp of a block (stored in its `Time` field) the voting-power-weighted median of the `Precommit` vote messages (stored at the block's `LastCommit` field) that decided the previous block. The main problems with BFT Time are the following: * it can't be used for (future adoption of) tendermint/tendermint#1319 (since each vote has a unique timestamp component) * it gives +1/3 of the validator set complete control over the timestamp (see tendermint/tendermint#2653) The consensus algorithm adopted by CometBFT needs to be adapted for the use of proposer-based timestamps. Throughout 2021, two proposals of changes in the consensus algorithm were specified (tendermint/spec#261 and tendermint/spec#369). In addition, the architectural changes required for the adoption of PBTS were discussed in [ADR 071](https://github.com/cometbft/cometbft/blob/main/docs/architecture/tendermint-core/adr-071-proposer-based-timestamps.md), originally built from the first proposal (tendermint/tendermint#6799), then adapted for the second proposal (tendermint/tendermint#7764). ## Spec At the moment, the PBTS spec on [`main`](https://github.com/cometbft/cometbft/tree/main/spec/consensus/proposer-based-timestamp) refers to the _first proposal_ of changes in the consensus algorithm. The second and latest proposal of changes is currently available on [`master`](https://github.com/cometbft/cometbft/blob/master/spec/consensus/proposer-based-timestamp/) (https://github.com/tendermint/tendermint/pull/8096). There is some pending work, especially regarding the formal verification of the spec using TLA+ that needs to be retrieved and added to the spec (maybe ported to Quint?), as described in https://github.com/tendermint/tendermint/issues/8953. In particular, the `main/pbts` branch referred on that issue is not available anymore. A possible source for this content is PR https://github.com/tendermint/tendermint/pull/8600 and the involved authors. ## Implementation The second and latest version of the PBTS specification was implemented on `master`. This is the branch from which the release branches `v0.35.x` and `v0.36.x` were cut and later retracted. As a result, porting the PBTS implementation to `main`, which includes all changes introduced for releases `v0.37.x` and `v0.38.x` may not be a trivial task. The tracking issue for the PBTS implementation in `master` is tendermint/tendermint#6942. The implementation has some changes to the consensus protocol as prerequisites, not necessarily related to PBTS: - tendermint/tendermint#6850 - tendermint/tendermint#6849 - tendermint/tendermint#7546 Moreover, the validation of block timestamps on a node in recovery mode is a missing aspect of the implementation: - tendermint/tendermint#8954 ## Block Validation in Full Nodes A further implementation aspect, not discussed in this previous effort, is the backwards compatibility in full nodes. The PBTS implementation has removed the logic adopted by BFT Time algorithm for producing and validating block timestamps (tendermint/tendermint#7382, tendermint/tendermint#7563). We need to adapt blocksync to accept both mechanisms for producing block timestamps. ## Documentation and configuration The adoption of PBTS requires the configuration of synchronous parameters (clock precision and end-to-end delay for proposal messages) employed to determine whether the timestamp of a block is valid (time-validity). The adoption of too restrictive or tight synchronous parameters may hinder liveness, as blocks produced by honest validators could be systematically rejected, and impact performance. For this reason, the introduction of PBTS should be accompanied by a comprehensive documentation of how to choose the synchronous parameters and how to diagnose and solve PBTS-associated issues: - tendermint/tendermint#7756 - tendermint/tendermint#7576 - tendermint/tendermint#8046 - tendermint/tendermint#8591 It is worth mentioning that the time-validity of a proposed block is only verified for a limited number of rounds, currently hard-coded to 10 (tendermint/tendermint#7739). This decision was taken in order to ensure consensus liveness under the adoption of inappropriate synchronous parameters. It is worth considering having this limit configurable. Another possibility to consider is to allow users or the application to disable at all the time-validity checks of a block. This possibility is mentioned in the original issue proposing PBTS (tendermint/tendermint#2840) but has not been implemented. ## Experiments The implementation was tested using the `e2e` tests and evaluated in small-scale testnets (tendermint/tendermint#7757). This effort has to be extended using the QA infrastructure to test the solution in more realistic testnets (Digital Ocean). The experimental efforts included the definition of metrics from which to derive safe and realistic values for the synchronous parameters (tendermint/tendermint#7202), adopted as their default values (tendermint/tendermint#7788). Those experiments should be re-evaluated under the light of the new use cases and features introduced from when they were performed.

Teams interested

  1. DyDx
  2. Osmosis
  3. Hub

Execution steps:

Important: all development related to PBTS will be done on feature branch feature/pbts. Which will be merged back to main (via a merge commit) when the team agrees that:

Then, we will need to backport the branch onto branch v1.x. But it should not be difficult, as main and v1.x have diverged very little.

Implementation

Upstream Dependencies

Spec

Verification

Documentation

Experiments

Final Steps

cason commented 9 months ago

I think the first step here is to migrate the new PBTS spec to main. Unfortunately some people, even some partners, interested on this feature are looking at the outdated version of the spec currently on main. This migration should be easy and I can work on it.

cason commented 9 months ago
* [ ]  Remove https://github.com/cometbft/cometbft/blob/main/spec/consensus/bft-time.md

  * adapt any links to that spec elsewhere

We are not fully deprecating BFT Time, so I think this description must be preserved. Maybe updated, maybe merged with PBTS as the two ways of producing timestamps.

cason commented 6 months ago

There are two action points, but referring to the analysis of the impact of Byzantine nodes in the produced timestamps, that were removed from this tracking issue and will be addressed, on demand, as backlog items:

cason commented 6 months ago

The implementation of PBTS is completed.

It has landed on main branch (https://github.com/cometbft/cometbft/pull/2541) and it is part of the v1.x release branch (https://github.com/cometbft/cometbft/pull/2545).

Great work, folks :)