celestiaorg / celestia-core

A fork of CometBFT
Apache License 2.0
481 stars 264 forks source link

Investigate consensus steps timeouts #87

Closed liamsi closed 10 months ago

liamsi commented 3 years ago

We will have to experiment with block times and tendermint consensus related timeout to find a reasonable middle ground.

Update: Increasing the timeouts is no longer relevant but what we instead really we want is: https://github.com/tendermint/tendermint/issues/5911 We should help the tendermint team to get that feature shipped.

The current default timeouts can be found here: https://github.com/lazyledger/lazyledger-core/blob/b68e2dc394c0fc1840d57b1a763c77c1ef7a13cd/config/config.go#L869-L874

For immediate execution, we might need to increase TimeoutPropose as a proposer would need to execute Tx against the state before proposing. Also, we'd like the block data to propagate through the p2p network (reaching at least some storage nodes (ref: https://github.com/lazyledger/lazyledger-specs/issues/3).

Also we likely would need to increase the TimeoutPrevote to allow validators to validate blocks via DA proofs (see #65). (this is no longer planned)

We could start with the current defaults and experiment our way up to defaults where consensus nodes don't end up timing out and starting new rounds without making blocks.

ref: https://github.com/tendermint/tendermint/issues/5911

liamsi commented 2 years ago

@evan-forbes can we hardcode/update the default config such that every node locally will have the same timeouts to establish a 30s block time? This is only a bandaid until we can either upgrade to a tendermint version that makes this a consensus param, or, we backport this before launch (should open a separate issue for the latter).

evan-forbes commented 2 years ago

sure thing! since starport hasn't upgraded to v0.46 yet, and we want to move away from their stuff anyway, I've been replacing the code we rely on from them with local infra, and a huge bonus to that is that we can easily change the default config.

evan-forbes commented 2 years ago

While we seem to have a working version for this by increasing the timeout-commit, I still think we should leave this open and do more experimentation on optimal values. I think doing this after incentivized testnet makes the most sense, since we have a working version as is.

evan-forbes commented 1 year ago

We had to increase the timeout propose significantly, to 10s, to come to consensus on the first round. This has other consequences, such as inconsistent block times that we will need a dynamic timeout commit to fix. We should be able to decrease the timeout propose after we increase the efficiency of tendermint's p2p stack.

I'll continue to leave this issue open, at least until we have stable defaults picked.

evan-forbes commented 10 months ago

I think this issue can be closed as completed since the timeouts on mainnet seems stable. However, that is definitely not to say that the timeouts should not be changed in the future.