AntelopeIO / leap

C++ implementation of the Antelope protocol

IF: Ensure pause and resume work with Instant Finality #1529

Closed ericpassmore closed 7 months ago

ericpassmore commented 1 year ago

The producer plugin's pause and resume endpoints only affect block production. With HotStuff, pause and resume should handle both finalization and production.

arhag commented 1 year ago

We decided that we do not want the existing pause/resume endpoints of the producer plugin to impact HotStuff consensus leader or block finalizer operations. It should only pause/resume creation of new blocks.

We discussed whether we should have a new endpoint to pause/resume block finalization. We are leaning towards not doing so because we want to encourage BPs to use an inherently safe process of switching out their block finalization key on-chain. There may have been reluctance to do this in the past because of the 6 minute delay (on EOS) for the new proposed producer schedule to take effect. However, with IF, we could have the finalizer set change very rapidly (within seconds).

We do need to document the correct process for handling BP block finalization failover (and how that can be kept separate from the process for handling BP block proposal failover, which could still use the pause/resume endpoints) to educate the BPs. That is captured in another issue: https://github.com/AntelopeIO/spring/issues/93.

arhag commented 1 year ago

We discussed whether we should have a new endpoint to pause/resume block finalization. We are leaning towards not doing so because we want to encourage BPs to use an inherently safe process of switching out their block finalization key on-chain.

Feedback from @matthewdarwin here suggests that the BPs may still really want an endpoint to pause/resume finalization as well. A decision still needs to be made on whether we provide such an endpoint or not for 5.0, but even if we do, I think we still should keep the pause/resume of block production and pause/resume of block finalization separate APIs.

matthewdarwin commented 1 year ago

It's a significant change, so let's have a discussion on this at the next node operator roundtable. @bhazzard @heifner

The current discussion of the rollout plans for the upgrade did not cover the need for node operator management changes. Many BPs have probably been running the same process for the last 5 years.

arhag commented 7 months ago

Current behavior of the implementation in the branch is for the endpoint to control block production only and not impact finalizer voting. There is no mechanism currently to pause/resume finalizer voting on a nodeos instance.
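To make the separation concrete, here is a minimal toy model of the behavior described above. It is not nodeos code: `NodeModel` and its method names are hypothetical and exist only to illustrate that pausing touches block production and leaves finalizer voting alone.

```python
from dataclasses import dataclass


@dataclass
class NodeModel:
    """Toy stand-in for one nodeos instance (hypothetical, not real nodeos code)."""
    production_paused: bool = False

    def pause(self) -> None:
        # Models the producer plugin pause endpoint: block production only.
        self.production_paused = True

    def resume(self) -> None:
        # Models the producer plugin resume endpoint.
        self.production_paused = False

    def may_produce_block(self) -> bool:
        return not self.production_paused

    def may_vote_as_finalizer(self) -> bool:
        # Deliberately ignores production_paused: the model has no
        # pause switch for finalizer voting.
        return True


node = NodeModel()
node.pause()
print(node.may_produce_block(), node.may_vote_as_finalizer())
```

In this sketch a paused node stops producing blocks but would still vote, which is the split the endpoint is meant to preserve.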

We currently do not see any value in adding such an endpoint. Failover to other finalizer nodes should be handled by using different keys per node and using the core contract to change the active finalizer key of the producer. This mechanism could in theory also be used to disable voting on all the finalizer nodes of a BP by setting as active a finalizer key that is not used by any of the live instances. But it is not recommended for a BP to do this, since it is their responsibility to participate in finalizer voting as an active BP. If the BP wishes to temporarily disable their BP, they can do so through the core contract, in which case they would be pulled out of the active BP set both as a finalizer and as a block proposer.
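The key-based failover described above can be sketched as a small simulation. Everything here is an assumption made for illustration: `FinalizerNode`, `ProducerRecord`, `voting_nodes`, and the key strings are hypothetical, and the model stands in for on-chain state rather than calling any real contract action.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class FinalizerNode:
    name: str
    key: str  # each node is configured with its own distinct finalizer key


@dataclass
class ProducerRecord:
    """Hypothetical stand-in for a producer's on-chain finalizer key state."""
    registered_keys: Set[str] = field(default_factory=set)
    active_key: Optional[str] = None

    def register(self, key: str) -> None:
        self.registered_keys.add(key)

    def activate(self, key: str) -> None:
        # Only a previously registered key can become the active key.
        if key not in self.registered_keys:
            raise ValueError(f"key {key!r} is not registered")
        self.active_key = key


def voting_nodes(record: ProducerRecord, nodes: List[FinalizerNode]) -> List[str]:
    """Names of the nodes whose key matches the producer's active key."""
    return [n.name for n in nodes if n.key == record.active_key]


nodes = [FinalizerNode("primary", "KEY_A"), FinalizerNode("standby", "KEY_B")]
record = ProducerRecord()
for key in ("KEY_A", "KEY_B", "KEY_SPARE"):
    record.register(key)

record.activate("KEY_A")
print(voting_nodes(record, nodes))  # only the primary votes

record.activate("KEY_B")            # failover: switch the active key
print(voting_nodes(record, nodes))  # only the standby votes

record.activate("KEY_SPARE")        # key held by no live node (not recommended)
print(voting_nodes(record, nodes))  # nobody votes
```

Because at most one key is active at a time and each node holds a different key, switching the active key moves voting from one node to another without both voting at once.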

If a reason to add an endpoint to pause/resume finalizer voting were discovered, we would capture that enhancement through a separate issue.

But this issue has been resolved and can be closed.