etcd-io / raft

Raft library for maintaining a replicated state machine
Apache License 2.0
633 stars 160 forks source link

stepWait not used in ProposeConfChange #43

Open lavacat opened 1 year ago

lavacat commented 1 year ago

stepWait was implemented to fail-fast in Propose method. But it was never used in ProposeConfChange method.

Question was raised about it on original PR. Also, it's was causing timeout in ExampleCluster_memberAddAsLearner test. comment

This isn't a critical issue, because in production ProposeConfChange isn't usually called consecutively. But it seams to me that stepWait was missed for ProposeConfChange method.

folays commented 5 months ago

OP wonders about ProposeConfchange().

Besides ProposeConfChange(), Propose() has also some problems which probably stems from the same origins.

I will talk only about cluster where DisableProposalForwarding = false, because = true has it own set of unfollow-ability of failed proposals. It helps drafting things quickly, but it's bad in ensuring reliability.

(*node).Propose() method is widly undocumented on what can it be expected of it. An API is like a contract, and without specifications on it, what are its limitation etc..., it's like there is no guarantee at all to have it to eventually do something in any specific timeframe, so it's unuable in any serious production environment without first reading all code path it branches into to figure out those.

.Propose() only specification is Note that proposals can be lost without notice, therefore it is user's job to ensure proposal retries. which does not help at all. No words on when/how the loss can be noticed and/or assumed.

It appears that the set of predicates could hold for .Propose() :

.ProposeConfChange() is worse, and its API behavior for callers seems to be "hope for the best", "retry after some unspecified time if commit does not appears in .CommittedEntries[]"

The need is somewhat simple : people wants to have Propose() and ProposeConfChange() succeed as soon as possible, which entrails to have them fail "fast" is they are failing, which could be better rephrased as : having a mechanism (or documentation) on how to known for a given Proposal, As-Soon-As-Possible, that it failed/dropped, so that the caller can (at its discrete choice) either quickly return the error, or quickly retry.

I understand that the protocol and logic behing it implies that there is no timeframe guarantee on the commitability or failure of Proposals.

It's OKAY for something to be in progress, and having to wait for an indeterminate amount of time for it to progress, as long as further progress is expected.

What's not okay if for something to definitely having stopped progression, where failure is certain (absence of further progress is certain), but no mechanism is being provided to detect/notify the certainty of the failure.

What indeed feels unatural is, when a "proposal requester" (e.g. a GRPC caller whishing to have some data commited to the Raft logs) DIRECTLY requested the leader, and when this leader reached a point of knowledge of the wherabouts of the ongoing Proposal, specifically reached knowledge of its failure, there is zero documentation on how to detect that.

Note : I'm saying "a / the leader", because if the leader is splitbrained from the majority and doesn't know it yet, it's okay for it to hold "proposal requester" in holding status, because it would still consider itself as "a leader" whilst being only really an "incapacitated leader", whereas an "operative leader with majority" has been elected on the other splitbrain side, but of course the incapacitated wouldn't know it before any timeout occurs.

I think that OP's ProposeConfChange is even worse that Propose(), because ProposeConfChange() deep calls .stepWaitWithOption() with wait = false, which means that if the leader ITSELF decides to drop the proposal, the caller of ProposeConfChange() would have no way of knowing it.