Open bdarnell opened 6 years ago
Proposal forwarding has become a general problem recently. We see this come up both in https://github.com/cockroachdb/cockroach/issues/37906 and https://github.com/cockroachdb/cockroach/issues/42821.
In the former, we want to make sure that a follower that's behind can't propose (and get) a lease. Allowing only the raft leader to add proposals to its log would be a way to get that (mod "command runs on old raft leader while there's already another one").
In the latter, we had to add tricky code below raft to re-add commands under a new lease proposed index if we could detect that they had not applied and were no longer applicable. This solution is basically technical debt and it has cost us numerous subtle bugs over time. We want to get rid of it, which means making sure that lease applied indexes don't generally reorder. Again, this can be achieved with good enough accuracy by only ever adding to the local log, i.e. never forwarding proposals.
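To make the reordering problem concrete, here is a minimal sketch of the below-raft check in a deliberately simplified model. The `command` and `replicaState` types and their field names are made up for illustration and are not the actual CockroachDB structures. The assumption is that a command only applies if its max lease index is ahead of the replica's lease applied index, so a command that lands in the log behind a later one gets rejected and has to be re-added.

```go
package main

import "fmt"

// command is a hypothetical, simplified proposal. maxLeaseIndex is assigned
// by the proposer in proposal order.
type command struct {
	id            string
	maxLeaseIndex uint64
}

// replicaState is a hypothetical stand-in for the replicated state that
// tracks the highest lease index applied so far.
type replicaState struct {
	leaseAppliedIndex uint64
}

// apply walks the log in order and returns the IDs of the commands that were
// rejected because an entry with a higher lease index applied before them.
func (r *replicaState) apply(log []command) (rejected []string) {
	for _, c := range log {
		if c.maxLeaseIndex <= r.leaseAppliedIndex {
			// Reordered: in this model the command must be re-added
			// under a new index.
			rejected = append(rejected, c.id)
			continue
		}
		r.leaseAppliedIndex = c.maxLeaseIndex
	}
	return rejected
}

func main() {
	inOrder := []command{{"a", 1}, {"b", 2}, {"c", 3}}
	reordered := []command{{"a", 1}, {"c", 3}, {"b", 2}}

	fmt.Println("rejected (in order): ", (&replicaState{}).apply(inOrder))
	fmt.Println("rejected (reordered):", (&replicaState{}).apply(reordered))
}
```

If entries are only ever appended to the local log in proposal order, the rejection branch should be reached rarely if at all, which is what would let the re-add code go away.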
It's not trivial to set this up, but ultimately I think we'll be better off.
The rules would be something like this:
Since only the leaseholder proposes to Raft, the leaseholder should be able to hold or regain membership just fine. We just have to make sure that a follower won't ever campaign against an active leaseholder.
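As a rough illustration of the "never campaign against an active leaseholder" condition, the check could look something like the following. This is only a sketch: `leaseStatus` and `mayCampaign` are hypothetical names, and a real check would need to define what "active" means (liveness, expiration, clock uncertainty).

```go
package main

import "fmt"

// leaseStatus is a hypothetical summary of the range lease as seen by the
// local replica. holder == 0 means no lease is known.
type leaseStatus struct {
	holder int  // replica ID of the leaseholder
	active bool // whether the lease is still considered live
}

// mayCampaign reports whether replica self is allowed to start a raft
// election. A follower must not campaign while another replica holds an
// active lease; the leaseholder itself may always campaign.
func mayCampaign(self int, lease leaseStatus) bool {
	if lease.holder == 0 || !lease.active {
		return true
	}
	return lease.holder == self
}

func main() {
	lease := leaseStatus{holder: 1, active: true}
	fmt.Println("leaseholder may campaign:", mayCampaign(1, lease))
	fmt.Println("follower may campaign:   ", mayCampaign(2, lease))
}
```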
(cc @nvanbenschoten and @ajwerner since we both talked about this)
Some related musings https://github.com/cockroachdb/cockroach/pull/102956#issuecomment-1541917274
coreos/etcd#9067 introduced a new error, `ErrProposalDropped`. Properly handling this error will allow us to reduce the occurrence of ambiguous failures (`Replica.executeWriteBatch` doesn't need to return an `AmbiguousResultError` if it has not successfully proposed), and may allow us to be more intelligent in our raft-level retries.
This error alone does not allow us to eliminate time-based reproposals because raft's MsgProp forwarding is fire-and-forget. However, if we disabled raft-level forwarding and did our own forwarding, I think we could make the retry logic more deterministic (or at least hide the timing elements in the RPC layer).
Note that in typical usage of raft, you'd respond to this error by passing it up the stack to a layer that can try again on a different replica. We can't do that because of our leases - until the lease expires, no other node could successfully handle the request, so we have to just wait and retry on the lease holder. (we might be able to use this to make lease requests themselves fail-fast, though).
Jira issue: CRDB-5872
Epic CRDB-39898