storage: Handle raft.ErrProposalDropped

bdarnell commented 6 years ago

coreos/etcd#9067 introduced a new error ErrProposalDropped. Properly handling this error will allow us to reduce the occurrence of ambiguous failures (Replica.executeWriteBatch doesn't need to return an AmbiguousResultError if it has not successfully proposed), and may allow us to be more intelligent in our raft-level retries.

This error alone does not allow us to eliminate time-based reproposals because raft's MsgProp forwarding is fire-and-forget. However, if we disabled raft-level forwarding and did our own forwarding, I think we could make the retry logic more deterministic (or at least hide the timing elements in the RPC layer).

Note that in typical usage of raft, you'd respond to this error by passing it up the stack to a layer that can try again on a different replica. We can't do that because of our leases - until the lease expires, no other node could successfully handle the request, so we have to just wait and retry on the lease holder. (we might be able to use this to make lease requests themselves fail-fast, though).

Jira issue: CRDB-5872

Epic CRDB-39898

tbg commented 4 years ago

Proposal forwarding has become a general problem recently. We see this come up both in https://github.com/cockroachdb/cockroach/issues/37906 and https://github.com/cockroachdb/cockroach/issues/42821.

In the former, we want to make sure that a follower that's behind can't propose (and get) a lease. Allowing only the raft leader to add proposals to its log would be a way to get that (mod "command runs on old raft leader while there's already another one").

In the latter, we had to add tricky code below raft to re-add commands under a new lease proposed index if we could detect that they had not applied and were no longer applicable. This solution is basically technical debt and it has cost us numerous subtle bugs over time. We want to get rid of it, which means making sure that lease applied indexes don't generally reorder. Again, this can be achieved with good enough accuracy by only ever adding to the local log, i.e. never forwarding proposals.

It's not trivial to set this up, but ultimately I think we'll be better off.

The rules would be something like this:

don't propose to raft unless local raft instance believes it is leader (i.e. block on a state change, perhaps wake up/campaign eagerly when appropriate). Due to our existing heuristics for campaigning actively when it's appropriate the situation in which there's no leader should be transient (and rare). Note that we want to only ever propose with a reasonable max lease index, but this will be given because we wouldn't be proposing before seeing our own lease, at which point we're the only one proposing and know the current LAI with perfect accuracy.
disable proposal forwarding. On ErrProposalDropped, go back up the stack into 1) because the leader has likely changed. If there's a raft leader and no lease is held, direct the command to that replica instead. (This last item also makes sure that when there's no lease, the raft leader will be the node that gets to request a lease).
lease transfers might need some smoothing out. The transferer initially holds both lease and raft leadership. Once it has proposed the lease transfer, it immediately transfers Raft leadership over. (I don't know if the process is reactive enough today; there's chance it leaves the raft leadership transfer to the tick-based mechanism today).

Since only the leaseholder proposes to Raft, the leaseholder should be able to hold or regain membership just fine. We just have to make sure that a follower won't ever campaign against an active leaseholder.

tbg commented 4 years ago

(cc @nvanbenschoten and @ajwerner since we both talked about this)

github-actions[bot] commented 3 years ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

tbg commented 1 year ago

cockroachdb / cockroach

storage: Handle raft.ErrProposalDropped #21849