cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kv: never ignore Raft leadership when proposing RequestLease requests #89212

Open nvanbenschoten opened 2 years ago

nvanbenschoten commented 2 years ago

In https://github.com/cockroachdb/cockroach/commit/8aa1c140eef574869dc70076987a3f12e19b7c3d, we added protection against replicas that were behind on their log attempting to acquire the lease and stalling. That commit did this by requiring lease requestors to be the Raft leader, which ensured that they were up-to-date on their log.

In https://github.com/cockroachdb/cockroach/commit/a767cdda788abbb7dd6a2e3b52df0d867f0b9bf8, we weakened this protection to work around cases where the current Raft leader could not acquire the lease. This resolved a few deadlocks, which are described in that commit message.

The deadlocks are real issues, but the resolution to ignore the Raft leadership status in these cases is problematic. At worst, it undermines the protection granted by https://github.com/cockroachdb/cockroach/commit/8aa1c140eef574869dc70076987a3f12e19b7c3d and permits risky lease requests.

A safer alternative: when a replica determines that the Raft leader is unsuitable to hold the lease, it calls a Raft (pre-vote) election and tries to take leadership itself. If it succeeds, it can acquire the lease. If it fails, it wasn't a good candidate to hold the lease. This is the same strategy we employed in https://github.com/cockroachdb/cockroach/pull/87244.
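
The proposed flow can be sketched roughly as follows. This is a simplified illustration, not CockroachDB's actual code: `replicaState`, `decide`, `acquireLease`, and the `campaign` callback are hypothetical names invented for this example, and the real decision involves considerably more state than two booleans.

```go
package main

import "fmt"

// Action is a hypothetical outcome of a replica's lease-acquisition attempt.
type Action int

const (
	RequestLease Action = iota // propose a RequestLease
	Reject                     // leave lease acquisition to the current Raft leader
	Campaign                   // call a (pre-vote) election before trying
	GiveUp                     // lost the election: not a good lease candidate
)

// replicaState is a hypothetical, heavily simplified view of what a replica knows.
type replicaState struct {
	isLeader       bool // this replica is the Raft leader
	leaderEligible bool // the current Raft leader is able to hold the lease
}

// decide never ignores Raft leadership: a non-leader either rejects the
// request or campaigns, but does not propose a lease request directly.
func decide(s replicaState) Action {
	switch {
	case s.isLeader:
		return RequestLease
	case s.leaderEligible:
		return Reject
	default:
		return Campaign
	}
}

// acquireLease applies decide and, if an election is needed, requests the
// lease only after winning it. campaign is a hypothetical stand-in for
// calling a Raft election; it reports whether leadership was obtained.
func acquireLease(s replicaState, campaign func() bool) Action {
	a := decide(s)
	if a != Campaign {
		return a
	}
	if campaign() {
		return RequestLease
	}
	return GiveUp
}

func main() {
	won := func() bool { return true }
	lost := func() bool { return false }
	follower := replicaState{isLeader: false, leaderEligible: false}
	fmt.Println(acquireLease(follower, won) == RequestLease)
	fmt.Println(acquireLease(follower, lost) == GiveUp)
}
```

The point of the sketch is that the election acts as the log up-to-dateness check: a replica that is behind on its log cannot win, so it never reaches the lease request.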

Jira issue: CRDB-20174

blathers-crl[bot] commented 2 years ago

cc @cockroachdb/replication

ajd12342 commented 1 year ago

Hi @nvanbenschoten! I am Anuj Diwan, a Computer Science PhD student at UT Austin. I am on a team with @arjunrs1 (Arjun Somayazulu), and we're taking a graduate Distributed Systems course. For our course project, we are interested in contributing to CockroachDB. This issue is related to our course material. Could we work on this issue? Any pointers to help us get started would be appreciated as well.

Thanks and regards, Anuj.