cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

release-24.3: kv: ensure leader election is not delayed for quiesced ranges on restart #134101

Closed blathers-crl[bot] closed 11 hours ago

blathers-crl[bot] commented 12 hours ago

Backport 1/1 commits from #134080 on behalf of @arulajmani.

/cc @cockroachdb/release


In e60feee, we started persisting leader information in the HardState and loading it upon restart. As a result, every raft peer begins its life in a heartbeat lease after a restart. This meant that a leader could not be elected for the first 2-4 seconds after restart.

This regression was particularly bad for quiesced ranges, as this 2-4 second timer would only start once the replica is ticked. An observed effect of this was on changefeeds over quiesced ranges, where we serially acquire leases on the constituent ranges to publish the rangefeed's closed timestamp. The 2-4 second timers would end up stacking, which could cause O(ranges) seconds of resolved timestamp lag. See the accompanying test, which constructs this scenario, for more details.

This patch resolves this regression by only loading leader information if a leader was fortified at shutdown time. If it wasn't, we don't load leader information, and therefore don't begin life in a heartbeat lease. Note that because leader leases are not quiesced, the 2s regression is not as pernicious as it is for epoch-based leases.
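The restart-time decision described above can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the `hardState` struct, its field names, and `leaderToLoad` are assumptions made for this example, with a non-zero `LeadEpoch` standing in for "the leader was fortified at shutdown".

```go
package main

import "fmt"

// hardState is a simplified, hypothetical stand-in for the persisted raft
// state. The field names are assumptions for illustration only.
type hardState struct {
	Term      uint64
	Lead      uint64 // ID of the leader known at shutdown; 0 means none
	LeadEpoch uint64 // assumed non-zero only if that leader was fortified
}

// noLeader marks the absence of a known leader.
const noLeader uint64 = 0

// leaderToLoad returns the leader a restarting peer should start with.
// Leader information is only carried over if the leader was fortified at
// shutdown; otherwise the peer starts with no leader, so it does not begin
// life in a heartbeat lease and can campaign without the extra delay.
func leaderToLoad(hs hardState) uint64 {
	if hs.LeadEpoch == 0 {
		return noLeader
	}
	return hs.Lead
}

func main() {
	unfortified := hardState{Term: 5, Lead: 3, LeadEpoch: 0}
	fortified := hardState{Term: 5, Lead: 3, LeadEpoch: 7}
	fmt.Println("unfortified at shutdown, loaded leader:", leaderToLoad(unfortified))
	fmt.Println("fortified at shutdown, loaded leader:", leaderToLoad(fortified))
}
```

The point of the sketch is that the persisted leader is treated as a hint that is only trusted when backed by fortification; the real change may structure this check differently.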

Epic: none

Release note: None


Release justification: fixes GA-blocker.

blathers-crl[bot] commented 12 hours ago

Thanks for opening a backport.

Please check the backport criteria before merging:

If your backport adds new functionality, please ensure that the following additional criteria are satisfied:

- [ ] There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
- [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
- [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
- [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
- [ ] Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

cockroach-teamcity commented 12 hours ago

This change is Reviewable