Backport 1/1 commits from #134080 on behalf of @arulajmani.
/cc @cockroachdb/release
In e60feee, we started persisting leader information in the HardState and loading it upon restart. As a result, every raft peer begins its life in a heartbeat lease after a restart. This meant that for the first 2-4 seconds after a restart, a leader could not be elected.
This regression was particularly bad for quiesced ranges, as this 2-4 second timer would only start once the replica is ticked. One observed effect was on changefeeds over quiesced ranges, where we serially acquire leases on the constituent ranges to publish the rangefeed's closed timestamp. The 2-4 second timers would end up stacking, which could cause O(ranges) seconds of resolved timestamp lag. See the accompanying test, which constructs this scenario, for more details.
This patch resolves this regression by only loading leader information if a leader was fortified at shutdown time. If it wasn't, we don't load leader information, and therefore don't begin life in a heartbeat lease. Note that because leader leases are not quiesced, the 2s regression is not as pernicious for them as it is for epoch-based leases.
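The restart-time check described above can be sketched as follows. This is a minimal illustration, not the patch itself: the `hardState` struct is a simplified stand-in, and the assumption that a non-zero `LeadEpoch` marks a fortified leader, along with the helper name `restoredLeader`, is made here for the example.

```go
package main

import "fmt"

// hardState is a simplified, hypothetical stand-in for the persisted raft
// HardState. Only the fields needed for this illustration are included.
type hardState struct {
	Term      uint64
	Lead      uint64 // leader known at shutdown; 0 means none
	LeadEpoch uint64 // assumed non-zero only if the leader was fortified
}

// restoredLeader returns the leader ID a peer should start with after a
// restart. If no leader was fortified at shutdown, it returns 0 so the peer
// does not begin life believing it is bound to a leader.
func restoredLeader(hs hardState) uint64 {
	if hs.LeadEpoch == 0 {
		return 0 // not fortified: do not load leader information
	}
	return hs.Lead
}

func main() {
	fmt.Println(restoredLeader(hardState{Term: 5, Lead: 2, LeadEpoch: 0}))
	fmt.Println(restoredLeader(hardState{Term: 5, Lead: 2, LeadEpoch: 3}))
}
```

With the unfortified state, the peer starts without a leader and can campaign or vote immediately; with the fortified state, the leader is restored as before.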
Please check the backport criteria before merging:
[ ] Backports should only be created for serious issues or test-only changes.
[ ] Backports should not break backwards-compatibility.
[ ] Backports should change as little code as possible.
[ ] Backports should not change on-disk formats or node communication protocols.
[ ] Backports should not add new functionality (except as defined here).
[ ] Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
[ ] All backports must be reviewed by the owning area's TL. For more information as to how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
- [ ] There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
- [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
- [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
- [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
- [ ] Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.
Also, please add a brief release justification to the body of your PR to justify this backport.
Epic: none
Release note: None
Release justification: fixes GA-blocker.