Closed: dt closed this 22 hours ago.
Looks fine to me from the changefeed side
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
do you remember why we added backoff in the first place? like, are there jobs that we should teach to back off?
Looking at some of the git history, at least one motivation was limiting the impact of crashing bugs in jobs: https://github.com/cockroachdb/cockroach/issues/44594
TFTR!
bors r+
Previously the jobs system would count how many times a job had been resumed, as well as when it was most recently resumed, and would 'back off' from resuming a job that had been resumed many times. This behavior, however, has routinely caused problems for several jobs. This is particularly true for jobs that run forever: as nodes restart, such jobs move about the cluster and can accumulate a large number of resumes, which is perfectly fine. If a given job wishes to hold off on executing for some reason, that decision should be up to the job itself. The jobs system should invoke the job's resumer so that the job can decide on its own. It should not leave a job that claims to be 'running', with a node holding its adoption claim, that is never actually invoked on that node.
Release note: none.
Epic: none.