Open mwang1026 opened 4 years ago
cc @cockroachdb/disaster-recovery
I don't think we expect to implement this in the jobs system any time soon. Users who are used to @hourly in their cron system of choice may well expect it to be shorthand for exactly 0 * * * *, as it is in most cron implementations. Those who want a random minute of the hour can already get one by using something like (random()*60)::int::string || ' * * * *' when they create their schedule instead of @hourly.
I think we could probably close this as unplanned.
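For anyone building the schedule string client-side instead, here is a minimal sketch of the same idea. The function name is hypothetical and not part of any CockroachDB API; it only produces the cron expression you would then pass when creating the schedule.

```python
import random


def hourly_cron_with_random_minute(rng=random):
    """Return a 5-field cron expression that fires hourly at a random minute.

    Hypothetical helper for illustration only. `rng` is anything with a
    `randrange` method, so a seeded `random.Random` can be passed in tests.
    """
    minute = rng.randrange(60)  # randrange(60) is always in 0..59
    return f"{minute} * * * *"


if __name__ == "__main__":
    # Prints something like "23 * * * *"; the minute differs per call.
    print(hourly_cron_with_random_minute())
```

Because the minute is picked once and baked into the expression, the schedule stays on a fixed cadence afterwards; only the initial choice is random.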
Also noting that the Schema Telemetry Job does some of its own randomization before choosing @weekly or @hourly: https://github.com/cockroachdb/cockroach/blob/9c510f9abdcd0d52e04f620ce5fa283c54d6ef46/pkg/sql/catalog/schematelemetry/schematelemetrycontroller/controller.go#L188-L214
Other internal jobs could do something like this too.
When creating a schedule it's common to use simple cron syntax such as '@daily' or '@hourly'. The downside is that if every schedule is created with that syntax, everything runs at the top of the hour or day. And if every service across all of your systems uses similar scheduling syntax, you get times when your services are hosed.
What we're looking for here is something where you can create schedules to avoid the above issues. "Random" or some sort of random jitter is probably a good enough heuristic to spread schedules out to avoid the problem. The ideal is to have a solution that can identify the best times to run to avoid "top of the hour" degradation.
For certain scheduled operations, you also want them to be on a consistent cadence but on staggered start times (e.g. every hour on the 23rd minute) -- namely Backups so that you can target an RPO.
Deliverables
A way to specify randomness of the start time within the recurring window
That randomness is applied on the initial schedule and the specified cadence determines future schedule times
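One way to read these two deliverables, as a rough sketch: pick a random offset inside the first window once, then step forward by the cadence. The function and parameter names below are hypothetical and do not describe how the jobs system works.

```python
import random
from datetime import datetime, timedelta


def staggered_runs(first_window_start, period, count, rng=random):
    """Return `count` run times with jitter applied only to the first run.

    Hypothetical helper: the offset is drawn once (at "schedule creation"),
    so every later run is exactly `period` after the previous one.
    """
    period_seconds = int(period.total_seconds())
    offset = timedelta(seconds=rng.randrange(period_seconds))
    return [first_window_start + offset + i * period for i in range(count)]


if __name__ == "__main__":
    # Three hourly runs that all share the same randomly chosen offset.
    for run in staggered_runs(datetime(2024, 1, 1), timedelta(hours=1), 3):
        print(run.isoformat())
```

The gap between consecutive runs is always exactly one period, which is the property that makes a backup RPO predictable.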
Other alternatives
Load based determination of when a schedule should run
Randomness that's always applied on every run
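For contrast, a sketch of the last alternative, where a fresh offset is drawn for every run. The names are again hypothetical; the point is only to show what happens to the spacing between runs.

```python
import random
from datetime import datetime, timedelta


def per_run_jitter_runs(first_window_start, period, count, rng=random):
    """Return `count` run times, each jittered independently in its window.

    Hypothetical helper: run i lands somewhere in
    [start + i*period, start + (i+1)*period), so the gap between two
    consecutive runs can be anywhere between just above zero and just
    under two periods.
    """
    period_seconds = int(period.total_seconds())
    return [
        first_window_start
        + i * period
        + timedelta(seconds=rng.randrange(period_seconds))
        for i in range(count)
    ]


if __name__ == "__main__":
    runs = per_run_jitter_runs(datetime(2024, 1, 1), timedelta(hours=1), 4)
    for earlier, later in zip(runs, runs[1:]):
        print(later - earlier)  # gaps vary from run to run
```

This spreads load across the window on every run, but the uneven gaps mean an hourly backup schedule could leave nearly two hours between backups, which works against targeting an RPO.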
Epic CRDB-7909
Jira issue: CRDB-3742