cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

scheduled jobs: avoid "top-of-the-hour" service level degradation #54537

Open mwang1026 opened 4 years ago

mwang1026 commented 4 years ago

When creating a schedule, it's common to use simple cron shorthand such as '@daily' or '@hourly'. The downside is that if every schedule is created with that syntax, every job runs at the top of the hour or day. And if every service across all of your systems uses similar scheduling syntax, you get times when your services are hosed.

What we're looking for here is a way to create schedules that avoids the above issues. Random jitter is probably a good enough heuristic to spread schedules out. The ideal is a solution that can identify the best times to run to avoid "top of the hour" degradation.
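As a rough illustration of the "identify the best times" idea, here is a minimal Python sketch that picks the least-crowded minute of the hour given the start minutes of existing hourly schedules. The function names and the input format are hypothetical and not part of any CockroachDB API; a real solution would also need to account for job duration and cost.

```python
from collections import Counter


def least_loaded_minute(existing_minutes):
    """Return the minute (0-59) with the fewest schedules already on it.

    existing_minutes is a hypothetical input: the start minute of each
    existing hourly schedule. Ties go to the lowest minute.
    """
    load = Counter(existing_minutes)
    return min(range(60), key=lambda m: (load[m], m))


def hourly_cron_at(minute):
    """Build a five-field cron expression that fires once an hour."""
    return f"{minute} * * * *"


# Three schedules at minute 0 and one at minute 1: the next one avoids both.
print(hourly_cron_at(least_loaded_minute([0, 0, 0, 1])))
```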

For certain scheduled operations, you also want them to be on a consistent cadence but with staggered start times (e.g. every hour on the 23rd minute), namely backups, so that you can target an RPO.
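One way to get a consistent cadence with a staggered start is to derive the minute deterministically from something stable, such as the schedule's name. The Python sketch below assumes that approach; `staggered_hourly_cron` is a hypothetical helper, not something the jobs system provides.

```python
import hashlib


def staggered_hourly_cron(schedule_name):
    """Map a schedule name to a fixed minute of the hour.

    The same name always yields the same minute, so the cadence stays
    exactly hourly, while different names spread across the hour.
    """
    digest = hashlib.sha256(schedule_name.encode("utf-8")).digest()
    minute = int.from_bytes(digest[:8], "big") % 60
    return f"{minute} * * * *"


for name in ("backup-orders", "backup-users", "backup-events"):
    print(name, "->", staggered_hourly_cron(name))
```

Because the minute does not change between runs, the gap between consecutive backups stays at one hour, which is what matters for an RPO target.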

Epic CRDB-7909

Jira issue: CRDB-3742

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/disaster-recovery

dt commented 1 year ago

I don't think we expect to implement this in the jobs system any time soon. Users who are used to @hourly in their cron system of choice may well expect it to be shorthand for exactly 0 * * * *, as it is in most cron implementations, and those who want a random minute of the hour can already get that just by using something like (random()*60)::int::string || ' * * * *' when they create their schedule instead of @hourly.
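For anyone building the schedule string client-side rather than in SQL, the same idea can be sketched in Python. This is an illustrative equivalent, not a CockroachDB API; `random_hourly_cron` is a hypothetical name, and `randrange(60)` is used so the minute always falls in 0-59.

```python
import random


def random_hourly_cron(rng=random):
    """Return an hourly cron expression with a randomly chosen minute."""
    minute = rng.randrange(60)  # integer in 0..59 inclusive
    return f"{minute} * * * *"


# Pass the result as the recurrence when creating the schedule.
print(random_hourly_cron())
```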

I think we could probably close this as unplanned.

rafiss commented 1 year ago

Also noting that the Schema Telemetry Job does some of its own randomization before choosing @weekly or @hourly: https://github.com/cockroachdb/cockroach/blob/9c510f9abdcd0d52e04f620ce5fa283c54d6ef46/pkg/sql/catalog/schematelemetry/schematelemetrycontroller/controller.go#L188-L214

Other internal jobs could do something like this too.
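A sketch of what that could look like for an internal job: rewrite a cron shorthand into an explicit expression with seeded, randomized fields. This Python version is only an illustration under assumed behavior and is not a port of the linked controller.go; the function name and the choice of seed (for example, a cluster identifier) are assumptions.

```python
import random


def jitter_shorthand(expr, seed):
    """Rewrite @hourly, @daily or @weekly with seeded random fields.

    The same seed always gives the same result, so a job keeps a stable
    slot. Any other expression is returned unchanged.
    """
    rng = random.Random(seed)
    minute = rng.randrange(60)
    hour = rng.randrange(24)
    weekday = rng.randrange(7)  # 0-6, with 0 meaning Sunday in cron
    if expr == "@hourly":
        return f"{minute} * * * *"
    if expr == "@daily":
        return f"{minute} {hour} * * *"
    if expr == "@weekly":
        return f"{minute} {hour} * * {weekday}"
    return expr


print(jitter_shorthand("@weekly", seed="example-cluster-id"))
```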