distribworks / dkron

Dkron - Distributed, fault tolerant job scheduling system https://dkron.io
GNU Lesser General Public License v3.0

Schedule jitter needed... #866

Open alexanderfefelov opened 3 years ago

alexanderfefelov commented 3 years ago

...like this

The RANDOM_DELAY variable allows delaying job startups by random amount of minutes with upper limit specified by the variable.

or this

-j jitter Enable time jitter. Prior to executing commands, cron will sleep a random number of seconds in the range from 0 to jitter.

because

This option can help to smooth down system load spikes during moments when a lot of jobs are likely to start at once, e.g., at the beginning of the first minute of each hour.

vcastellm commented 3 years ago

Thanks for the suggestion, I like it

yvanoers commented 3 years ago

Upon seeing this, my first thought was: if it is known that a lot of jobs start at once and it is OK for them not to run exactly at the scheduled time, why not schedule them at different times in the first place?

I'm basically looking for a justification of why this should be the scheduler's responsibility.

vcastellm commented 3 years ago

Good question. That is what I'm currently doing, and I like the jitter idea because it takes less mental effort than shifting a lot of jobs by hand, and it will be especially useful when multiple people/teams are creating jobs.

yvanoers commented 3 years ago

I see. I'm not a fan of using randomization to accomplish load spreading. While it would usually work fine, there is still a chance of it starting everything at around the same time, potentially causing problems that are hard to track down.

A more deterministic approach is a lot less trivial to implement, though. Hmmm. I'm going to think about this some more.

yvanoers commented 3 years ago

I came across an implementation that may be useful: the 'H' symbol as used in Jenkins. See the explanation of H in the Cron article on Wikipedia, and its referenced source for additional information on the syntax and possibilities in Jenkins.

Relevant bit from Wikipedia:

'H' is used in the Jenkins continuous integration system to indicate that a "hashed" value is substituted. Thus instead of a fixed number such as '20 * * * *' which means at 20 minutes after the hour every hour, 'H * * * *' indicates that the task is performed every hour at an unspecified but invariant time for each task. This allows spreading out tasks over time, rather than having all of them start at the same time and compete for resources.
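To make the 'H' idea concrete, here is a minimal Go sketch that derives an invariant minute from a job's name. It assumes FNV-1a as the hash and the job name as the key; both are illustrative choices and not necessarily what Jenkins does. `hashedMinute` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashedMinute maps a job name to a minute in [0, 59]. The same name always
// yields the same minute ("unspecified but invariant"), while different names
// tend to land on different minutes.
// Assumption: FNV-1a is an arbitrary choice; Jenkins' own hashing may differ.
func hashedMinute(jobName string) int {
	h := fnv.New32a()
	h.Write([]byte(jobName)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % 60)
}

func main() {
	for _, name := range []string{"backup-db", "rotate-logs", "send-report"} {
		fmt.Printf("%-12s -> %d * * * *\n", name, hashedMinute(name))
	}
}
```

Unlike random jitter, the result is reproducible, so a load spike can be traced back to the jobs that hash to that minute. With only 60 slots, collisions become likely once there are more than a handful of jobs, so this spreads load rather than guaranteeing separation.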

I think that would be a great solution to this problem. What are your thoughts?

alexanderfefelov commented 3 years ago

What are your thoughts?

Sounds very interesting. Thank you.

yvanoers commented 3 years ago

@victorcoder What do you think? I'd love to build this, but only if it has a chance of getting merged.

vcastellm commented 3 years ago

This is pretty interesting, but I think the work should be done in robfig/cron (https://github.com/robfig/cron). WDYT?

yvanoers commented 3 years ago

Ideally the feature should go in robfig/cron, I absolutely agree. But considering there are a bunch of PRs for adding non-standard cron features (some of which could also benefit Dkron) that have been sitting there for months without any response from robfig, I'm not very optimistic about a new PR getting merged any time soon.

We could make a PR for cron and also build the feature into Dkron while waiting for the PR to get merged. That way, if it ever gets merged, our version would likely be compatible, and we could have the feature sooner rather than later.

vcastellm commented 3 years ago

Agree, go for it