Closed by pda 7 years ago
Scenario with job timing out:
Fresh job:
kicks: 0, releases: 0, timeouts: 0
Job takes too long:
kicks: 0, releases: 0, timeouts: 1
(auto-buried on next reserve)
Job kicked:
kicks: 1, releases: 0, timeouts: 1
Now (timeouts - kicks) < 1
so it'll process.
Job times out again:
kicks: 1, releases: 0, timeouts: 2
Kick again:
kicks: 2, releases: 0, timeouts: 2
Again (timeouts - kicks) < 1
so it'll process.
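The timeout rule traced above can be sketched as a small predicate. This is a hypothetical Python illustration, not cmdstalk's actual code: the names `TIMEOUT_LIMIT` and `bury_for_timeouts` are assumptions, and the threshold of 1 is read off the scenario.

```python
TIMEOUT_LIMIT = 1  # assumed name; the value 1 comes from the scenario above


def bury_for_timeouts(kicks: int, timeouts: int) -> bool:
    """Return True if the job would be auto-buried on its next reserve."""
    return timeouts - kicks >= TIMEOUT_LIMIT


# Replay the scenario as (kicks, timeouts) pairs seen at each reserve.
steps = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
decisions = [bury_for_timeouts(k, t) for k, t in steps]
```

Under these assumptions the job alternates between being processed and being buried: every timeout buries it, and every kick lets it run once more.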
Scenario with job failing:
Fresh job:
kicks: 0, releases: 0, timeouts: 0
Repeated exit(1):
kicks: 0, releases: 1, timeouts: 0
kicks: 0, releases: 2, timeouts: 0
…
kicks: 0, releases: 10, timeouts: 0
(auto-buried on next reserve)
Job kicked:
kicks: 1, releases: 10, timeouts: 0
Now (releases - kicks) < 10
so it'll process.
Fails again:
kicks: 1, releases: 11, timeouts: 0
(auto-buried on next reserve)
Kicked again:
kicks: 2, releases: 11, timeouts: 0
Again (releases - kicks) < 10
so it'll process.
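The same bookkeeping for failures can be checked with a short simulation. It is a sketch only: `RELEASE_LIMIT`, `bury_for_releases` and `failed_runs_until_buried` are hypothetical names, and the threshold of 10 is taken from the scenario above.

```python
RELEASE_LIMIT = 10  # assumed name; the value 10 comes from the scenario above


def bury_for_releases(kicks: int, releases: int) -> bool:
    """Return True if the job would be auto-buried on its next reserve."""
    return releases - kicks >= RELEASE_LIMIT


def failed_runs_until_buried(kicks: int, releases: int) -> int:
    """Count how many failing runs fit in before the next auto-bury."""
    runs = 0
    while not bury_for_releases(kicks, releases):
        releases += 1  # each failing run is released back to the queue
        runs += 1
    return runs


fresh = failed_runs_until_buried(kicks=0, releases=0)
after_first_kick = failed_runs_until_buried(kicks=1, releases=10)
after_second_kick = failed_runs_until_buried(kicks=2, releases=11)
```

With these numbers a fresh job gets ten failing runs, and each later kick buys exactly one more.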
So, the only weird case here is if a task has been buried a few times because it was failing, and it then times out:
New job:
kicks: 0, timeouts: 0, releases: 0
Fails until buried:
kicks: 0, timeouts: 0, releases: 1
…
kicks: 0, timeouts: 0, releases: 10
Is kicked:
kicks: 1, timeouts: 0, releases: 10
Fails again, kicked again:
kicks: 2, timeouts: 0, releases: 11
It now times out:
kicks: 2, timeouts: 1, releases: 11
Reserved again:
(releases - kicks) = 9
so it doesn't get buried for failures, correct.
(timeouts - kicks) = -1
so it doesn't get buried for timeout.
Job runs, eventually either timeouts or releases reaches the threshold.
Is this even a big deal though?
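To put a number on the mixed case, here is a hedged sketch that combines both checks and reports how much headroom is left on each counter. The `Stats` class, `should_bury` and `headroom` are hypothetical names, and joining the two checks with `or` is an assumption based on the walkthrough above.

```python
from dataclasses import dataclass

TIMEOUT_LIMIT = 1   # thresholds as used in the scenarios above
RELEASE_LIMIT = 10


@dataclass
class Stats:
    kicks: int = 0
    timeouts: int = 0
    releases: int = 0


def should_bury(s: Stats) -> bool:
    """Bury when either kick-adjusted counter reaches its threshold."""
    return (s.timeouts - s.kicks >= TIMEOUT_LIMIT
            or s.releases - s.kicks >= RELEASE_LIMIT)


def headroom(s: Stats) -> dict:
    """How many more timeouts or releases the job can absorb before burial."""
    return {
        "timeouts": TIMEOUT_LIMIT - (s.timeouts - s.kicks),
        "releases": RELEASE_LIMIT - (s.releases - s.kicks),
    }


# The job from the mixed scenario: two kicks, one timeout, eleven releases.
weird = Stats(kicks=2, timeouts=1, releases=11)
weird_buried = should_bury(weird)
weird_headroom = headroom(weird)
```

In this model the job is not buried and can still time out twice, rather than once, before it is buried for timeouts: the kicks earned by failing also count as credit against timeouts.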
Side-note: (timeouts - kicks) = -1 in the case above, so if we use this approach, the counters need to be treated as signed integers.
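A small sketch of why signedness matters. Python integers never wrap, so 64-bit unsigned subtraction is emulated here with a modulus; the 64-bit width and the name `unsigned_sub` are assumptions for illustration, not something the thread specifies.

```python
U64 = 2 ** 64  # assumed counter width for the illustration


def unsigned_sub(a: int, b: int) -> int:
    """Emulate unsigned 64-bit subtraction, which wraps instead of going negative."""
    return (a - b) % U64


timeouts, kicks = 1, 2

wrapped = unsigned_sub(timeouts, kicks)  # wraps around to 2**64 - 1
signed = timeouts - kicks                # -1, as in the scenario

buried_if_unsigned = wrapped >= 1  # the job would be buried by mistake
buried_if_signed = signed >= 1     # the job is left to run
```

With unsigned arithmetic the difference becomes a huge positive number and the threshold check fires, which is the opposite of the intended result.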
Background: https://github.com/99designs/cmdstalk/issues/2
TL;DR: kicking failed jobs causes them to be instantly re-buried based on their stats.
This change means each kick buys a job one more release or timeout before it's auto-buried.
/cc @lwc.
For reference, here's a sample stats-job output: