99designs / cmdstalk

beanstalkd broker; run jobs as unix commands.
http://godoc.org/github.com/99designs/cmdstalk
MIT License

WIP: ability to kick failed jobs back onto queue. #3

Closed pda closed 7 years ago

pda commented 10 years ago

Background: https://github.com/99designs/cmdstalk/issues/2

TL;DR: kicking failed jobs causes them to be instantly re-buried based on their stats.

This change means each kick buys a job one more release or timeout before it's auto-buried.
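As a rough sketch of that rule (not the actual cmdstalk code), the bury decision can be written as a comparison of kick-adjusted counters. The thresholds of 10 releases and 1 timeout are taken from the scenarios discussed in this thread; the names `jobStats`, `shouldBury`, `maxReleases` and `maxTimeouts` are hypothetical.

```go
package main

import "fmt"

// Thresholds as used in the scenarios in this thread; names are hypothetical.
const (
	maxReleases = 10
	maxTimeouts = 1
)

// jobStats holds the three counters that matter here, as signed integers
// so that subtracting kicks can go negative without wrapping.
type jobStats struct {
	Kicks, Releases, Timeouts int
}

// shouldBury reports whether a freshly reserved job should be buried
// straight away. Each kick offsets one release and one timeout.
func shouldBury(s jobStats) bool {
	return s.Releases-s.Kicks >= maxReleases || s.Timeouts-s.Kicks >= maxTimeouts
}

func main() {
	// Timed out once, never kicked: buried on next reserve.
	fmt.Println(shouldBury(jobStats{Kicks: 0, Releases: 0, Timeouts: 1}))
	// Same job after one kick: allowed to run again.
	fmt.Println(shouldBury(jobStats{Kicks: 1, Releases: 0, Timeouts: 1}))
}
```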

/cc @lwc.


For reference, here's a sample stats-job output:


---
id: 3
tube: cmdstalk-test-465396aa983f063a
state: buried
pri: 10
age: 34
delay: 0
ttr: 1
time-left: 0
file: 0
reserves: 2
timeouts: 1
releases: 0
buries: 1
kicks: 0
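To work with these numbers, the integer fields of a stats-job body like the one above need to be pulled out first. The following is a minimal standard-library sketch; `parseCounters` is a hypothetical helper, not part of cmdstalk or of any beanstalkd client, and it simply skips non-numeric fields such as `tube` and `state`.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCounters extracts the integer-valued "key: value" lines from a
// stats-job body. Values are parsed as signed 64-bit integers.
func parseCounters(body string) map[string]int64 {
	out := make(map[string]int64)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // e.g. the leading "---" line
		}
		n, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64)
		if err != nil {
			continue // non-numeric fields such as tube and state
		}
		out[strings.TrimSpace(key)] = n
	}
	return out
}

func main() {
	sample := "---\nid: 3\nstate: buried\nreserves: 2\ntimeouts: 1\nreleases: 0\nburies: 1\nkicks: 0\n"
	c := parseCounters(sample)
	fmt.Println("timeouts - kicks =", c["timeouts"]-c["kicks"])
	fmt.Println("releases - kicks =", c["releases"]-c["kicks"])
}
```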
pda commented 10 years ago

Scenario with job timing out:

Fresh job: kicks: 0, releases: 0, timeouts: 0

Job takes too long: kicks: 0, releases: 0, timeouts: 1 (auto-buried on next reserve)

Job kicked: kicks: 1, releases: 0, timeouts: 1

Now (timeouts - kicks) < 1 so it'll process.

Job times out again: kicks: 1, releases: 0, timeouts: 2

Kick again: kicks: 2, releases: 0, timeouts: 2

Again (timeouts - kicks) < 1 so it'll process.

pda commented 10 years ago

Scenario with job failing:

Fresh job: kicks: 0, releases: 0, timeouts: 0

Repeated exit(1): kicks: 0, releases: 1, timeouts: 0

kicks: 0, releases: 2, timeouts: 0

...

kicks: 0, releases: 10, timeouts: 0 (auto-buried on next reserve)

Job kicked: kicks: 1, releases: 10, timeouts: 0

Now (releases - kicks) < 10 so it'll process.

Fails again: kicks: 1, releases: 11, timeouts: 0 (auto-buried on next reserve)

Kicked again: kicks: 2, releases: 11, timeouts: 0

Again (releases - kicks) < 10 so it'll process.
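The failing-job scenario can be replayed with a small simulation to check that each kick buys exactly one more run. This is only a sketch under the threshold of 10 releases used above; `runsUntilBuried` and `maxReleases` are hypothetical names.

```go
package main

import "fmt"

// Release threshold used in the scenario above; the name is hypothetical.
const maxReleases = 10

// runsUntilBuried counts how many failing runs a job gets before
// (releases - kicks) reaches the threshold and it is buried again.
func runsUntilBuried(releases, kicks int) int {
	runs := 0
	for releases-kicks < maxReleases {
		releases++ // the command exits non-zero and the job is released
		runs++
	}
	return runs
}

func main() {
	releases, kicks := 0, 0
	for round := 0; round < 3; round++ {
		runs := runsUntilBuried(releases, kicks)
		releases += runs
		fmt.Printf("kicks: %d, releases: %d (ran %d times before burial)\n", kicks, releases, runs)
		kicks++ // someone kicks the buried job
	}
}
```

A fresh job gets ten runs; after that, every kick allows a single additional run before the job is buried again.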

lwc commented 10 years ago

So, the only weird case here is if a task has been buried a few times because it was failing, and it then times out:

New job: kicks: 0, timeouts: 0, releases: 0

Fails until buried: kicks: 0, timeouts: 0, releases: 1 ... kicks: 0, timeouts: 0, releases: 10

Is kicked: kicks: 1, timeouts: 0, releases: 10

Fails again, kicked again: kicks: 2, timeouts: 0, releases: 11

It now times out: kicks: 2, timeouts: 1, releases: 11

Reserved again: (releases - kicks) = 9 so it doesn't get buried for failures, correct. (timeouts - kicks) = -1 so it doesn't get buried for timeout.

Job runs, eventually either timeouts or releases reaches the threshold.

Is this even a big deal though?


(edited by @pda)

pda commented 10 years ago

(timeouts - kicks) = -1

Side-note: if we use this approach, some things need to be treated as signed integers.
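A quick illustration of why signedness matters in Go: subtracting unsigned values wraps around instead of going negative, so the comparison against the threshold flips. This is a sketch only; whether cmdstalk holds these counters as unsigned integers is an assumption, and both function names are hypothetical.

```go
package main

import "fmt"

// buryUnsigned compares using unsigned arithmetic. When kicks > timeouts
// the subtraction wraps around to a huge value, so the check wrongly passes.
func buryUnsigned(timeouts, kicks uint64) bool {
	return timeouts-kicks >= 1
}

// burySigned converts to signed integers first, so the difference can be
// negative and the comparison behaves as intended.
func burySigned(timeouts, kicks uint64) bool {
	return int64(timeouts)-int64(kicks) >= 1
}

func main() {
	var timeouts, kicks uint64 = 1, 2
	fmt.Println("unsigned difference:", timeouts-kicks)
	fmt.Println("bury (unsigned):", buryUnsigned(timeouts, kicks))
	fmt.Println("bury (signed):", burySigned(timeouts, kicks))
}
```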