Automattic / kue

Kue is a priority job queue backed by redis, built for node.js.
http://automattic.github.io/kue
MIT License

Resuming jobs after node.js restart #510


goodthing commented 9 years ago

Question: I have a delayed job, created with this code:

```js
var job = jobs.create(jobName, data)
  .delay(time)
  .priority('low')
  .save();

jobs.promote();
```

Then after it successfully goes into redis, the node server restarts. The job never runs again, but I checked and it does exist in redis. How do I ensure that the job will be picked up?

FYI this is the redis entry:

```
> hgetall q:job:16
 1) "updated_at"
 2) "1423534315447"
 3) "created_at"
 4) "1423534315414"
 5) "delay"
 6) "5000000"
 7) "state"
 8) "delayed"
 9) "priority"
10) "10"
11) "type"
12) "sendNotification"
13) "max_attempts"
14) "1"
15) "data"
16) " {}"
```

behrad commented 9 years ago

Delayed jobs are resistant to restarts. Are you calling promote()?

goodthing commented 9 years ago

Do I have to call promote() after the node restart, or how is it actually supposed to work? All the examples I've seen call promote() right after job creation, as in my code above.

behrad commented 9 years ago

promote() is actually a daemon that continuously checks the delayed queue for jobs whose queuing time has arrived, so when you use delayed jobs you should call promote() on your node process startup (just once, in one process, if you are using cluster).
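
A minimal sketch of that startup call, assuming the Kue 0.x API of this thread, where promote() is exposed on the queue object (later Kue versions promote delayed jobs automatically); the 1000 ms polling interval and the sendNotification processor are illustrative:

```js
var kue = require('kue');
var cluster = require('cluster');

// Recreate the queue handle on every process start; jobs already in
// redis (including delayed ones) survive the restart.
var jobs = kue.createQueue();

// Start the promotion daemon in exactly one process. It polls the
// delayed set (here every 1000 ms) and moves jobs whose delay has
// expired into the queued state so workers can pick them up.
if (cluster.isMaster) {
  jobs.promote(1000);
}

// Workers re-register their processors as usual; a delayed job that
// was already sitting in redis gets promoted and handed to one of them.
jobs.process('sendNotification', function (job, done) {
  // ... do the work, then commit the result ...
  done();
});
```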

davisford commented 9 years ago

@behrad what if I am not using delayed jobs? I am running 0.8.9 and I am seeing a scenario where the producer puts jobs into Kue and they get stuck there. I have jobs in both the queued state and the active state when I inspect.

It seems like the consumer loses the redis connection after some time (just guessing), and then jobs just keep piling up, and on a restart of the consumer, a huge flood of jobs comes in. Is there something I can do to make this more robust?

behrad commented 9 years ago

> I have jobs in both the queued state and the active state when I inspect

Then your client node.js code is crashing, so active jobs have no chance to be committed via the done(err) callback. When the number of active jobs reaches your workers' concurrency limit, new jobs start piling up in the queued state.

Generally you can handle this in your node.js process exit hooks, like uncaughtException, or with node.js domains. Related details and code are available in other Kue issues...
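
A hedged sketch of that exit-hook idea (one possible pattern, not Kue's official recovery mechanism): keep a reference to the done() callback of the in-flight job and fail it explicitly before the process dies, so it is not stranded in the active state. The sendNotification type, the hypothetical doTheWork() helper, and the single-job bookkeeping are assumptions:

```js
var kue = require('kue');
var jobs = kue.createQueue();

// Track the done() of the job currently being processed
// (assumes concurrency 1; keep a map of job ids for higher concurrency).
var currentDone = null;

jobs.process('sendNotification', function (job, done) {
  currentDone = done;
  doTheWork(job.data, function (err) { // doTheWork() is hypothetical
    currentDone = null;
    done(err);
  });
});

process.on('uncaughtException', function (err) {
  console.error('worker crashing:', err);
  // Commit the active job as failed so it can be retried or inspected,
  // instead of being stuck in the active state forever.
  if (currentDone) currentDone(err);
  process.exit(1);
});
```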

> It seems like the consumer loses the redis connection after some time

If so, your exit hook will also have no chance to reach redis to fix this. Currently Kue is based on client-side node.js job state management; this will be changed to server-side Lua scripts in Kue 1.0, which will be more consistent even under redis connection problems.
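
Until then, one client-side workaround (hedged; it relies on Kue's queue.active() and Job.get() helpers, and your application must decide which active jobs are genuinely stuck) is to re-queue leftover active jobs when the worker starts up:

```js
var kue = require('kue');
var jobs = kue.createQueue();

// On worker startup, inspect every job left in the active state by a
// previous crash and push it back to queued for reprocessing.
jobs.active(function (err, ids) {
  if (err) return console.error(err);
  ids.forEach(function (id) {
    kue.Job.get(id, function (err, job) {
      if (err) return console.error(err);
      // Check that the job is really stuck (e.g. via its updated_at
      // timestamp) before re-queuing, to avoid double processing.
      job.inactive();
    });
  });
});
```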