Automattic / kue

Kue is a priority job queue backed by redis, built for node.js.
http://automattic.github.io/kue
MIT License

Inconsistencies in job's state #1204

Open finalclass opened 5 years ago

finalclass commented 5 years ago

kue version: 0.11.6

I'm experiencing a weird phenomenon with some of our jobs: there are jobs in the {q}:jobs:active ZSET whose state is set to failed. I've tried to figure out how this is possible, but I couldn't. My first suspect was an external restart of the process during the job.state() call, but MULTI is used there, so that shouldn't cause any inconsistencies.

The queue.checkActiveJobTtl() mechanism runs every second. In our case, certain events leave us with a lot of these inconsistent jobs, and they get reprocessed every second, which puts unnecessary load on our servers.
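To make the symptom concrete, here is a minimal sketch of a consistency check. It is an in-memory model, not kue's API: the `findInconsistentJobs` helper and the `zsets`/`jobs` objects are hypothetical stand-ins for the per-state ZSETs (such as `{q}:jobs:active`) and each job's recorded `state` in Redis, with ZSET scores omitted. A job is flagged when it sits in a state ZSET that does not match its recorded state.

```javascript
// Hypothetical in-memory model (not kue's API):
//   zsets: one Set per state, standing in for ZSETs like "{q}:jobs:active"
//   jobs:  per-job records whose `state` field is the job's recorded state
// A job is consistent only if every ZSET it appears in matches its state.
function findInconsistentJobs(zsets, jobs) {
  const problems = [];
  for (const [state, members] of Object.entries(zsets)) {
    for (const id of members) {
      const recorded = jobs[id] && jobs[id].state;
      if (recorded !== state) {
        problems.push({ id, zset: state, recordedState: recorded });
      }
    }
  }
  return problems;
}

// Mirrors the report: job 2 is in the active set but recorded as "failed".
const zsets = {
  active: new Set(['1', '2']),
  inactive: new Set(['3']),
  failed: new Set(['2']),
};
const jobs = {
  '1': { state: 'active' },
  '2': { state: 'failed' },
  '3': { state: 'inactive' },
};

console.log(findInconsistentJobs(zsets, jobs));
// → [ { id: '2', zset: 'active', recordedState: 'failed' } ]
```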

The simplest solution would be to add:

job._state = 'active';

here: https://github.com/Automattic/kue/blob/master/lib/kue.js#L245. However, on one server I've also noticed inconsistent jobs in the "inactive" box (they are in the inactive ZSET, but their state is set to "failed").

finalclass commented 5 years ago

I finally know what the problem is.

It's the refreshTtl function that puts these jobs back on the active list: https://github.com/Automattic/kue/blob/master/lib/queue/job.js#L346

refreshTtl is called when progress is set, and we don't always wait for the progress callback to be called.

So from time to time, a job finishes, but the progress update (and thus refreshTtl) runs later and adds the job back to the active ZSET.
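The ordering problem can be reproduced deterministically in a small model. This is a hypothetical sketch, not kue's code: `setState` stands in for job.state() (an atomic move between ZSETs, which kue does with MULTI), `refreshTtl` is modelled as an unconditional re-add to the active set (the real function's guard conditions and TTL scores are not reproduced), and the `pending` array mimics a Redis round trip that nobody awaits.

```javascript
// Simplified in-memory store: one Set per state plus each job's recorded state.
const store = {
  zsets: { active: new Set(), complete: new Set(), failed: new Set() },
  state: {},
};

// Stand-in for job.state(): remove from every set, add to the new one.
// Atomic in the real system (MULTI); here it is simply synchronous.
function setState(id, next) {
  for (const members of Object.values(store.zsets)) members.delete(id);
  store.zsets[next].add(id);
  store.state[id] = next;
}

// Stand-in for the progress-driven TTL refresh, modelled as an
// unconditional re-add to the active set.
function refreshTtl(id) {
  store.zsets.active.add(id);
}

// Deferred operations, drained later to simulate an un-awaited write.
const pending = [];

setState('42', 'active');
pending.push(() => refreshTtl('42')); // progress() fired, not awaited
setState('42', 'failed');             // the job finishes first
pending.forEach((op) => op());        // the late refresh lands afterwards

console.log(store.state['42']);            // → failed
console.log(store.zsets.active.has('42')); // → true (inconsistent)
```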

finalclass commented 5 years ago

Unfortunately, neither refreshTtl nor Job.prototype.progress accepts a callback, so it's impossible to fix this on our side.
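Since the caller cannot wait for the refresh, one possible library-side mitigation is to make the late write conditional on the job's current stored state. The sketch below is a hypothetical in-memory model, not kue's API: `refreshTtlGuarded` only re-adds the job while its recorded state is still "active". In Redis the check and the ZADD would need to be atomic (for example via a Lua script or WATCH/MULTI); in this synchronous model a plain check is enough, and ZSET scores are omitted.

```javascript
// Simplified in-memory store: one Set per state plus each job's recorded state.
const store = {
  zsets: { active: new Set(), complete: new Set(), failed: new Set() },
  state: {},
};

// Stand-in for job.state(): atomic move between sets.
function setState(id, next) {
  for (const members of Object.values(store.zsets)) members.delete(id);
  store.zsets[next].add(id);
  store.state[id] = next;
}

// Hypothetical guarded refresh: skip the re-add if the job already moved on.
function refreshTtlGuarded(id) {
  if (store.state[id] !== 'active') return false;
  store.zsets.active.add(id);
  return true;
}

// Same late-arrival ordering: the refresh lands after the job has failed.
const pending = [];
setState('42', 'active');
pending.push(() => refreshTtlGuarded('42'));
setState('42', 'failed');
const applied = pending.map((op) => op());

console.log(applied);                      // → [ false ]
console.log(store.zsets.active.has('42')); // → false
```

The design choice is to turn an out-of-order refresh into a no-op instead of letting it resurrect a finished job, which would also stop checkActiveJobTtl from repeatedly picking those jobs up.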