Automattic / kue

Kue is a priority job queue backed by redis, built for node.js.
http://automattic.github.io/kue
MIT License

Delayed job processed more than once #1133

Open jkrenge opened 6 years ago

jkrenge commented 6 years ago

It's a bit hard to reproduce, and I'm not 100% sure what to share exactly.

Setup: we have one large Redis instance in the middle, with a variety of queues and workers connected to it. Each queue is fed by at least 10 workers and processed by another 10.

One specific job type is enqueued with a delay (e.g. 70 hours). It is also somewhat longer-running, taking 1.8 seconds on average. This job is regularly processed twice in parallel by different workers, possibly even on different machines.

For example, I log the start of each job and got:

2017-11-16 10:56:12.733 - Job 2494621079...
2017-11-16 10:56:12.180 - Job 2494621079...

Is there anything I can do about this, or are we just hitting the limits of Kue and may need to move to another queue system?

behrad commented 6 years ago

There is probably a race condition somewhere around here: https://github.com/Automattic/kue/blob/master/lib/kue.js#L168 that is causing a delayed job to be duplicated in the inactive state.

jkrenge commented 6 years ago

That does make sense. So the job promotion is checked twice, by different workers, and both of them deem the job ready to be processed and change its state to inactive. So I end up with the same job twice in my inactive queue?
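The scenario described above is a classic check-then-act race. The sketch below is a hypothetical in-memory simulation, not Kue's code: a `Map` stands in for the delayed set, an array for the inactive queue, and `setImmediate` for a Redis round trip. Both simulated workers read the delayed set before either one has removed the job, so both promote it.

```javascript
// Hypothetical stand-in for Redis: delayed jobs (id -> due time in ms) and an inactive list.
function createStore(ids) {
  return { delayed: new Map(ids.map((id) => [id, 0])), inactive: [] };
}

// Simulates one network round trip to Redis, letting other workers run in between.
const roundTrip = () => new Promise((resolve) => setImmediate(resolve));

async function promoteNaive(store, now) {
  // Step 1 (check): read which delayed jobs are due.
  const due = [...store.delayed]
    .filter(([, dueAt]) => dueAt <= now)
    .map(([id]) => id);
  await roundTrip(); // another worker can run its own Step 1 here
  // Step 2 (act): promote based on a read that may already be stale.
  for (const id of due) {
    store.delayed.delete(id);
    store.inactive.push(id);
  }
}

async function simulateRace() {
  const store = createStore(["2494621079"]);
  const now = Date.now();
  // Two workers run their promotion check at the same moment.
  await Promise.all([promoteNaive(store, now), promoteNaive(store, now)]);
  return store.inactive;
}

simulateRace().then((inactive) => console.log(inactive)); // logs the same id twice
```

Both entries in the inactive list carry the same id, which matches the duplicated inactive entry suspected in this thread.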

behrad commented 6 years ago

Maybe. We do have a lock there, but it must be leaking somewhere! We should harden the job promotion logic in the v1 branch, or even make it a Lua script if possible.
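One way to harden promotion along these lines is to make removal from the delayed set the claim: a worker promotes a job only if it was the one that actually removed it. In Redis, `ZREM` returns the number of members it removed, so for a given job only one client sees `1`. Wrapping the range query, the removal and the push in a single Lua script would additionally close the gap between removing and pushing. The sketch below assumes delayed jobs sit in a sorted set scored by due time, and uses hypothetical in-memory stand-ins where `Map.prototype.delete` returning `true` plays the role of `ZREM` returning `1`. It illustrates the idea only and is not Kue's implementation.

```javascript
// Hypothetical stand-in for Redis: delayed jobs (id -> due time in ms) and an inactive list.
function createStore(ids) {
  return { delayed: new Map(ids.map((id) => [id, 0])), inactive: [] };
}

// Simulates one network round trip to Redis.
const roundTrip = () => new Promise((resolve) => setImmediate(resolve));

async function promoteAtomic(store, now) {
  const due = [...store.delayed]
    .filter(([, dueAt]) => dueAt <= now)
    .map(([id]) => id);
  await roundTrip(); // the read may be stale by the time we act
  for (const id of due) {
    // Claim the job by removing it: delete() returns true only for the first caller,
    // like ZREM returning 1 for exactly one client. Losers skip the push.
    if (store.delayed.delete(id)) {
      store.inactive.push(id);
    }
  }
}

async function simulateAtomic() {
  const store = createStore(["2494621079"]);
  const now = Date.now();
  await Promise.all([promoteAtomic(store, now), promoteAtomic(store, now)]);
  return store.inactive;
}

simulateAtomic().then((inactive) => console.log(inactive)); // logs the id once
```

The stale read is harmless here because the decision to promote is made by the removal itself, not by the earlier check.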

jkrenge commented 6 years ago

As a temporary fix, is there a way to wait within the worker, re-check whether a job with this id exists twice, and abort if this is not the instance with the lowest timestamp?
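Rather than sleeping and comparing timestamps, a common worker-side workaround is an idempotency guard: the handler first tries to claim the job id with a set-if-absent operation that carries a TTL, and skips the job if another worker already holds the claim. Across machines the claim has to live in shared storage, for example Redis `SET key value NX PX <ttl>`. The `ClaimStore` class below is a hypothetical single-process stand-in for that, and `runOnce` and the `processed:<id>` key are illustrative names, not part of Kue's API.

```javascript
// Hypothetical in-process stand-in for Redis `SET key value NX PX ttl`.
class ClaimStore {
  constructor() {
    this.claims = new Map(); // key -> expiry timestamp (ms)
  }

  // Returns true if the key was free (or its claim expired) and is now claimed.
  claim(key, ttlMs, now = Date.now()) {
    const expiresAt = this.claims.get(key);
    if (expiresAt !== undefined && expiresAt > now) return false;
    this.claims.set(key, now + ttlMs);
    return true;
  }
}

// Hypothetical guard for the top of a job handler: only the first caller per id runs `work`.
async function runOnce(claims, jobId, work, ttlMs = 60 * 60 * 1000) {
  if (!claims.claim(`processed:${jobId}`, ttlMs)) {
    return { skipped: true };
  }
  return { skipped: false, result: await work() };
}

// Usage: two workers receive the same job id at (almost) the same time.
async function demo() {
  const claims = new ClaimStore();
  let runs = 0;
  const work = async () => {
    runs += 1;
    return "sent";
  };
  const outcomes = await Promise.all([
    runOnce(claims, "2494621079", work),
    runOnce(claims, "2494621079", work),
  ]);
  return { runs, outcomes };
}

demo().then(({ runs, outcomes }) => console.log(runs, outcomes));
```

In a Kue worker the check would sit at the top of the `queue.process` handler, calling `done()` without doing any work when the claim fails. The trade-off is that the claim is taken before the work: if a worker crashes mid-job, a legitimate retry of the same id is skipped until the TTL expires, so the TTL should cover the duplicate window (sub-second in the log above) rather than the full retry schedule.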