Open jkrenge opened 6 years ago
There should be a race somewhere, like here: https://github.com/Automattic/kue/blob/master/lib/kue.js#L168, that is causing a delayed job to be duplicated in the inactive state.
That does make sense. So the job promotion is checked twice, by different workers, and both of them deem the job ready to be processed and change its state to inactive. Then I have the same job twice in my inactive queue?
Maybe. We do have a lock there, but it must be leaking somewhere! We should harden the job promotion logic in the v1 branch, or even make it a Lua script if possible.
As a temporary fix, is there maybe a way to wait within the worker, and re-check whether the job with this id exists twice, and then abort if it's not the job instance with the lowest timestamp?
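A rough sketch of what such a worker-side guard could look like. Note that it swaps the timestamp comparison for an atomic "first claim wins" check, which is simpler to get right. Everything here is hypothetical and not part of kue: `InMemoryClaimStore`, `setIfAbsent`, `guardOnce` and the `claim:` key prefix are made-up names, and the in-memory store only stands in for something atomic and shared across machines (for example Redis `SET key value NX EX seconds`).

```javascript
// Hypothetical in-memory stand-in for an atomic "set if absent" store.
// Across processes or machines this role would need a shared atomic
// primitive (assumption: e.g. Redis `SET key value NX EX seconds`).
class InMemoryClaimStore {
  constructor() {
    this.claims = new Map();
  }

  // Returns true only for the first caller that claims `key`.
  setIfAbsent(key, owner) {
    if (this.claims.has(key)) return false;
    this.claims.set(key, owner);
    return true;
  }
}

// Wraps a job handler so that only the first worker to claim a job id runs it.
// `job` is assumed to be any object with an `id` property.
function guardOnce(store, workerId, handler) {
  return function guarded(job) {
    const claimed = store.setIfAbsent('claim:' + job.id, workerId);
    if (!claimed) {
      // Another worker already claimed this job id: skip the duplicate.
      return { ran: false, reason: 'duplicate' };
    }
    return { ran: true, result: handler(job) };
  };
}
```

Two workers sharing the same store and receiving the same job id would then run the handler only once. The claim is never released in this sketch, so a real version would need an expiry or an explicit release to let failed jobs be retried.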
It's a bit hard to reproduce, and I'm not 100% sure what to share exactly.
Setup: We have one large Redis instance in the middle, with a variety of different queues and workers connected. Each queue is fed by at least 10 workers and processed by another 10.
Now, I have one specific type of job that is enqueued with a delay (e.g. 70 hours). This job also runs a bit longer, taking 1.8 seconds on average. It is regularly processed twice in parallel, by different workers, maybe even on different machines.
For example, I log the start of the job and got:
Is there anything I can do about this, or are we just hitting the upper limit of what this setup can handle, so that we may need to move to another queue system?