Automattic / kue

Kue is a priority job queue backed by redis, built for node.js.
http://automattic.github.io/kue
MIT License

Calling done() inside a setTimeout callback changes another job's state to inactive or leaves a job stuck in active #1057

Closed: behroozshafiabadi closed this issue 7 years ago

behroozshafiabadi commented 7 years ago

This seems to happen! Could you explain why? Thanks, behrad.

ejoebstl commented 7 years ago

Could you add some minimal code to reproduce this issue?

I think I'm having a similar problem on a production system, but I have not managed to reproduce it locally.

TheGrandmother commented 7 years ago

Are you sure that the correct reference is being captured by the callback? Some code would be greatly appreciated.
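One way the symptom in the title can arise, offered as a hypothesis rather than a diagnosis of the reporter's code: if `done` is parked in a variable shared by all jobs and only called later from a `setTimeout` callback, a second job can overwrite it before the timer fires. The first timer then completes the wrong job, and the first job stays active. The sketch simulates this without Redis or kue; `runSimulation`, `buggyHandler` and `fixedHandler` are hypothetical names, and a fake timer stands in for `setTimeout` so the order of events is deterministic.

```javascript
// Hypothetical stand-in for a queue worker: this is NOT the kue API.
// Timers are recorded and fired manually so the run is deterministic.
function runSimulation(handler, jobIds) {
  const state = {}; // job id -> 'active' | 'complete'
  const timers = []; // callbacks handed to the fake setTimeout
  const fakeSetTimeout = (fn) => timers.push(fn);

  // Hand out every job before any timer fires, as with concurrency > 1.
  for (const id of jobIds) {
    state[id] = 'active';
    const done = () => { state[id] = 'complete'; };
    handler({ id }, done, fakeSetTimeout);
  }

  // Fire the pending timers in the order they were scheduled.
  timers.forEach((fn) => fn());
  return state;
}

// Buggy: `done` is stored in a variable shared by every job, so each new
// job overwrites the callback belonging to the previous one.
let sharedDone;
function buggyHandler(job, done, setTimeoutFn) {
  sharedDone = done;
  setTimeoutFn(() => sharedDone());
}

// Fixed: each timer closes over the `done` passed in for its own job.
function fixedHandler(job, done, setTimeoutFn) {
  setTimeoutFn(() => done());
}

console.log(runSimulation(buggyHandler, [1, 2])); // { '1': 'active', '2': 'complete' }
console.log(runSimulation(fixedHandler, [1, 2])); // { '1': 'complete', '2': 'complete' }
```

In the buggy run, job 2's `done` is called twice and job 1's is never called, which matches "another job changes state" plus "a job stays stuck in active".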

behrad commented 7 years ago

I can't figure out your issue, @behroozshafiabadi. Can you please send us a code snippet?

TheGrandmother commented 7 years ago

I think I'm actually stuck in the same situation. I'm not sure whether it is due to setTimeout, but I'm doing a lot of asynchronous work.

I'm a bit unsure why this would happen, though.

TheGrandmother commented 7 years ago

I think I might have some more information on this.

It appears that getJob starts to behave oddly. The client.blpop call on https://github.com/Automattic/kue/blob/master/lib/queue/worker.js#L271 starts returning false and its callback does not fire. As a result, no error is reported: the function exits normally without picking up a job, and the job remains stuck.

Why this happens, I do not know :/
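Whatever the underlying cause, a callback that never fires fails silently. A generic defensive pattern, not something kue or node_redis provides, is to race the callback against a timer so that the hang surfaces as an error. `callWithTimeout` below is a hypothetical helper, and the never-firing call in the demo only stands in for the stuck `client.blpop`.

```javascript
// Hypothetical helper, not part of kue or node_redis: guards a callback-style
// call so a callback that never fires is reported as an error after `ms`
// milliseconds instead of leaving the caller waiting silently.
function callWithTimeout(fn, ms, callback) {
  let settled = false;
  const timer = setTimeout(() => {
    if (settled) return;
    settled = true;
    callback(new Error(`callback did not fire within ${ms} ms`));
  }, ms);

  fn((err, result) => {
    if (settled) return; // a reply that arrives after the timeout is dropped
    settled = true;
    clearTimeout(timer);
    callback(err, result);
  });
}

// A call whose callback never fires, like the stuck blpop described above.
callWithTimeout(() => {}, 50, (err) => console.log(err.message));
// A call that answers in time.
callWithTimeout((done) => setTimeout(() => done(null, 'job:42'), 10), 50,
  (err, id) => console.log('popped', id));
```

A timeout only reports the hang. It does not cancel the BLPOP on the Redis server, so if a reply arrives late, this helper drops it and a popped job id could be lost. Treat it as a diagnostic aid rather than a fix.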

TheGrandmother commented 7 years ago

I think I have made some more progress with this issue.

Apparently the problem is that the lock acquired by client.blpop never gets released. I think the issue is that when a worker dies, it is not shut down properly. In my configuration, the workers run in different processes from the "master" queue.

@behrad Is there a way to shut down a single worker remotely? The problem is that, AFAIK, the queue.shutdown function will kill all the workers. I need the option to shut down a single worker locally, that is, to perform a graceful shutdown of one specific worker.
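A graceful shutdown of one worker usually follows three steps: stop accepting new jobs, wait for in-flight jobs to call `done()`, and give up after a timeout. The sketch models that sequence for a single worker in isolation. `GracefulWorker` and its `accept`/`shutdown` methods are hypothetical and are not kue's `queue.shutdown`, which is the call discussed in this thread.

```javascript
// Hypothetical per-process worker, not kue's internal Worker class. Once
// shutdown() is called it refuses new jobs, then waits until every in-flight
// job has called done() or until the timeout expires.
class GracefulWorker {
  constructor(handler) {
    this.handler = handler;
    this.active = new Set(); // ids of jobs that have not called done() yet
    this.shuttingDown = false;
    this.onDrained = null;
  }

  // Returns false when the job is refused because shutdown has started.
  accept(job) {
    if (this.shuttingDown) return false;
    this.active.add(job.id);
    let called = false;
    this.handler(job, () => {
      if (called) return; // ignore a second done() for the same job
      called = true;
      this.active.delete(job.id);
      if (this.shuttingDown && this.active.size === 0 && this.onDrained) {
        this.onDrained(null);
      }
    });
    return true;
  }

  // Calls back with (null, []) once drained, or with an error and the ids
  // of the jobs that were still active when the timeout expired.
  shutdown(timeoutMs, callback) {
    this.shuttingDown = true;
    let finished = false;
    const finish = (err) => {
      if (finished) return;
      finished = true;
      clearTimeout(timer);
      callback(err, [...this.active]);
    };
    const timer = setTimeout(() => finish(new Error('shutdown timed out')), timeoutMs);
    this.onDrained = finish;
    if (this.active.size === 0) finish(null);
  }
}

const worker = new GracefulWorker((job, done) => setTimeout(done, 100));
worker.accept({ id: 1 });
worker.shutdown(1000, (err, stillActive) => {
  console.log(err ? err.message : 'drained', stillActive);
});
console.log(worker.accept({ id: 2 })); // false: shutdown already started
```

In a real worker process, shutdown would typically be triggered from a SIGTERM handler, and the process would close its Redis connections and exit only after the callback fires.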

behrad commented 7 years ago

You can use redis-cli to monitor your BLPOP connection and check whether it is released. This looks a bit strange to me.
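One way to do this check is with the output of `redis-cli CLIENT LIST`: each line describes one connection as space-separated `key=value` fields, and `cmd=blpop` together with the `b` flag marks a client currently blocked in BLPOP. After a worker has been shut down, its address should no longer appear there. The parser below is a hypothetical helper, and the sample lines are illustrative (abridged fields, made-up addresses), not output from this thread.

```javascript
// Parses the text returned by Redis's CLIENT LIST command (one client per
// line, space-separated key=value fields) and returns the connections that
// are currently blocked in BLPOP.
function findBlockedBlpopClients(clientListText) {
  return clientListText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const fields = {};
      for (const pair of line.split(' ')) {
        const eq = pair.indexOf('=');
        if (eq > 0) fields[pair.slice(0, eq)] = pair.slice(eq + 1);
      }
      return fields;
    })
    .filter((c) => c.cmd === 'blpop' && (c.flags || '').includes('b'))
    .map((c) => ({
      id: c.id,
      addr: c.addr,
      ageSeconds: Number(c.age),
      idleSeconds: Number(c.idle),
    }));
}

// Illustrative sample only: abridged fields and made-up addresses.
const sample = [
  'id=5 addr=10.0.0.2:50412 fd=9 name= age=3600 idle=3600 flags=b db=0 cmd=blpop',
  'id=6 addr=10.0.0.3:50500 fd=10 name= age=12 idle=0 flags=N db=0 cmd=hget',
].join('\n');

console.log(findBlockedBlpopClients(sample));
```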

TheGrandmother commented 7 years ago

@behrad I found the issue, and in my case it was a user problem. I didn't use the shutdown method properly, and as a result the BLPOP did not release the lock. When that happened, the callback did not fire and all remaining active jobs just hung around forever.

behrad commented 7 years ago

Closing for now; feel free to reopen.