Automattic / kue

Kue is a priority job queue backed by redis, built for node.js.
http://automattic.github.io/kue
MIT License

Workers error after shutdown #1144

Open dindurthy opened 6 years ago

dindurthy commented 6 years ago

During peak times, we routinely see workers error during shutdown. What appears to occur is this:

It's almost certainly a race condition. One issue I see is in worker.shutdown:

  if( !this.running ) return _fn();
  this.running = false;

  // As soon as we're free, signal that we're done
  if( !this.job ) {
    return _fn();
  }

Here, shutdown sets the worker's state to running = false and, if the worker holds no job, calls the callback right away. If the worker is in the middle of self.getJob at that moment, kue.shutdown may proceed before the worker is actually idle. The worker then goes on to obtain its job and process it, and only goes idle the next time it checks its running state. Easy race condition.
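
To make the sequence concrete, here is a toy model of the race. It is only a sketch: `ToyWorker`, `pendingFetch`, `deliver` and `log` are made-up names for illustration, not kue's actual Worker internals, and the asynchronous fetch is simulated by a callback that is fired by hand.

```javascript
// Toy model of the race. ToyWorker and its fields are hypothetical,
// not kue's real Worker; only shutdown() mirrors the excerpt above.
class ToyWorker {
  constructor() {
    this.running = true;
    this.job = null;
    this.pendingFetch = null; // callback of an in-flight getJob, if any
    this.log = [];
  }

  // Simulates an asynchronous fetch: the job only arrives when deliver() is called.
  getJob() {
    this.pendingFetch = (job) => {
      this.pendingFetch = null;
      this.job = job;
      this.log.push('processing ' + job);
    };
  }

  // Stands in for the queue answering the in-flight fetch.
  deliver(job) {
    if (this.pendingFetch) this.pendingFetch(job);
  }

  // Same shape as the excerpt: only this.job is consulted.
  shutdown(fn) {
    if (!this.running) return fn();
    this.running = false;
    if (!this.job) {
      this.log.push('shutdown complete');
      return fn();
    }
  }
}

function demoRace() {
  const worker = new ToyWorker();
  worker.getJob();            // fetch in flight, this.job is still null
  worker.shutdown(() => {});  // reports "done" immediately
  worker.deliver('job-1');    // job arrives after shutdown was reported
  return worker.log;
}

console.log(demoRace());
// → [ 'shutdown complete', 'processing job-1' ]
```

The log shows the job being processed after shutdown has already reported completion, which matches the ordering described above.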

I'm not exactly sure what the solution should be. One idea is for the worker to record that it is fetching a job, and for worker.shutdown to check that state before calling back.
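
A rough sketch of that idea, under the same caveat that this is a toy and not a patch against kue: `FetchAwareWorker`, `fetching`, `onIdle`, `deliver` and `finish` are hypothetical names, and the fetch is again simulated by a hand-fired callback. Here shutdown defers its callback while either a job or a fetch is outstanding.

```javascript
// Hypothetical worker that tracks an in-flight fetch. Not kue's real Worker.
class FetchAwareWorker {
  constructor() {
    this.running = true;
    this.fetching = false;    // true while a getJob call is in flight
    this.job = null;
    this.onIdle = null;       // deferred shutdown callback, if any
    this.pendingFetch = null;
    this.log = [];
  }

  // Simulated asynchronous fetch; the job arrives when deliver() is called.
  getJob() {
    this.fetching = true;
    this.pendingFetch = (job) => {
      this.pendingFetch = null;
      this.fetching = false;
      this.job = job;
      this.log.push('processing ' + job);
    };
  }

  deliver(job) {
    if (this.pendingFetch) this.pendingFetch(job);
  }

  // Called when the current job is done; fires a deferred shutdown callback.
  finish() {
    this.log.push('finished ' + this.job);
    this.job = null;
    if (this.onIdle) {
      const fn = this.onIdle;
      this.onIdle = null;
      this.log.push('shutdown complete');
      fn();
    }
  }

  shutdown(fn) {
    if (!this.running) return fn();
    this.running = false;
    // Only report done when there is neither a job nor a fetch outstanding.
    if (!this.job && !this.fetching) {
      this.log.push('shutdown complete');
      return fn();
    }
    this.onIdle = fn;
  }
}

function demoFix() {
  const worker = new FetchAwareWorker();
  worker.getJob();            // fetch in flight
  worker.shutdown(() => {});  // deferred, because fetching is true
  worker.deliver('job-1');    // job arrives and is processed
  worker.finish();            // worker goes idle, shutdown callback fires
  return worker.log;
}

console.log(demoFix());
// → [ 'processing job-1', 'finished job-1', 'shutdown complete' ]
```

This version lets the fetched job run to completion before signalling shutdown. Whether a real fix should instead drop or requeue a job fetched after shutdown was requested is a separate design question that this sketch does not settle.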