doxout / recluster

Node clustering library with support for zero downtime reloading
522 stars 44 forks source link

old workers never die. #1

Closed qzaidi closed 11 years ago

qzaidi commented 11 years ago

Hi,

When I signal the master using SIGUSR2, new workers are started but the old ones never die.

            var timeout = setTimeout(worker.kill.bind(worker), opt.timeout * 1000);
            worker.on('disconnect', clearTimeout.bind(this, timeout));
            // possible leftover worker that has no channel estabilished will throw
            try { worker.disconnect(); } catch (e) { }
            cluster.removeListener('listening', stopOld);

What I think is happening is that the worker.disconnect() call is generating a disconnect event that clears the timeout before it has a chance to execute. Am I reading this correct?

I am running node 0.10.0, so it could be that this event wasn't generated in your version of node?

spion commented 11 years ago

From what I've read in node's api documentation, the disconnect event may fire without the worker exiting if the worker is busy. But otherwise, worker.disconnect() only stops sending requests to the worker. The event disconnect is not emitted until all requests are served, right before the worker exits.

worker.disconnect()

When calling this function the worker will no longer accept new connections, but they will be handled by any other listening worker. Existing connection will be allowed to exit as usual. When no more connections exist, the IPC channel to the worker will close allowing it to die graceful. When the IPC channel is closed the disconnect event will emit, this is then followed by the exit event, there is emitted when the worker finally die.

I tried using the exit event instead. However, since I can't reproduce the issue in the test suite (the cluster module reports that the workers are always successfully killed), can you verify if this fix works for you?

qzaidi commented 11 years ago

This works - the workers are now cleaned up.

spion commented 11 years ago

Great, closing the issue. The fixed has been published on npm (0.1.5)