PM2 cluster processes not exiting automatically, possible leak in event pool

OpenGG commented 7 years ago

What's going wrong?

When a process is running in PM2 cluster mode and triggered a graceful reload, it has to call process.exit(0) on SIGINT, other it won't exit and PM2 will wait 1600ms then kill it.

When running in fork mode, or running without PM2, the process will achieve graceful reloading without explicitly calling process.exit().

Calling process.exit() explicitly covers up the real cause, and may lead to new problems, especially the operations related to IOs, e.g. partial writes.

Obviously something is preventing clustered process from automatically exiting.

How could we reproduce this issue?

I setup a repo, with the simplest possible Node.js server script, with full test scripts included.

You can clone this repo, run test.sh and test-pm2.sh to find out, or you can checkout how it works across different Node.js versions (Hint: PM2 failing on all major Node.js verions).

Possible cause

After a little digging, I figure out the possible reason: there is a leaky handler in the event pool of clustered process.

I don't know why or how this leaky handler works, but I come up with a dirty fix:

process.on('SIGINT', function () {
  // My process has received a SIGINT signal
  // Meaning PM2 is now trying to stop the process

  // So I can clean some stuff before the final stop
  console.log('Receiving SIGINT, closing server');

  server.close(function () {
    // dirty fix
    var handles = process._getActiveHandles();
    // var requests = process._getActiveRequests();

    handles.forEach(function (handle) {
      handle.close();
      //console.log('handle', getAllProps(handle));
    });
  });

  setTimeout(function () {
    // 300ms later the process kill it self to allow a restart
    console.error('Node.js process not exited, set exit code to 1');

    process.exitCode = 1;
  }, 1000).unref();
});

But this is quite dirty:

Apps don't know anything about the underlying handler, and should not close it.
Both the _getActiveHandles() method and the leaky handler itself is not recognizable nor documented, so this hack is not stable and should be avoided.
There may be other handlers inside event pool, some added by app, some added by deps, each of them should be properly close by its creator, and anything goes against this principle is a bug and should be fixed. This dirty hack again covers up the real problems.

I think it is the best that PM2 guys can help hunt down the leak and figure out a way to do with it.

Supporting information

PM2 version: 2.5.0
Node version: All of them
OS: Mac and Linux.

Unitech commented 7 years ago

Do not use gracefulReload, this command is deprecated. Use pm2 restart/reload instead
Processes in cluster mode does not get restarted automatically when the event loop is empty, it's because there are still internal mechanisms running behind the scene to handle clustering lb

OpenGG commented 7 years ago

@Unitech

Now using reload instead of gracefulReload, still failing.
PM2 cluster processes connecting to PM2 god process via IPC, maybe that's why explicit process.exit() is required?
Can PM2 provide an API for clustered processes to disconnect this IPC?

I have implemented a basic PM2_DISCONNECT(). It should disconnect the IPC connection, and undoing all process.send() related work in ProcessContainer.js.

When apps dealing with graceful shutdown/reload, calling a PM2_DISCONNECT() is better than process.exit(). process.exit() can lead to premature exit.

/*
 * Somewhere in the app, this mysql connection has setup
 * and it should be dealt with during graceful shutdown.
 */
var mysql = mysql.connect();

// server should be close during graceful shutdown, too.
server.listen(port);

// Graceful shutdown on SIGINT
process.on('SIGINT', function () {

  // Closing server
  server.close(function () {
    /*
     * Process exits right away without any error.
     *
     * Even though developer forgets about the mysql connection,
     * but since no one complains, he will never have the chance
     * to fix it.
     *
     */
    process.exit(0);
  });
});

/*
 * Somewhere in the app, this mysql connection has setup
 * and it should be dealt with during graceful shutdown.
 */
var mysql = mysql.connect();

// server should be close during graceful shutdown, too.
server.listen(port);

// Graceful shutdown on SIGINT
process.on('SIGINT', function () {

  // Closing server
  server.close(function () {
    /*
     * IPC channel disconnected, process will exit when the event
     * loop is empty.
     *
     * Developer forgets to close the mysql connection, so this process
     * will never exit. Then PM2 daemon has to force close it, leaving
     * an error in logs, so that he has the chance to notice and fix it.
     *
     */
    PM2_DISCONNECT(0);
  });
});

guyellis commented 7 years ago

@Unitech you said:

Do not use gracefulReload, this command is deprecated. Use pm2 restart/reload instead

I see that at one point gracefulReload had a console warning if used but that's subsequently been commented out around here:

  //Common.printOut(conf.PREFIX_MSG_WARNING + chalk.bold.yellow('Warning gracefulReload will be soon deprecated'));
  //Common.printOut(conf.PREFIX_MSG_WARNING + chalk.bold.yellow('Use http://pm2.keymetrics.io/docs/usage/signals-clean-restart/ instead'));

I can't find anymore information on this. Is gracefulReload deprecated or not?

rogerweb commented 5 years ago

Facing the same issue. I don't want to explicitly call process.exit(0). Any progress on this issue?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Kit-p commented 1 year ago

Here is a minimal application that achieves the desired behavior for me (natural exit without calling process.exit(0)).

let work;

function app() {
  work = setInterval(() => { // do some work
    process.stdout.write('.');
  }, 500);
}

function gracefulShutdown() {
  clearInterval(work);
  process.channel?.unref(); // <-- the trick
}

process.on('SIGINT', gracefulShutdown);
process.on('SIGTERM', gracefulShutdown);
process.on('message', (msg) => {
  if (msg === 'shutdown') gracefulShutdown();
});

app(); // run the app

@Unitech Do you think anything can potentially go wrong with process.channel.unref()? Here is the Node.js documentation: https://nodejs.org/docs/latest-v18.x/api/process.html#processchannelunref

Unitech / pm2

PM2 cluster processes not exiting automatically, possible leak in event pool #2967

Supporting information