Unitech / pm2

Node.js Production Process Manager with a built-in Load Balancer.
https://pm2.keymetrics.io/docs/usage/quick-start/

process.argv behaves differently for pm2-runtime + cluster mode without pm2 daemon running #4950

Open rocwind opened 3 years ago

rocwind commented 3 years ago

What's going wrong?

Running pm2-runtime start process.json without a running pm2 daemon causes "start" and "process.json" to be included in the application's process.argv.

How could we reproduce this issue?

  1. create an app.js with contents below:
    console.log(process.argv)
  2. create a process.json with:
    {
    "apps": [{
    "name": "test-server",
    "script": "app.js",
    "instances": "1",
    "exec_mode": "cluster"
    }]
    }
  3. make sure the pm2 daemon is not running and execute pm2-runtime start process.json; it logs something like:
    2021-01-01T17:29:41: PM2 log: Launching in no daemon mode
    2021-01-01T17:29:41: PM2 log: App [test-server:0] starting in -cluster mode-
    2021-01-01T17:29:41: PM2 log: App [test-server:0] online
    [
    '/usr/local/bin/node',
    '/usr/local/lib/node_modules/pm2/lib/ProcessContainer.js',
    'start',
    'process.json'
    ]
  4. execute pm2 ps && pm2-runtime start process.json and it logs something like:
    
    [PM2] Spawning PM2 daemon with pm2_home=/root/.pm2
    [PM2] PM2 Successfully daemonized
    ┌────┬────────────────────┬──────────┬──────┬───────────┬──────────┬──────────┐
    │ id │ name               │ mode     │ ↺    │ status    │ cpu      │ memory   │
    └────┴────────────────────┴──────────┴──────┴───────────┴──────────┴──────────┘
    [
    '/usr/local/bin/node',
    '/usr/local/lib/node_modules/pm2/lib/ProcessContainer.js'
    ]
`pm2 ps` is just for spawning the daemon process.

## Supporting information
`pm2` and `pm2-runtime` in `fork` mode both work fine; the problem occurs only with `pm2-runtime` in `cluster` mode.

--- PM2 report ----------------------------------------------------------------
Date : Fri Jan 01 2021 17:35:17 GMT+0800 (Central Standard Time)

--- Daemon -------------------------------------------------
pm2d version : 4.5.0
node version : 14.15.1
node path : not found
argv : /usr/local/bin/node,/usr/local/lib/node_modules/pm2/lib/Daemon.js
argv0 : node
user : undefined
uid : 0
gid : 0
uptime : 0min

--- CLI ----------------------------------------------------
local pm2 : 4.5.0
node version : 14.15.1
node path : not found
argv : /usr/local/bin/node,/usr/local/bin/pm2,report
argv0 : node
user : undefined
uid : 0
gid : 0

--- System info --------------------------------------------
arch : x64
platform : linux
type : Linux
cpus : Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
cpus nb : 2
freemem : 1295839232
totalmem : 2612682752
home : /root

--- PM2 list -----------------------------------------------
┌────┬────────────────────┬──────────┬──────┬───────────┬──────────┬──────────┐
│ id │ name               │ mode     │ ↺    │ status    │ cpu      │ memory   │
└────┴────────────────────┴──────────┴──────┴───────────┴──────────┴──────────┘

--- Daemon logs --------------------------------------------
/root/.pm2/pm2.log last 20 lines:
PM2 | 2021-01-01T17:32:39: PM2 log: App name:test-server id:0 disconnected
PM2 | 2021-01-01T17:32:39: PM2 log: App [test-server:0] exited with code [0] via signal [SIGINT]
PM2 | 2021-01-01T17:32:39: PM2 log: pid=837 msg=process killed
PM2 | 2021-01-01T17:32:40: PM2 log: PM2 successfully stopped
PM2 | 2021-01-01T17:35:17: PM2 log: ===============================================================================
PM2 | 2021-01-01T17:35:17: PM2 log: --- New PM2 Daemon started ----------------------------------------------------
PM2 | 2021-01-01T17:35:17: PM2 log: Time : Fri Jan 01 2021 17:35:17 GMT+0800 (Central Standard Time)
PM2 | 2021-01-01T17:35:17: PM2 log: PM2 version : 4.5.0
PM2 | 2021-01-01T17:35:17: PM2 log: Node.js version : 14.15.1
PM2 | 2021-01-01T17:35:17: PM2 log: Current arch : x64
PM2 | 2021-01-01T17:35:17: PM2 log: PM2 home : /root/.pm2
PM2 | 2021-01-01T17:35:17: PM2 log: PM2 PID file : /root/.pm2/pm2.pid
PM2 | 2021-01-01T17:35:17: PM2 log: RPC socket file : /root/.pm2/rpc.sock
PM2 | 2021-01-01T17:35:17: PM2 log: BUS socket file : /root/.pm2/pub.sock
PM2 | 2021-01-01T17:35:17: PM2 log: Application log path : /root/.pm2/logs
PM2 | 2021-01-01T17:35:17: PM2 log: Worker Interval : 30000
PM2 | 2021-01-01T17:35:17: PM2 log: Process dump file : /root/.pm2/dump.pm2
PM2 | 2021-01-01T17:35:17: PM2 log: Concurrent actions : 2
PM2 | 2021-01-01T17:35:17: PM2 log: SIGTERM timeout : 1600
PM2 | 2021-01-01T17:35:17: PM2 log: ===============================================================================



mattpr commented 1 year ago

Another data point on this...

Next.js crashes with: Unknown or unexpected option: --no-daemon

This happens when running Next.js via pm2 (pm2 5.2.2, node 16.15); it also fails on Ubuntu 20.04 LTS.

app.pm2.json

{
    "name"               : "app_name",
    "script"             : "/opt/app_dir/node_modules/next/dist/bin/next",
    "args"               : "start",
    "instances"          : "1",
    "exec_mode"          : "cluster",
    "cwd"                : "/opt/app_dir",
    "out_file"           : "/dev/null",
    "error_file"         : "/dev/null",
    "wait_ready"         : false,
    "listen_timeout"     : 5000,
    "kill_timeout"       : 30000,
    "max_restarts"       : 1000000,
    "restart_delay"      : 100,
    "max_memory_restart" : "1G",
    "watch"              : false
}

Run with:

HOME=/tmp node pm2 start --no-daemon /path/to/app.pm2.json

We run the above, with the environment set up by a provisioned systemd service unit, which is why we use --no-daemon. Maybe most people are running pm2 by hand in production?

Added a console.log(process.argv); to the top of /opt/app_dir/node_modules/next/dist/bin/next to see what was going on and got the following...

[
    '/opt/nodejs/node-v16.15.0/bin/node',
    '/opt/nodejs/node-v16.15.0/lib/node_modules/pm2/lib/ProcessContainer.js',
    'start',
    '--no-daemon',
    '/path/to/app.pm2.json',
    'start'
]

As far as I can tell, this issue will break any node apps that rely on process.argv when running in pm2 in --no-daemon mode (e.g. when using custom systemd units).

mattpr commented 1 year ago

I thought I had solved it with some combination of additional environment variables, but it was just a coincidence. It doesn't look like environment variables have anything to do with this.

If I run pm2 ls first and then run pm2 start --no-daemon app.pm2.json, the issue goes away, but that isn't really a solution, as instrumenting start/reload/restart in systemd reliably would be difficult.

According to the docs, args shouldn't be passed through to the script being run by pm2 unless they follow a --.

mattpr commented 1 year ago

Okay, so the problem is that the worker process is launched differently depending on whether a running daemon is detected. The docs even hint at this:

Make sure you kill any PM2 instance before starting PM2 in no daemon mode (pm2 kill).

They mean kill any running Daemon (the "God" process). Running almost any pm2 command (like pm2 ls) will result in a God daemon starting in the current environment if one isn't already running (the configured .pm2 directory).

case: pre-existing Daemon ("God" process)

Basically, when the Client starts it does a pingDaemon to see if there is an alive Daemon process. If yes, it does a launchRPC and returns, so the whole no-daemon init code gets skipped. This is where the code paths diverge, depending on whether a daemon is already running.

In this case the existing daemon process fields the RPC and makes a prepare call to launch the missing application. Because this happens over RPC, the original process.argv is not retained.

case: no Daemon found

In this case we hit the no-daemon init code.

If you have set instances or specified cluster mode, the script is ultimately started by cluster.fork in ClusterMode.

If you are not using clustered multi-instance mode, then see ForkMode which uses child_process.spawn.
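
For contrast, here is a minimal sketch of why a spawn-based launcher would not show the leak: child_process.spawn has no implicit default for its argument list, so the child only sees what the caller passes explicitly. The inline script and the 'start' argument are illustrative assumptions, not code taken from pm2's ForkMode.

```javascript
const { spawnSync } = require('node:child_process');

// Illustration only (not pm2 source): run a child Node process and report the
// arguments it received. Nothing from the parent's process.argv is inherited
// unless the caller lists it here.
function childArgs(args) {
  const result = spawnSync(
    process.execPath,
    ['-p', 'JSON.stringify(process.argv.slice(1))', ...args],
    { encoding: 'utf8' }
  );
  return JSON.parse(result.stdout);
}

console.log(childArgs([]));
console.log(childArgs(['start']));
```

The parent's own command line never reaches the child here, which is consistent with the report above that fork mode works fine.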

The cluster is set up here. Basically it gets a default script to execute, which is ProcessContainer.js. This just wraps our script as an ES or CJS module.

The thing to note here from the docs for cluster.settings is:

args <string[]> arguments passed to worker. Default: process.argv.slice(2)

args is NOT set by pm2 so in the default cluster.fork scenario, all of the original args (minus the first 2) will be sent along to the worker. This means the node and pm2 get chopped off the front and replaced with node and ProcessContainer but the rest of the original pm2 args stick around and get passed to the worker.
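
To make the inheritance concrete, here is a minimal sketch that models how a worker's argv would be assembled under the documented default. It is a simplified model, not pm2 or Node source code; modelWorkerArgv is a hypothetical helper, the /usr/local/bin/pm2-runtime path is assumed, and appending the app's own args at the end is inferred from the output posted earlier in this thread.

```javascript
// Simplified model (an assumption, not pm2 or Node source code) of the argv a
// cluster worker ends up with. Node documents the default for
// cluster.settings.args as process.argv.slice(2).
function modelWorkerArgv(primaryArgv, exec, settingsArgs, appArgs = []) {
  // When args is left unset, the primary's own CLI arguments leak through.
  const inherited = settingsArgs === undefined ? primaryArgv.slice(2) : settingsArgs;
  // The app's configured args land at the end, as in the output reported above.
  return [primaryArgv[0], exec, ...inherited, ...appArgs];
}

// Assumed command line for `pm2-runtime start process.json`.
const primaryArgv = ['/usr/local/bin/node', '/usr/local/bin/pm2-runtime', 'start', 'process.json'];
const exec = '/usr/local/lib/node_modules/pm2/lib/ProcessContainer.js';

const leaked = modelWorkerArgv(primaryArgv, exec, undefined); // args not set
const explicit = modelWorkerArgv(primaryArgv, exec, []);      // args set explicitly
console.log(leaked);
console.log(explicit);
```

With args unset, the model reproduces the four-element argv from the original report; with an explicit empty array, only the node binary and ProcessContainer.js remain.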

patching...

You can just set the default for the worker args on the cluster to be an empty array. Any args specified in the pm2 environment json will still get passed to the child worker.

If you are using the "Variadic" (--) feature to pass through args to the child then you might need to be a little fancier about which args you keep from process.argv.

Right before the cluster.fork call:

cluster.settings.args = [];  //  don't pass child args (any args from your pm2 json environment will get passed)

Or when cluster is initialized.

cluster.setupMaster({
  windowsHide: true,
  exec : path.resolve(path.dirname(module.filename), 'ProcessContainer.js'),
  args: []
});
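
If the variadic (--) feature mentioned above has to keep working, one option is to keep only the arguments that follow the first -- instead of an empty array. This is a rough sketch, not part of the PR: passThroughArgs is a hypothetical helper, and the --port 3000 arguments are made up for illustration.

```javascript
// Hypothetical helper (not part of pm2): keep only what follows the first "--".
function passThroughArgs(argv) {
  const rest = argv.slice(2); // drop the node binary and the pm2 script
  const marker = rest.indexOf('--');
  return marker === -1 ? [] : rest.slice(marker + 1);
}

const noMarker = passThroughArgs(['node', 'pm2', 'start', '--no-daemon', 'app.pm2.json']);
const withMarker = passThroughArgs(['node', 'pm2', 'start', 'app.js', '--', '--port', '3000']);
console.log(noMarker);
console.log(withMarker);
```

The result could then be assigned to cluster.settings.args in place of the empty array. Whether pm2 already forwards the -- arguments by another route is not established in this thread, so check for duplicated arguments before relying on this.
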

chalermpong commented 1 year ago

Hi

I'm also facing this issue. I'm using pm2-runtime to start next.js server. I'm using pm2-runtime v5.2.0.

My config file is the following:

#app.yml
apps:
  - script: next
    args: start

I logged process.argv when running pm2-runtime app.yml alone. There are 4 items:

process.argv:  [
  '/Users/me/.nvm/versions/node/v18.16.0/bin/node',
  '/Users/me/.config/yarn/global/node_modules/pm2/lib/ProcessContainer.js',
  'app.yml',
  'start'
]

But if I run pm2 list first and then run pm2-runtime app.yml, there are 3 items:

process.argv:  [
  '/Users/me/.nvm/versions/node/v18.16.0/bin/node',
  '/Users/me/.config/yarn/global/node_modules/pm2/lib/ProcessContainer.js',
  'start'
]

Next.js looks up its command in process.argv. In the first case, it incorrectly takes app.yml as the command.
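
A rough sketch of how this kind of lookup goes wrong. This is a hypothetical CLI dispatcher, not the Next.js source: it assumes the first positional argument after the script path is treated as the command, and the command names are only examples.

```javascript
// Hypothetical CLI dispatch (not Next.js source): the first positional
// argument after the script path is treated as the command name.
const KNOWN_COMMANDS = new Set(['build', 'start', 'dev']);

function pickCommand(argv) {
  const candidate = argv.slice(2).find((arg) => !arg.startsWith('-'));
  return { command: candidate, known: KNOWN_COMMANDS.has(candidate) };
}

// argv shapes taken from the two cases above (paths shortened).
const withoutDaemon = pickCommand(['node', 'ProcessContainer.js', 'app.yml', 'start']);
const withDaemon = pickCommand(['node', 'ProcessContainer.js', 'start']);
console.log(withoutDaemon);
console.log(withDaemon);
```

Under that assumption, the leaked app.yml takes the command slot and the real start argument is never reached.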

mattpr commented 1 year ago

@chalermpong -- the open PR fixes this, at least for no-daemon mode. The PR has been open for a long time without any response from Unitech, and there are 35 open PRs. So I get the feeling this project isn't actively maintained, or at least the maintainers aren't interested in fixing or responding to known issues.

You can take my PR and make your own build, fix it yourself, or move away from using pm2. We are doing the latter because of a variety of issues running pm2 in a serious devops environment. It is great for developers that want to get a node app into "production" without knowing anything about running servers.

If you ask pm2 to just be a node process load balancer behind systemd without a global "god" process (daemon mode), then things seem to start falling apart. I'm guessing this is because the Unitech folks really don't test these edge cases very well because that isn't how they expect pm2 to be used by most folks.

We often have 5-10 different pm2 users/apps on a single box. Every app gets its own user/home-dir/etc because there isn't any reason to give them all the same permissions and access to each other's data. Our developers/devops never interact with pm2 directly because they would likely accidentally try to run pm2 as their own user rather than the correct pm2 user for that particular app (permissions problems, missing ENV set by systemd, etc). Rather, developers are limited to systemd restart|reload|start|stop <app-name>, which ensures the correct home dir and user are used to keep pm2 happy.

Even running something like pm2 ls appears to try to start a God process under the current user in the current user's home directory. So running pm2 <any-command> directly is "dangerous" in production unless you are a single developer running all your apps under a single pm2, all running under your own user/login, with pm2's config directory in your own user's home directory.

Even if they run the right incantation to set the right pm2 HOME and user (e.g. HOME=/opt/node-pm2/data-puller sudo -E -u pm2-data-puller pm2 ls), there are still problems. One is pm2 starting a God process automatically when running commands you expect to be "read" rather than "execute" (like ls). The other is that if you actually try to start the app this way, systemd won't know about it (which breaks monitoring of systemd service states), and required app environment that is set on the systemd unit may be missing (the unit is a more generic mechanism for passing ENV than the pm2 ecosystem file, which only supports pm2 apps).

For an alternative: you can run multiple instances of a single systemd service using an index number and having that index number passed into the unit file, for instance to set/increment ports. Of course that only gets you so far. You also need a notion of health checks for your app, adding and removing individual app instances to a load balancer based on health checks, and incremental rollout and validation of app deployments. However, these topics are all quite individual to the app you are building and your devops tooling: what load balancer you use (LVS, HAProxy, nginx, ELB, hardware, etc) and how you validate your app's health before passing more traffic.

For us pm2 is just doing the node process load balancing (e.g. I have 2 cores and want 2 instances of the app running in round robin or with percentages of traffic). Doing deploys from CI assets, updating load balancer configs and doing health checks is just a small amount of scripting specific to your devops tools and the nature of your apps.

Good luck.