Unitech / pm2

Node.js Production Process Manager with a built-in Load Balancer.
https://pm2.keymetrics.io/docs/usage/quick-start/

Sudden PM2 core dump. Asking for assistance in finding the root cause, if possible. #5787

Open JuriStefanovski opened 5 months ago

JuriStefanovski commented 5 months ago

What's going wrong?

Host syslog extract:
Mar 15 03:30:10 Center1 systemd[1]: Started Process Core Dump (PID 3079673/UID 0).
Mar 15 03:30:12 Center1 systemd-coredump[3079674]: Core file was truncated to 2147483648 bytes.
Mar 15 03:30:17 Center1 systemd-coredump[3079674]: Process 391241 (PM2 v5.3.0: God) of user 1000 dumped core.#012#012Stack trace of thread 391241:#012#0 0x00007f204686a00b n/a (n/a + 0x0)
Mar 15 03:30:17 Center1 systemd[1]: systemd-coredump@1-3079673-0.service: Succeeded.
Mar 15 03:30:17 Center1 systemd[1]: pm2-user.service: New main PID 391249 does not belong to service, and PID file is not owned by root. Refusing.
Mar 15 03:30:17 Center1 systemd[1]: pm2-user.service: Main process exited, code=dumped, status=6/ABRT
Mar 15 03:30:17 Center1 systemd[1]: pm2-user.service: Failed with result 'core-dump'.
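The `#012` sequences in the coredump entry are most likely not part of the original message. They look like rsyslog's escaping of control characters: `#` followed by the character code as three octal digits, so `#012` is a newline. The following is a minimal Python sketch that restores the multi-line message. It assumes rsyslog's default `#` escape prefix.

```python
import re

# Assumption: the extract was written by rsyslog, which by default replaces
# control characters with '#' plus the character's code as three octal
# digits (e.g. '#012' is LF, decimal 10). rsyslog does not escape a literal
# '#', so text that already contains '#' followed by three octal digits
# would be decoded too; that is acceptable for a quick read of these logs.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def unescape_syslog(line: str) -> str:
    """Replace each '#ooo' octal escape with the character it encodes."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


if __name__ == "__main__":
    entry = (
        "Process 391241 (PM2 v5.3.0: God) of user 1000 dumped core."
        "#012#012Stack trace of thread 391241:"
        "#012#0 0x00007f204686a00b n/a (n/a + 0x0)"
    )
    print(unescape_syslog(entry))
```

Once decoded, the entry shows a one-frame stack trace with no resolved symbol (`n/a`). The journal message alone therefore does not say where the abort originated.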

How could we reproduce this issue?

I'm not sure this can be easily reproduced; this PM2 instance worked great for a long time. Perhaps the information in the core dump could be of interest to the developers? The core dump file is available, though it is around 299 MB. I'm ready to upload or send it wherever you say, if necessary. Please close this issue if it is not relevant to you. I apologize for any inconvenience.

Supporting information

Linux Mint 20.3 Una Linux Center1 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

PM2 report
Date : Mon Mar 18 2024 14:16:49 GMT+0200 (Eastern European Standard Time)

--- Daemon -------------------------------------------------
pm2d version : 5.3.0
node version : 10.19.0
node path : not found
argv : /usr/bin/node,/usr/local/lib/node_modules/pm2/lib/Daemon.js
argv0 : node
user : user
uid : 1000
gid : 1000
uptime : 4747min

--- CLI ----------------------------------------------------
local pm2 : 5.3.0
node version : 10.19.0
node path : /usr/local/bin/pm2
argv : /usr/bin/node,/usr/local/bin/pm2,report
argv0 : node
user : user
uid : 1000
gid : 1000

--- System info --------------------------------------------
arch : x64
platform : linux
type : Linux
cpus : Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
cpus nb : 12
freemem : 20717719552
totalmem : 33243213824
home : /home/user

--- PM2 list -----------------------------------------------
┌────┬────────────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name               │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼────────────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 2  │ db                 │ default     │ N/A     │ fork    │ 10845    │ 3D     │ 0    │ online    │ 0.7%     │ 68.8mb   │ user     │ disabled │
│ 3  │ line               │ default     │ N/A     │ fork    │ 10901    │ 3D     │ 0    │ online    │ 38.1%    │ 88.2mb   │ user     │ disabled │
│ 4  │ linecleanup        │ default     │ N/A     │ fork    │ N/A      │ 0      │ 0    │ stopped   │ 0%       │ 0b       │ user     │ disabled │
│ 1  │ registry_server    │ default     │ N/A     │ fork    │ 10821    │ 3D     │ 5    │ online    │ 1.1%     │ 20.2mb   │ user     │ disabled │
└────┴────────────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
Module
┌────┬──────────────────────────────┬───────────────┬──────────┬──────────┬──────┬──────────┬──────────┬──────────┐
│ id │ module                       │ version       │ pid      │ status   │ ↺    │ cpu      │ mem      │ user     │
├────┼──────────────────────────────┼───────────────┼──────────┼──────────┼──────┼──────────┼──────────┼──────────┤
│ 0  │ pm2-logrotate                │ 2.7.0         │ 10694    │ online   │ 0    │ 0.3%     │ 100.0mb  │ user     │
└────┴──────────────────────────────┴───────────────┴──────────┴──────────┴──────┴──────────┴──────────┴──────────┘

--- Daemon logs --------------------------------------------
/home/user/.pm2/pm2.log last 20 lines:
PM2 | 2024-03-15T07:09:07: PM2 log: Application log path : /home/user/.pm2/logs
PM2 | 2024-03-15T07:09:07: PM2 log: Worker Interval : 30000
PM2 | 2024-03-15T07:09:07: PM2 log: Process dump file : /home/user/.pm2/dump.pm2
PM2 | 2024-03-15T07:09:07: PM2 log: Concurrent actions : 2
PM2 | 2024-03-15T07:09:07: PM2 log: SIGTERM timeout : 1600
PM2 | 2024-03-15T07:09:07: PM2 log:
PM2 | 2024-03-15T07:09:08: PM2 log: App [pm2-logrotate:0] starting in -fork mode-
PM2 | 2024-03-15T07:09:08: PM2 log: App [pm2-logrotate:0] online
PM2 | 2024-03-15T07:09:12: PM2 log: App [registry_server:1] starting in -fork mode-
PM2 | 2024-03-15T07:09:12: PM2 log: App [registry_server:1] online
PM2 | 2024-03-15T07:09:13: PM2 log: App [db:2] starting in -fork mode-
PM2 | 2024-03-15T07:09:13: PM2 log: App [db:2] online
PM2 | 2024-03-15T07:09:14: PM2 log: App [line:3] starting in -fork mode-
PM2 | 2024-03-15T07:09:14: PM2 log: App [line:3] online
PM2 | 2024-03-16T07:09:08: PM2 log: [PM2] This PM2 is not UP TO DATE
PM2 | 2024-03-16T07:09:08: PM2 log: [PM2] Upgrade to version 5.3.1
PM2 | 2024-03-17T07:09:08: PM2 log: [PM2] This PM2 is not UP TO DATE
PM2 | 2024-03-17T07:09:08: PM2 log: [PM2] Upgrade to version 5.3.1
PM2 | 2024-03-18T07:09:08: PM2 log: [PM2] This PM2 is not UP TO DATE
PM2 | 2024-03-18T07:09:08: PM2 log: [PM2] Upgrade to version 5.3.1

ultimate-tester commented 5 months ago

The first thing that comes to mind is that your system ran out of memory. Could that have been the case?
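One way to test the out-of-memory hypothesis is to search the kernel log from around 03:30 on Mar 15 for OOM-killer reports. The following is a minimal Python sketch. It assumes the kernel log text has been saved to a file first (for example `journalctl -k` output); the script and file names are hypothetical. Keep in mind that the kernel OOM killer terminates its victim with SIGKILL, which does not produce a core dump. The journal extract, by contrast, reports status=6/ABRT (SIGABRT).

```python
import re
import sys

# Assumption: input is kernel log text (e.g. saved `journalctl -k` output or
# /var/log/kern.log). The OOM killer logs its victim as
# "Killed process <pid> (<comm>)"; only that form is matched here.
_OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def find_oom_kills(lines):
    """Yield (pid, comm) for every OOM-killer kill reported in `lines`."""
    for line in lines:
        match = _OOM_KILL.search(line)
        if match:
            yield int(match.group(1)), match.group(2)


def main(paths):
    """Print every OOM-killer kill found in the given log files."""
    for path in paths:
        with open(path, errors="replace") as log:
            for pid, comm in find_oom_kills(log):
                print(f"{path}: OOM killer killed PID {pid} ({comm})")


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 oom_check.py kernel.log
    main(sys.argv[1:])
```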

JuriStefanovski commented 5 months ago

I checked the statistics in the monitoring system: at the time of the crash, ~40% of the 32 GB of RAM was still free.

Update: After studying the statistics more carefully, I noticed that about two days before the failure, memory consumption began to rise, slowly but surely, from the usual ~48% to 58%. It looks like a memory leak, but I cannot identify the culprit process; my monitoring, unfortunately, is very basic.
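One way to attribute slow memory growth like this to a specific process is to sample each process's resident set size at two points in time and rank the differences. The resident set size is the VmRSS field of /proc/<pid>/status, in KiB. The following is a minimal Python sketch. It assumes Linux with /proc mounted without hidepid restrictions. The script name rss_growth.py and its single SECONDS argument are hypothetical.

```python
import os
import sys
import time


def parse_vmrss(status_text):
    """Return the VmRSS value (KiB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:     68800 kB"
    return None  # kernel threads have no VmRSS line


def snapshot():
    """Map pid -> (comm, rss_kib) for every process currently visible."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as fh:
                status = fh.read()
            with open(f"/proc/{entry}/comm") as fh:
                comm = fh.read().strip()
        except OSError:
            continue  # process exited in the meantime or is not readable
        rss = parse_vmrss(status)
        if rss is not None:
            result[int(entry)] = (comm, rss)
    return result


def top_growth(before, after, n=5):
    """Rank PIDs present in both snapshots by RSS growth (KiB), largest first."""
    growth = [
        (rss - before[pid][1], pid, comm)
        for pid, (comm, rss) in after.items()
        if pid in before
    ]
    growth.sort(reverse=True)
    return growth[:n]


if __name__ == "__main__":
    # Usage (hypothetical name): python3 rss_growth.py 3600
    if len(sys.argv) != 2:
        print("usage: rss_growth.py SECONDS", file=sys.stderr)
    else:
        first = snapshot()
        time.sleep(float(sys.argv[1]))
        for delta, pid, comm in top_growth(first, snapshot()):
            print(f"{pid:>8}  {comm:<16} {delta:+d} KiB")
```

Processes are matched by PID only, so if a PID is reused between the two snapshots, its growth figure will be wrong. RSS also counts shared pages, so treat the ranking as a pointer to where to look, not as an exact leak measurement.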

arasmussen commented 2 months ago

Running into the same / a similar issue. Also posted an issue here: https://github.com/Unitech/pm2/issues/5797