Unitech / pm2

Node.js Production Process Manager with a built-in Load Balancer.
https://pm2.keymetrics.io/docs/usage/quick-start/

Graviton3 support – intermittent crash/coredump #5797

Open arasmussen opened 5 months ago

arasmussen commented 5 months ago

What's going wrong?

How could we reproduce this issue?

Supporting information

--- PM2 report ----------------------------------------------------------------
Date                 : Mon Apr 08 2024 18:32:37 GMT+0000 (Coordinated Universal Time)
===============================================================================
--- Daemon -------------------------------------------------
pm2d version         : 5.3.0
node version         : 18.17.0
node path            : /home/ec2-user/.nvm/versions/node/v18.17.0/bin/pm2
argv                 : /home/ec2-user/.nvm/versions/node/v18.17.0/bin/node,/home/ec2-user/.nvm/versions/node/v18.17.0/lib/node_modules/pm2/lib/Daemon.js
argv0                : node
user                 : ec2-user
uid                  : 1000
gid                  : 1000
uptime               : 57min
===============================================================================
--- CLI ----------------------------------------------------
local pm2            : 5.3.0
node version         : 18.17.0
node path            : /home/ec2-user/.nvm/versions/node/v18.17.0/bin/pm2
argv                 : /home/ec2-user/.nvm/versions/node/v18.17.0/bin/node,/home/ec2-user/.nvm/versions/node/v18.17.0/bin/pm2,report
argv0                : node
user                 : ec2-user
uid                  : 1000
gid                  : 1000
===============================================================================
--- System info --------------------------------------------
arch                 : arm64
platform             : linux
type                 : Linux
cpus                 : unknown
cpus nb              : 4
freemem              : 14946652160
totalmem             : 16449142784
home                 : /home/ec2-user
===============================================================================
--- PM2 list -----------------------------------------------
┌────┬───────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name      │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
└────┴───────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
===============================================================================
--- Daemon logs --------------------------------------------
/home/ec2-user/.pm2/pm2.log last 20 lines:
PM2        | 2024-04-08T17:26:16: PM2 log: pid=149369 msg=failed to kill - retrying in 100ms
PM2        | 2024-04-08T17:26:16: PM2 log: pid=137480 msg=failed to kill - retrying in 100ms
PM2        | 2024-04-08T17:26:16: PM2 log: Process with pid 149369 still alive after 30000ms, sending it SIGKILL now...
PM2        | 2024-04-08T17:26:17: PM2 log: Process with pid 137480 still alive after 30000ms, sending it SIGKILL now...
PM2        | 2024-04-08T17:34:44: PM2 log: ===============================================================================
PM2        | 2024-04-08T17:34:45: PM2 log: --- New PM2 Daemon started ----------------------------------------------------
PM2        | 2024-04-08T17:34:45: PM2 log: Time                 : Mon Apr 08 2024 17:34:45 GMT+0000 (Coordinated Universal Time)
PM2        | 2024-04-08T17:34:45: PM2 log: PM2 version          : 5.3.0
PM2        | 2024-04-08T17:34:45: PM2 log: Node.js version      : 18.17.0
PM2        | 2024-04-08T17:34:45: PM2 log: Current arch         : arm64
PM2        | 2024-04-08T17:34:45: PM2 log: PM2 home             : /home/ec2-user/.pm2
PM2        | 2024-04-08T17:34:45: PM2 log: PM2 PID file         : /home/ec2-user/.pm2/pm2.pid
PM2        | 2024-04-08T17:34:45: PM2 log: RPC socket file      : /home/ec2-user/.pm2/rpc.sock
PM2        | 2024-04-08T17:34:45: PM2 log: BUS socket file      : /home/ec2-user/.pm2/pub.sock
PM2        | 2024-04-08T17:34:45: PM2 log: Application log path : /home/ec2-user/.pm2/logs
PM2        | 2024-04-08T17:34:45: PM2 log: Worker Interval      : 30000
PM2        | 2024-04-08T17:34:45: PM2 log: Process dump file    : /home/ec2-user/.pm2/dump.pm2
PM2        | 2024-04-08T17:34:45: PM2 log: Concurrent actions   : 2
PM2        | 2024-04-08T17:34:45: PM2 log: SIGTERM timeout      : 1600
PM2        | 2024-04-08T17:34:45: PM2 log: ===============================================================================

arasmussen commented 5 months ago

Just migrated our instances to Graviton2 (from m7g to m6g) and confirmed we are not able to reproduce this issue there.

arasmussen commented 2 months ago

Coredump backtrace:

(gdb) bt
#0  0x0000000000d61274 in v8::Object::Set(v8::Local<v8::Context>, v8::Local<v8::Value>, v8::Local<v8::Value>) ()
#1  0x0000000000c4df44 [PAC] in node::(anonymous namespace)::ProcessWrap::Spawn(v8::FunctionCallbackInfo<v8::Value> const&) ()
#2  0x0000000000dab0e0 [PAC] in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments)
    ()
#3  0x0000000000dac208 [PAC] in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#4  0x000000000168000c [PAC] in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
#5  0x005600000168000c in ?? ()
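
Frame #1 of the backtrace is `node::ProcessWrap::Spawn`, the native binding behind `child_process.spawn`. The reproduction steps for this issue are not recorded, so the following is only a rough sketch of a stress loop that exercises that code path repeatedly. The function names, the child command (`node -e ''`), and the batch size are illustrative assumptions, not anything taken from pm2 or from the report.

```javascript
// Hypothetical stress loop: names, child command, and batch size are assumptions.
const { spawn } = require('child_process');

// Spawn one short-lived child and resolve with its exit code (or signal name).
function spawnOnce() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', ''], { stdio: 'ignore' });
    child.on('error', reject);
    child.on('exit', (code, signal) => resolve(signal ?? code));
  });
}

// Run `total` spawns, at most `batch` at a time, and collect the results.
async function spawnMany(total, batch = 4) {
  const results = [];
  for (let done = 0; done < total; done += batch) {
    const size = Math.min(batch, total - done);
    const chunk = await Promise.all(Array.from({ length: size }, spawnOnce));
    results.push(...chunk);
  }
  return results;
}
```

Something like `spawnMany(1000).then(console.log)` could be left running on an affected instance. If the crash really is in the spawn path, it would be expected to take down the calling process itself rather than the children, but that is an inference from the backtrace and not a confirmed reproduction.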

I have also tried pm2 5.3.1 and 5.1.1 and ran into the same issue with both.

I also tried downgrading to node 18.12.1. Upgrading to node 20 appears to fix the issue.
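
For anyone who wants a startup guard while staying on the affected setup, here is a minimal sketch that flags the combination reported in this thread (arm64 with a Node 18 release). The helper name is hypothetical, and the check is deliberately coarse: the report above shows `cpus : unknown`, so this does not attempt to tell Graviton3 apart from Graviton2.

```javascript
// Hypothetical helper: true for the setup reported in this thread (arm64 + Node 18).
// It cannot distinguish Graviton3 from other arm64 CPUs.
function matchesReportedSetup(arch = process.arch, nodeVersion = process.versions.node) {
  const major = Number.parseInt(nodeVersion.split('.')[0], 10);
  return arch === 'arm64' && major === 18;
}

if (matchesReportedSetup()) {
  console.warn('arm64 + Node 18 detected: see pm2 issue #5797 (Node 20 reportedly avoids the crash)');
}
```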