PM2 failed at detect cpu core count

0x11-dev commented 5 years ago

https://github.com/Unitech/pm2/blob/6090b0971abca6fcb2d796e560f2a72b81ab5707/lib/God.js#L19

In Docker, we can set the CPU limit to 2 on a host with 48 physical CPU cores. When we set instances to 0, pm2 forks 48 child processes, which exhausts memory.

lscpu output:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0,1
Off-line CPU(s) list:  2-47
Thread(s) per core:    0
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2501.586
BogoMIPS:              4394.69
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47

cat /proc/cpuinfo output

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 79
model name  : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping    : 1
microcode   : 0xb00002e
cpu MHz     : 2499.975
cache size  : 30720 KB
physical id : 0
siblings    : 24
core id     : 0
cpu cores   : 24
apicid      : 32
initial apicid  : 32
fpu     : yes
fpu_exception   : yes
cpuid level : 20
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts spec_ctrl intel_stibp flush_l1d
bogomips    : 4394.69
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 79
model name  : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping    : 1
microcode   : 0xb00002e
cpu MHz     : 2499.975
cache size  : 30720 KB
physical id : 0
siblings    : 24
core id     : 1
cpu cores   : 24
apicid      : 34
initial apicid  : 34
fpu     : yes
fpu_exception   : yes
cpuid level : 20
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts spec_ctrl intel_stibp flush_l1d
bogomips    : 4394.69
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

wulfsolter commented 4 years ago

Still persists

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

wulfsolter commented 4 years ago

Still persists

0x11-dev commented 4 years ago

Still persists

Unitech commented 4 years ago

https://github.com/nodejs/node/issues/28762 This looks like it will not be fixed anytime soon. Would you have any suggestions for getting the number of CPUs assigned inside a container?

0x11-dev commented 4 years ago

pm2 should read the /proc/cpuinfo file instead of using require('os').cpus().length. This approach should always work, and would not be broken by possible changes made by the Docker or Linux container communities.
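
A minimal sketch of the /proc/cpuinfo idea, not PM2's actual code. The helper names `countProcessors` and `detectCpuCount` are made up for illustration. It counts the `processor : N` entries (the output posted above shows two of them) and falls back to os.cpus().length when the file is missing. Whether /proc/cpuinfo reflects the container limit depends on how the container is set up, so treat the result as a hint rather than a guarantee.

```javascript
const fs = require('fs');
const os = require('os');

// Count the "processor : N" entries in the text of /proc/cpuinfo.
function countProcessors(cpuinfoText) {
  const matches = cpuinfoText.match(/^processor\s*:/gm);
  return matches ? matches.length : 0;
}

// Hypothetical helper: prefer /proc/cpuinfo, fall back to os.cpus().length
// when the file is missing (e.g. macOS, Windows) or lists no processors.
function detectCpuCount(path = '/proc/cpuinfo') {
  try {
    const count = countProcessors(fs.readFileSync(path, 'utf8'));
    if (count > 0) return count;
  } catch (err) {
    // File not readable: fall through to the Node.js default.
  }
  return os.cpus().length;
}

console.log(detectCpuCount());
```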

Possible changes:

The cpuset-related issue inside Docker containers can be solved using lxcfs (even without patching the Docker project). See https://github.com/lxc/lxcfs#using-with-docker
https://github.com/nodejs/node/issues/28855

jdmarshall commented 4 years ago

Would it help a lot of people if PM2 did this work? Very much so. Would it help me? Definitely.

But would it help more people if we convinced the core team that NodeJS is 'doing it wrong'? I strongly believe it would. We shouldn't have people going around patching up bad info from NodeJS. Maybe NodeJS shouldn't even have to make this change, but that's a harder sell.

What useful purpose could a NodeJS app have for os.cpus() if it's reporting the data for the host VM, not the container? I'm struggling to think of anything actionable you could do with information about processors you cannot use.

0x11-dev commented 4 years ago

Changing the behavior of os.cpus() may introduce more bugs for the Node.js community.

abhijatyatewari commented 3 years ago

-i <number of workers> will tell PM2 that you want to launch your app in cluster_mode (as opposed to fork_mode).

If ‘number of workers’ argument is 0, PM2 will automatically spawn as many workers as you have CPU cores.
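
Until the detection is container-aware, one workaround is to pin the worker count instead of passing 0. The following is a sketch of a PM2 ecosystem file; the app name and script path are placeholders, and the value 2 simply mirrors the CPU limit from the original report.

```javascript
// ecosystem.config.js (illustrative; name and script are placeholders)
const config = {
  apps: [
    {
      name: 'app',
      script: './app.js',
      exec_mode: 'cluster', // same mode that -i selects on the command line
      instances: 2,         // explicit worker count instead of 0 (auto-detect)
    },
  ],
};

module.exports = config;
```

It would be started with `pm2 start ecosystem.config.js`; the command-line equivalent is `pm2 start app.js -i 2`.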

jdmarshall commented 3 years ago

@abhijatyatewari That’s what the feature is supposed to do, yes, but that’s not how containers work, which is why there’s an issue. The number of physical cores tells you nothing about the cgroup you’re in.
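
To make the cgroup point concrete, here is a rough sketch of deriving a CPU count from the cgroup CPU quota. It assumes the usual mount points (`/sys/fs/cgroup/cpu.max` for cgroup v2, `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` and `cpu.cfs_period_us` for cgroup v1), which may differ on a given system. The function names are invented for illustration. It only covers quota-style limits; a cpuset restriction would need a separate check.

```javascript
const fs = require('fs');
const os = require('os');

// cgroup v2: cpu.max holds "<quota> <period>", or "max <period>" for no limit.
function cpusFromCpuMax(text) {
  const [quota, period] = text.trim().split(/\s+/);
  if (quota === 'max') return null;
  const q = Number(quota);
  const p = Number(period);
  if (!(q > 0) || !(p > 0)) return null;
  return Math.max(1, Math.ceil(q / p));
}

// cgroup v1: a quota of -1 means no limit.
function cpusFromCfs(quotaText, periodText) {
  const q = Number(quotaText.trim());
  const p = Number(periodText.trim());
  if (!(q > 0) || !(p > 0)) return null;
  return Math.max(1, Math.ceil(q / p));
}

function readOrNull(path) {
  try {
    return fs.readFileSync(path, 'utf8');
  } catch (err) {
    return null; // file absent: not this cgroup version, or not Linux
  }
}

// Returns the quota-derived count, capped by what os.cpus() reports.
function availableCpus() {
  const hostCount = os.cpus().length || 1;
  let limit = null;
  const cpuMax = readOrNull('/sys/fs/cgroup/cpu.max');
  if (cpuMax !== null) {
    limit = cpusFromCpuMax(cpuMax);
  } else {
    const quota = readOrNull('/sys/fs/cgroup/cpu/cpu.cfs_quota_us');
    const period = readOrNull('/sys/fs/cgroup/cpu/cpu.cfs_period_us');
    if (quota !== null && period !== null) limit = cpusFromCfs(quota, period);
  }
  return limit === null ? hostCount : Math.min(limit, hostCount);
}

console.log(availableCpus());
```

Fractional quotas are rounded up here, so a limit of 1.5 CPUs yields 2 workers; rounding down is an equally defensible choice.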

And I don’t buy the “one platform” argument when it’s the platform for running in production. All these tools we make and use are pointless if we can’t run things well for our customers.

lujjjh commented 2 years ago

I created a package called node-cpu-count, a container-friendly alternative to os.cpus().length. Hope it helps.

jdmarshall commented 2 years ago

nproc gets closer to the correct answer, but does not exist on OSX. This works similarly:

sysctl -n hw.ncpu
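
From Node.js, the two commands could be wrapped roughly like this. It is only a sketch: `commandForPlatform` and `cpuCountFromShell` are hypothetical names, and the code falls back to os.cpus().length when the command is unavailable (for example on Windows).

```javascript
const { execSync } = require('child_process');
const os = require('os');

// Pick the shell command by platform: sysctl on macOS, nproc elsewhere.
function commandForPlatform(platform) {
  return platform === 'darwin' ? 'sysctl -n hw.ncpu' : 'nproc';
}

function cpuCountFromShell() {
  try {
    const out = execSync(commandForPlatform(process.platform), {
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'], // capture stdout, silence stderr
    });
    const n = parseInt(out.trim(), 10);
    if (Number.isInteger(n) && n > 0) return n;
  } catch (err) {
    // Command missing or failed: use the Node.js value instead.
  }
  return os.cpus().length;
}

console.log(cpuCountFromShell());
```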

Unitech / pm2

PM2 failed at detect cpu core count #4347