centreon / centreon-plugins

Collection of standard plugins to discover and gather cloud-to-edge metrics and status across your whole IT infrastructure.
https://www.centreon.com
Apache License 2.0

[os::linux::local::plugin] --mode=process: memory usage wrong #4246

Closed joschi99 closed 1 year ago

joschi99 commented 1 year ago

With the latest plugins, monitoring process memory usage returns wrong values:

 ./centreon_plugins.pl --plugin=os::linux::local::plugin --mode=process --hostname=127.0.0.1 --add-memory --filter-command=mysqld --ssh-username=admin --ssh-password=password --ssh-backend=libssh --add-cpu
OK: Process: [command => mysqld] [arg => /usr/sbin/mysqld] [state => S] duration: 3w 2d 16h 49m 7s, memory used: 3.61 GB, cpu usage: 0.25 % - Number of current processes: 1, memory used: 3.61 GB, cpu usage: 0.25 % | 'processes.total.count'=1;;;0; 'processes.memory.usage.bytes'=3880140800B;;;0; 'processes.cpu.utilization.percentage'=0.25%;;;0;
==> /proc/1632/stat <==
1632 (mysqld) S 1 1632 1632 0 -1 1077944576 159608 0 131 0 269508 282624 0 0 20 0 47 0 3523 3096289280 216375 18446744073709551615 1 1 0 0 0 0 552967 4096 26345 18446744073709551615 0 0 17 2 0 0 598 0 0 0 0 0 0 0 0 0 0

==> /proc/1632/statm <==
755930 216375 3682 4887 0 730925 0

command response: ==> /proc/1632/stat <==
1632 (mysqld) S 1 1632 1632 0 -1 1077944576 159608 0 131 0 269508 282624 0 0 20 0 47 0 3523 3096289280 216375 18446744073709551615 1 1 0 0 0 0 552967 4096 26345 18446744073709551615 0 0 17 2 0 0 598 0 0 0 0 0 0 0 0 0 0

==> /proc/1632/statm <==
755930 216375 3682 4887 0 730925 0

SNMP returns correct values:

centreon_plugins.pl --hostname=127.0.0.1 --snmp-community='public' --snmp-version=2c   --snmp-username='' --snmp-timeout= --authpassphrase='' --authprotocol= --privpassphrase='' --privprotocol= --plugin=os::linux::snmp::plugin --mode=processcount  --process-name='mysqld' --verbose --memory
OK: Number of current processes running: 1 - Total memory usage: 845.21 MB - Average memory usage: 845.21 MB | 'nbproc'=1;;;0; 'mem_total'=886272000B;;;0; 'mem_avg'=886272000.00B;;;0;
Process '1632' [memory: 845.21 MB] [status: runnable] [name: mysqld]

Top command: image

Summary

garnier-quentin commented 1 year ago

I use the statm proc file. If the value is wrong, do you have a suggestion for a different file?
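For readers following along, here is a minimal sketch that labels the seven statm fields from the report above and converts them to bytes. The field names follow proc(5). The 4096-byte page size is an assumption, though it is consistent with the vsize of 3096289280 shown in /proc/1632/stat. `parse_statm` is a hypothetical helper, not plugin code. As an arithmetic observation only, resident + data reproduces the 3880140800 B figure the plugin printed; whether the plugin actually computes it that way is not confirmed in this thread.

```python
# Field order as documented in proc(5); every value is a count of pages.
STATM_FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(line, page_size=4096):
    """Map each statm field name to a size in bytes (page_size is assumed)."""
    pages = [int(token) for token in line.split()]
    if len(pages) != len(STATM_FIELDS):
        raise ValueError(f"expected {len(STATM_FIELDS)} fields, got {len(pages)}")
    return {name: count * page_size for name, count in zip(STATM_FIELDS, pages)}


# Sample copied from the /proc/1632/statm output in the report.
sample = "755930 216375 3682 4887 0 730925 0"
fields = parse_statm(sample)

print("size     :", fields["size"])
print("resident :", fields["resident"])
print("res+data :", fields["resident"] + fields["data"])
```

Comparing the printed values with the plugin output and the SNMP output shows which field each tool appears to be reporting.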

joschi99 commented 1 year ago

One approach would be to use the VmRSS value from /proc/pid/status:

cat /proc/1632/status
Name:   mysqld
Umask:  0006
State:  S (sleeping)
Tgid:   1632
Ngid:   0
Pid:    1632
PPid:   1
TracerPid:      0
Uid:    997     997     997     997
Gid:    995     995     995     995
FDSize: 8192
Groups: 995
VmPeak:  3088656 kB
VmSize:  3023720 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    865564 kB
VmRSS:    865500 kB
RssAnon:          850772 kB
RssFile:           14728 kB
RssShmem:              0 kB
VmData:  2923568 kB
VmStk:       132 kB
VmExe:     19548 kB
VmLib:     10428 kB
VmPTE:      2172 kB
VmSwap:        0 kB
Threads:        47
SigQ:   0/31174
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000087007
SigIgn: 0000000000001000
SigCgt: 00000001800066e9
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000004000
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       vulnerable
Cpus_allowed:   f
Cpus_allowed_list:      0-3
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        525944
nonvoluntary_ctxt_switches:     98
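A rough sketch of how the VmRSS line could be extracted from that output and converted to bytes. `vmrss_bytes` is a hypothetical helper written for illustration (the plugin itself is Perl), and it assumes the value is always reported in kB, as in the dump above.

```python
import re


def vmrss_bytes(status_text):
    """Return VmRSS in bytes from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        return None  # no VmRSS line found in the given text
    return int(match.group(1)) * 1024  # kB in this file means 1024 bytes


# Three lines copied from the status output above.
sample = (
    "VmHWM:    865564 kB\n"
    "VmRSS:    865500 kB\n"
    "RssAnon:          850772 kB\n"
)
print(vmrss_bytes(sample))
```

On a live system the text would come from reading /proc/<pid>/status; the sample string keeps the example independent of any running process.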

SNMP value:

OK: Number of current processes running: 1 - Total memory usage: 845.21 MB - Average memory usage: 845.21 MB | 'nbproc'=1;;;0; 'mem_total'=886272000B;;;0; 'mem_avg'=886272000.00B;;;0;
Process '1632' [memory: 845.21 MB] [status: runnable] [name: mysqld]

Another approach is to keep using /proc/pid/statm, but with the "resident" value:

cat /proc/1632/statm
755930 216375 3682 4887 0 730925 0

resident * pagesize = utilization: 216375 * 4096 = 886272000 bytes -> 845.21 MB
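The same calculation as a small sketch, cross-checked against the VmRSS figure from /proc/1632/status. The function names are hypothetical, and the page size is passed explicitly as 4096 to match the arithmetic above; on a real host it could be obtained with `os.sysconf("SC_PAGE_SIZE")`.

```python
def statm_rss_bytes(statm_line, page_size):
    """Resident set size in bytes: second statm field times the page size."""
    resident_pages = int(statm_line.split()[1])
    return resident_pages * page_size


def to_mb(num_bytes):
    """Convert bytes to MB using 1024 * 1024, as the plugin output appears to."""
    return num_bytes / (1024 * 1024)


statm_line = "755930 216375 3682 4887 0 730925 0"  # from /proc/1632/statm
rss = statm_rss_bytes(statm_line, page_size=4096)
vmrss_from_status = 865500 * 1024  # VmRSS: 865500 kB from /proc/1632/status

print(rss, f"{to_mb(rss):.2f} MB", rss == vmrss_from_status)
```

Both sources agree on 886272000 bytes for this process, which is also the value SNMP reports.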

As you can see below, both methods are documented as "inaccurate", but the values the plugin currently reports are completely wrong.

From https://man7.org/linux/man-pages/man5/proc.5.html:

VmRSS: Resident set size. Note that the value here is the sum of RssAnon, RssFile, and RssShmem. This value is inaccurate; see /proc/[pid]/statm above.

joschi99 commented 1 year ago

Hi @garnier-quentin, do you need any other information?

joschi99 commented 1 year ago

Is there any news? The values currently reported via SSH are completely wrong (not usable).

garnier-quentin commented 1 year ago

https://github.com/centreon/centreon-plugins/pull/4529