NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

NCPA reports high memory usage while actual memory usage on the Solaris 11.4 server is significantly lower #1155

Open: ongsunglau opened this issue 2 months ago

ongsunglau commented 2 months ago

Hi Dev Team,

We noticed that NCPA 2.3.1 reports memory usage that differs from the memory the processes on the server actually consume.

Is there a workaround or a way to fix this?

root@server:~# /usr/local/ncpa/ncpa_passive --version
ncpa_passive version, 2.3.1
root@server:~# /usr/local/ncpa/ncpa_listener --version
ncpa_listener version, 2.3.1
root@server:~# cat /etc/os-release
NAME="Oracle Solaris"
PRETTY_NAME="Oracle Solaris 11.4"
CPE_NAME="cpe:/o:oracle:solaris:11:4"
ID=solaris
VERSION=11.4
VERSION_ID=11.4
BUILD_ID=11.4.36.0.1.101.2
HOME_URL="https://www.oracle.com/solaris/"
SUPPORT_URL="https://support.oracle.com/"
VARIANT_ID=sru
VARIANT="Support Update"

Per-user results from prstat/top:

 NPROC USERNAME     SWAP      RSS MEMORY      TIME    CPU
   113 root     3784064K 2904992K 13.85% 749:59:03 0.095%
    65 proadm   8395072K 7794184K 37.17%  32:58:39 0.037%
     2 _polkitd   13112K   25184K 0.120%   0:00:01 0.000%
     3 noaccess  750896K  748208K 3.568%   0:09:48 0.000%
     2 nagios     56080K   55768K 0.266%  21:42:05 0.000%

Results from https://127.0.0.1:5693/gui/api:

https://127.0.0.1:5693/api/memory/virtual/percent?check=true
{
    "returncode": 0,
    "stdout": "OK: Percent was 83.40 % | 'percent'=83.40%;;;"
}
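Summing the per-user MEMORY column from prstat and comparing it with NCPA's reading puts a number on the gap. The quick Python calculation below uses values copied by hand from the two outputs above. Per-user totals may count memory shared between processes more than once, so treat the sum as approximate.

```python
# MEMORY column per user, copied from the prstat output above.
memory_percent = {
    "root": 13.85,
    "proadm": 37.17,
    "_polkitd": 0.120,
    "noaccess": 3.568,
    "nagios": 0.266,
}
ncpa_percent = 83.40  # memory/virtual/percent as reported by NCPA

process_total = sum(memory_percent.values())
gap = ncpa_percent - process_total

print(f"Summed per-user MEMORY: {process_total:.2f}%")  # Summed per-user MEMORY: 54.97%
print(f"NCPA memory/virtual/percent: {ncpa_percent:.2f}%")  # NCPA memory/virtual/percent: 83.40%
print(f"Gap: {gap:.2f} percentage points")  # Gap: 28.43 percentage points
```

So roughly 28 percentage points of the memory NCPA counts as used do not show up in the per-user prstat totals.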

This is the active check we are using:

./check_ncpa.py -H 127.0.0.1 -t '<your token>' -M 'memory/virtual/percent'

It returns the following output:

OK: Percent was 83.40 % | 'percent'=83.40%;;;

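The part after `|` is Nagios performance data of the form `'label'=value[UOM];[warn];[crit];[min];[max]`. The empty fields after `83.40%` mean that no warning or critical thresholds were set for this check. A simplified Python parser for such an output line is sketched below, for example to log NCPA's value next to prstat figures. It reads only the value, unit, warning and critical fields and does not handle every edge case of the format (such as `=` inside a quoted label).

```python
import re
from typing import Dict, Tuple

# Output line printed by check_ncpa.py above.
OUTPUT = "OK: Percent was 83.40 % | 'percent'=83.40%;;;"

# One perfdata item: 'label'=value[UOM];[warn];[crit] (min/max are ignored).
PERF_ITEM = re.compile(
    r"'?(?P<label>[^'=]+)'?=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
)


def parse_output(line: str) -> Tuple[str, Dict[str, Tuple[float, str, str, str]]]:
    """Split a plugin output line into its status text and perfdata metrics."""
    status, _, perfdata = line.partition("|")
    metrics = {}
    for m in PERF_ITEM.finditer(perfdata):
        metrics[m.group("label").strip()] = (
            float(m.group("value")),
            m.group("uom"),
            m.group("warn") or "",  # empty string when no threshold was given
            m.group("crit") or "",
        )
    return status.strip(), metrics


status, metrics = parse_output(OUTPUT)
print(status)   # OK: Percent was 83.40 %
print(metrics)  # {'percent': (83.4, '%', '', '')}
```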
ne-bbahn commented 2 months ago

NCPA uses the psutil library to get system diagnostic information. If psutil returns incorrect information for whatever reason, NCPA will report incorrect information as well. Another possible cause is that NCPA currently samples diagnostic information over a fairly short period, so spikes and dips in usage can produce differing values, though I don't think that is likely to be the case here.
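One way to test the first explanation is to query psutil directly and compare its system-wide figure with the summed resident set size (RSS) of all processes. The sketch below assumes psutil is installed for the Python interpreter you run it with, and it is not NCPA's own code. `psutil.virtual_memory().percent` is documented as `(total - available) / total * 100`, and `memory_info().rss` is a process's resident set size. If psutil's percent lands near NCPA's 83.40 % while the RSS sum stays near the prstat figures, psutil and NCPA agree with each other, and the difference would be memory not attributed to any process rather than a reporting error in NCPA.

```python
from typing import Iterable, List, Tuple


def unattributed_percent(used_percent: float, process_rss: Iterable[int], total: int) -> float:
    """Percentage points of used memory not covered by the summed process RSS.

    Summed RSS may count shared pages more than once, so the result is approximate
    (it can even be negative when a lot of memory is shared).
    """
    rss_percent = sum(process_rss) / total * 100
    return used_percent - rss_percent


def collect_with_psutil() -> Tuple[float, List[int], int]:
    """Return (virtual_memory().percent, per-process RSS list, total bytes) via psutil."""
    import psutil  # assumed installed; imported here so the helper above works without it

    vm = psutil.virtual_memory()
    rss = []
    for proc in psutil.process_iter(["memory_info"]):
        info = proc.info["memory_info"]
        if info is not None:  # None when access to the process was denied
            rss.append(info.rss)
    return vm.percent, rss, vm.total
```

Run it as root so every process is readable, for example `percent, rss, total = collect_with_psutil()`, then print `percent` next to `unattributed_percent(percent, rss, total)`.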

As for a workaround, you could create a plugin that uses prstat/top to get your memory information and then query the plugins/myplugin.sh endpoint instead of memory/virtual/percent.
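A minimal sketch of such a plugin written in Python instead of shell follows. It rests on several assumptions: that `prstat -c -t 1 1` prints one per-user summary with the same columns as the output in the original report, that the summed MEMORY column is an acceptable measure for you, and that your NCPA configuration runs `.py` plugins with a Python 3.7+ interpreter. The thresholds (80 % and 90 %) are hypothetical.

```python
import subprocess
from typing import Dict, Tuple

# Hypothetical thresholds (percent of physical memory); adjust to your needs.
WARNING = 80.0
CRITICAL = 90.0


def memory_by_user(text: str) -> Dict[str, float]:
    """Map USERNAME -> MEMORY % from prstat per-user summary output."""
    header = None
    usage: Dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if "USERNAME" in fields and "MEMORY" in fields:
            header = fields  # column layout of the rows that follow
            continue
        # Lines whose field count differs from the header's are skipped.
        if header is not None and len(fields) == len(header):
            user = fields[header.index("USERNAME")]
            usage[user] = float(fields[header.index("MEMORY")].rstrip("%"))
    return usage


def evaluate(total: float, warning: float = WARNING, critical: float = CRITICAL) -> Tuple[int, str]:
    """Turn a summed percentage into a Nagios (exit code, output line) pair."""
    if total > critical:
        state, label = 2, "CRITICAL"
    elif total > warning:
        state, label = 1, "WARNING"
    else:
        state, label = 0, "OK"
    perfdata = f"'percent'={total:.2f}%;{warning};{critical};0;100"
    return state, f"{label}: prstat per-user MEMORY sum was {total:.2f} % | {perfdata}"


def main() -> int:
    """Run prstat once, print the Nagios output line and return the exit code."""
    try:
        out = subprocess.run(
            ["prstat", "-c", "-t", "1", "1"],  # assumed: a single per-user summary report
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN: could not run prstat: {exc}")
        return 3
    usage = memory_by_user(out)
    if not usage:
        print("UNKNOWN: no per-user rows found in prstat output")
        return 3
    state, message = evaluate(sum(usage.values()))
    print(message)
    return state
```

To deploy it, save it in NCPA's plugins directory under a name of your choosing (for example, a hypothetical `check_prstat_memory.py`), add `import sys` and an `if __name__ == "__main__": sys.exit(main())` guard at the end, and query `plugins/check_prstat_memory.py` instead of `memory/virtual/percent`. The exit codes 0/1/2/3 follow the usual Nagios OK/WARNING/CRITICAL/UNKNOWN convention.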