mem usage = -0.01 % #90 (BaldMansMojo / check_vmware_esx)

banholzer commented 8 years ago

Hi, since we updated to vSphere 6.0, we sometimes get CRITICAL alerts for mem usage and CPU usage with a value of -0.01 %.

Do you know the reason for this? I remember having the same issue some years ago with the "old" ESXi Nagios check. As far as I remember, it had to do with timeouts.

Do you have any suggestions to get rid of these messages? (The ESXi servers are still at 5.5.)

Thanks

Notification Type: PROBLEM

Service: check_esx3_dc_host_mem_usage

Host: (ESXi Host)

Address:

State: CRITICAL

Date/Time: Sun Apr 24 15:04:43 CEST 2016

Additional Info: P1.PL CRITICAL - mem usage=-0.01 %

BaldMansMojo commented 8 years ago

Which version of the SDK do you use?

banholzer commented 8 years ago

As far as I can tell, it is 5.1.0:

./esxcli --version

Script 'esxcli' version: 5.1.0

BaldMansMojo commented 8 years ago

I overlooked that you are using the embedded Perl interpreter (p1.pl). Don't do it. Several problems have been reported with this tiny old piece of scrap. I have disabled it completely because it caused a lot of problems with other plugins in the past. The interpreter was developed at a time when servers were much slower than today and Perl was also older. It is not a fully compatible Perl interpreter.

Another problem that may occur is that a test from the command line succeeds while the same check fails from within Nagios.
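If the plugins run under Nagios 3.x or Icinga 1.x, the embedded Perl interpreter (ePN) can be switched off globally (`enable_embedded_perl=0` in the main config file) or per plugin, by putting the line `# nagios: -epn` within the first 10 lines of the plugin. That behaviour comes from the Nagios ePN documentation, not from this thread. The minimal bash sketch below, using the hypothetical helper name `plugin_disables_epn`, checks whether a plugin file carries that per-plugin opt-out.

```bash
# Hypothetical helper (not part of check_vmware_esx or Nagios): succeed if a
# Perl plugin opts out of the embedded Perl interpreter (ePN).
# Assumption: Nagios 3.x / Icinga 1.x behaviour, where a "# nagios: -epn"
# line is honoured only if it appears within the first 10 lines of the file.
plugin_disables_epn() {
    local plugin="$1"
    # Only the first 10 lines count, so ignore everything after them.
    head -n 10 "$plugin" | grep -Eq '^#[[:space:]]*nagios:[[:space:]]*-epn'
}

# Example:
# if plugin_disables_epn /usr/lib64/nagios/vmware/check_vmware_esx; then
#     echo "ePN disabled for this plugin"
# else
#     echo "plugin may run under ePN; check enable_embedded_perl"
# fi
```

Remember that a per-plugin marker only helps if the global setting still allows ePN for other plugins; disabling it globally, as recommended above, makes the marker unnecessary.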

ant0nwax commented 6 years ago

Dear M.

We ran into exactly the same problem with vCenter 5.5.0 and Icinga:

root@monitoringsrv ~ # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --datacenter vcenter.domain.tld --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring@localos --warning 80%
OK: CPU wait=-1.00 ms - CPU ready=-1.00 ms - CPU usage=-0.01%|'cpu_wait'=-1.00ms;80;90;; 'cpu_ready'=-1.00ms;80;90;; 'cpu_usage'=-0.01%;80;90;;
root@monitoringsrv ~ # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --datacenter vcenter.domain.tld --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring@localos --warning 80%
OK: CPU wait=2875588.00 ms - CPU ready=216653.00 ms - CPU usage=42.54%|'cpu_wait'=2875588.00ms;80;90;; 'cpu_ready'=216653.00ms;80;90;; 'cpu_usage'=42.54%;80;90;;
root@monitoringsrv ~ # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --datacenter vcenter.domain.tld --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring@localos --warning 80%
OK: CPU wait=2863061.00 ms - CPU ready=208022.00 ms - CPU usage=41.66%|'cpu_wait'=2863061.00ms;80;90;; 'cpu_ready'=208022.00ms;80;90;; 'cpu_usage'=41.66%;80;90;;
root@monitoringsrv ~ # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --datacenter vcenter.domain.tld --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring@localos --warning 80%
OK: CPU wait=-1.00 ms - CPU ready=-1.00 ms - CPU usage=-0.01%|'cpu_wait'=-1.00ms;80;90;; 'cpu_ready'=-1.00ms;80;90;; 'cpu_usage'=-0.01%;80;90;;
root@monitoringsrv ~ # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --datacenter vcenter.domain.tld --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring@localos --warning 80%
OK: CPU wait=-1.00 ms - CPU ready=-1.00 ms - CPU usage=-0.01%|'cpu_wait'=-1.00ms;80;90;; 'cpu_ready'=-1.00ms;80;90;; 'cpu_usage'=-0.01%;80;90;;
root@monitoringsrv ~ # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --datacenter vcenter.domain.tld --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring@localos --warning 80%
OK: CPU wait=2867205.00 ms - CPU ready=215818.00 ms - CPU usage=47.80%|'cpu_wait'=2867205.00ms;80;90;; 'cpu_ready'=215818.00ms;80;90;; 'cpu_usage'=47.80%;80;90;;
root@monitoringsrv ~ # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --datacenter vcenter.domain.tld --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring@localos --warning 80%
OK: CPU wait=2867205.00 ms - CPU ready=215818.00 ms - CPU usage=47.80%|'cpu_wait'=2867205.00ms;80;90;; 'cpu_ready'=215818.00ms;80;90;; 'cpu_usage'=47.80%;80;90;;

The negative check_vmware_esx result of -0.01% appears intermittently in the ESXi host checks (cpu, memory, io, network and so on): sometimes it is there, sometimes it is not, both when running as root on the command line and as the icinga user, so we can rule out Icinga as the cause.
The vCenter performance charts show no negative values and no gaps, so the negative values are NOT present in the vCenter performance database.
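The failing runs above all carry the same two values, -0.01 (percent) and -1.00 (ms). One plausible reading, not confirmed anywhere in this thread: the vSphere performance API reports percentages in hundredths of a percent and returns -1 for samples it has no data for, so a raw -1 would print as exactly -0.01 % and -1.00 ms. The bash sketch below (the helper name `classify_check_output` is hypothetical) sorts one line of check_vmware_esx output into valid, sentinel or unavailable, which makes it easier to count how often each case occurs.

```bash
# Hypothetical helper (not part of check_vmware_esx): classify one line of
# plugin output. Prints "unavailable", "sentinel" or "valid".
# Assumption: a perfdata value of exactly -1.00 or -0.01 means "no data",
# as in the failing runs pasted in this thread.
classify_check_output() {
    local line="$1"
    local perfdata=""

    # Direct-to-host runs report missing samples as "Not available".
    case "$line" in
        *"Not available"*) echo "unavailable"; return 0 ;;
    esac

    # Perfdata is everything after the first "|" (empty if there is none).
    case "$line" in
        *"|"*) perfdata="${line#*|}" ;;
    esac

    # Match values such as 'cpu_usage'=-0.01% or 'cpu_wait'=-1.00ms.
    if printf '%s\n' "$perfdata" | grep -Eq '=-(1\.00|0\.01)[^0-9;]*;'; then
        echo "sentinel"
    else
        echo "valid"
    fi
}

# Example: run the CPU check 20 times and count each outcome
# (use the same arguments as in the transcript above).
# for i in $(seq 20); do
#     classify_check_output "$(/usr/lib64/nagios/vmware/check_vmware_esx --datacenter vcenter.domain.tld --host esxi01.domain.tld --username monitoring@localos --password SECRET --select cpu)"
# done | sort | uniq -c
```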

IMHO the error lies somewhere between vCenter and check_vmware_esx. We are also able to open a support ticket with VMware; shall we proceed?

To sort things out, I am willing to help by sharing everything you need to know about our environment, without publishing corporate data on the internet.

BaldMansMojo commented 6 years ago

Sorry, I have never seen this before. Have you tried to check the host directly? As mentioned in the readme, hosts and other non-moving resources should be monitored directly. Otherwise all your (running) systems will raise an alarm when vCenter stops or crashes. It may also put too much load on vCenter. Only moving things like virtual machines, storage etc. should be monitored via vCenter. Try this. The mix of negative values and real values suggests that the data is not being delivered correctly by your vCenter.

Normally you install the SDK, test it with the CLI tools from the SDK, deploy the plugin, and everything is fine. I only process what I get from vCenter via the API (SDK). So shit in, shit out.

First try monitoring the hosts directly to rule out vCenter.

ant0nwax commented 6 years ago

Thanks M.

I will try to set everything up following your recommendation. It looks like the best alternative for us anyway, since we do not really monitor vCenter itself or VMs; there are only a very few VMs, and for those we use SNMP on Linux or SNMP on the Juniper vSRX. We are mostly interested in the hardware of the hosts, which we actually also monitor via SNMP on the iDRAC. It was the Icinga package that recommended your scripts for VMware; that's how I came across them.

I will write an update here about the results.

Have a good day.


-- Szabó Adam, Managing Director (Ügyvezető)

Was Ist Das Kft. Dezső utca 4/A 1016 Budapest 0630/5547341 (for phone calls, please arrange a time by email)

ant0nwax commented 6 years ago

Hi there,

I have now implemented all checks without vCenter: check_vmware_esx connects directly to the ESXi hosts with a single session per ESXi host. We check cpu, io, mem, net, runtime-health, runtime-temp, services and volumes-local in 8 services. The result is just as intermittent as with the vCenter approach. vCenter seems to generate a different response than the ESXi host: via vCenter the response is "-0.01", via the ESXi host it is "Not available".

root@monitoringsrv / # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring --warning 80%
UNKNOWN: CPU wait=Not available - CPU ready=Not available - CPU usage=Not available
root@monitoringsrv / # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring --warning 80%
OK: CPU wait=2839604.00 ms - CPU ready=232980.00 ms - CPU usage=49.13%|'cpu_wait'=2839604.00ms;80;90;; 'cpu_ready'=232980.00ms;80;90;; 'cpu_usage'=49.13%;80;90;;
root@monitoringsrv / # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring --warning 80%
OK: CPU wait=2830577.00 ms - CPU ready=227053.00 ms - CPU usage=48.36%|'cpu_wait'=2830577.00ms;80;90;; 'cpu_ready'=227053.00ms;80;90;; 'cpu_usage'=48.36%;80;90;;
root@monitoringsrv / # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring --warning 80%
UNKNOWN: CPU wait=Not available - CPU ready=Not available - CPU usage=Not available
root@monitoringsrv / # /usr/lib64/nagios/vmware/check_vmware_esx --critical 90% --host esxi01.domain.tld --password SECRET --select cpu --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring --warning 80%

In our case we now get a real alarm in Icinga: since the script returns an undefined result, we got a lot of UNKNOWN problems, so I have temporarily switched off email notifications for UNKNOWN states. The -0.01 was less useful than this, because now we get missing performance data rather than false performance data that looks like gaps.
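Since missing data is more useful here than fake -0.01 values, one option is to wrap the plugin so that runs carrying those values are reported as UNKNOWN (exit code 3 in the Nagios plugin API), matching what the direct-to-host checks already return. A minimal bash sketch, assuming that any perfdata value of exactly -1.00 or -0.01 is a no-data marker; the function name `run_check_without_sentinels` is hypothetical and not part of check_vmware_esx:

```bash
# Hypothetical wrapper: run a check command and, if its perfdata contains the
# -1.00 / -0.01 "no data" values seen in this thread, report UNKNOWN (exit 3)
# instead of passing the bogus numbers on as performance data.
# Otherwise the plugin's output and exit code are passed through unchanged.
run_check_without_sentinels() {
    local output rc perfdata
    output="$("$@")"
    rc=$?

    # Perfdata is everything after the first "|" (empty if there is none).
    perfdata=""
    case "$output" in
        *"|"*) perfdata="${output#*|}" ;;
    esac

    if printf '%s\n' "$perfdata" | grep -Eq '=-(1\.00|0\.01)[^0-9;]*;'; then
        # Drop the perfdata so graphs get a gap instead of a fake dip.
        echo "UNKNOWN: no valid performance data returned (${output%%|*})"
        return 3
    fi

    printf '%s\n' "$output"
    return "$rc"
}
```

A wrapper script would call `run_check_without_sentinels /usr/lib64/nagios/vmware/check_vmware_esx "$@"` followed by `exit $?`, so Icinga still sees the plugin's own exit code for every valid run.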

M., do you have an idea why every check of the ESXi hosts sometimes returns valid output and sometimes invalid output? You probably know Perl and the VMware SDK better than I do; I can only help on the ESXi side.

Thanks again

ant0nwax commented 6 years ago

Hi there again, just a quick update: there is still no improvement, and no reply to the last question in this thread.

root@monitoringsrv ~ # esxcli --version
Script 'esxcli' version: 6.0.0

The ESXi hosts run 5.5U3-...A08

--

root@monitoringsrv ~ # sudo -u icinga /usr/lib64/nagios/vmware/check_vmware_esx --host=esxi04.domain.tld --password TOTALLYSECRET --select io --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring
OK: I/O commands aborted=0 - I/O bus resets=0 - I/O read=71 KB/sec. - I/O read latency=0 ms - I/O write=136 KB/sec.I/O write latency=0 ms - I/O usage=1278 KB/sec. - I/O kernel latency=0 ms - I/O device latency=0 ms - I/O queue latency=0 ms - I/O total latency=0 ms|'io_aborted'=0;;;; 'io_busresets'=0;;;; 'io_read'=71KB;;;; 'io_read_latency'=0ms;;;; 'io_write'=136KB;;;; 'io_write_latency'=0ms;;;; 'io_usage'=1278KB;;; 'io_kernel_latency'=0ms;;;; 'io_device_latency'=0ms;;;; 'io_queue_latency'=0ms;;;; 'io_total_latency'=0ms;;;;
root@monitoringsrv ~ # sudo -u icinga /usr/lib64/nagios/vmware/check_vmware_esx --host=esxi04.domain.tld --password TOTALLYSECRET --select io --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring
OK: I/O commands aborted=0 - I/O bus resets=0 - I/O read=71 KB/sec. - I/O read latency=0 ms - I/O write=136 KB/sec.I/O write latency=0 ms - I/O usage=1278 KB/sec. - I/O kernel latency=0 ms - I/O device latency=0 ms - I/O queue latency=0 ms - I/O total latency=0 ms|'io_aborted'=0;;;; 'io_busresets'=0;;;; 'io_read'=71KB;;;; 'io_read_latency'=0ms;;;; 'io_write'=136KB;;;; 'io_write_latency'=0ms;;;; 'io_usage'=1278KB;;; 'io_kernel_latency'=0ms;;;; 'io_device_latency'=0ms;;;; 'io_queue_latency'=0ms;;;; 'io_total_latency'=0ms;;;;
root@monitoringsrv ~ # sudo -u icinga /usr/lib64/nagios/vmware/check_vmware_esx --host=esxi04.domain.tld --password TOTALLYSECRET --select io --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring
UNKNOWN: I/O commands aborted=Not available - I/O bus resets=Not available - I/O read=Not available - I/O read latency=Not available - I/O write=Not available - I/O write latency==Not available - I/O usage=Not available - I/O kernel latency=Not available - I/O device latency=Not available - I/O queue latency=Not available - I/O total latency=Not available
root@monitoringsrv ~ # sudo -u icinga /usr/lib64/nagios/vmware/check_vmware_esx --host=esxi04.domain.tld --password TOTALLYSECRET --select io --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring
UNKNOWN: I/O commands aborted=Not available - I/O bus resets=Not available - I/O read=Not available - I/O read latency=Not available - I/O write=Not available - I/O write latency==Not available - I/O usage=Not available - I/O kernel latency=Not available - I/O device latency=Not available - I/O queue latency=Not available - I/O total latency=Not available
root@monitoringsrv ~ # sudo -u icinga /usr/lib64/nagios/vmware/check_vmware_esx --host=esxi04.domain.tld --password TOTALLYSECRET --select io --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring
OK: I/O commands aborted=0 - I/O bus resets=0 - I/O read=106 KB/sec. - I/O read latency=0 ms - I/O write=561 KB/sec.I/O write latency=1 ms - I/O usage=1782 KB/sec. - I/O kernel latency=0 ms - I/O device latency=0 ms - I/O queue latency=0 ms - I/O total latency=0 ms|'io_aborted'=0;;;; 'io_busresets'=0;;;; 'io_read'=106KB;;;; 'io_read_latency'=0ms;;;; 'io_write'=561KB;;;; 'io_write_latency'=1ms;;;; 'io_usage'=1782KB;;; 'io_kernel_latency'=0ms;;;; 'io_device_latency'=0ms;;;; 'io_queue_latency'=0ms;;;; 'io_total_latency'=0ms;;;;
root@monitoringsrv ~ # sudo -u icinga /usr/lib64/nagios/vmware/check_vmware_esx --host=esxi04.domain.tld --password TOTALLYSECRET --select io --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring
OK: I/O commands aborted=0 - I/O bus resets=0 - I/O read=106 KB/sec. - I/O read latency=0 ms - I/O write=561 KB/sec.I/O write latency=1 ms - I/O usage=1782 KB/sec. - I/O kernel latency=0 ms - I/O device latency=0 ms - I/O queue latency=0 ms - I/O total latency=0 ms|'io_aborted'=0;;;; 'io_busresets'=0;;;; 'io_read'=106KB;;;; 'io_read_latency'=0ms;;;; 'io_write'=561KB;;;; 'io_write_latency'=1ms;;;; 'io_usage'=1782KB;;; 'io_kernel_latency'=0ms;;;; 'io_device_latency'=0ms;;;; 'io_queue_latency'=0ms;;;; 'io_total_latency'=0ms;;;;
root@monitoringsrv ~ # sudo -u icinga /usr/lib64/nagios/vmware/check_vmware_esx --host=esxi04.domain.tld --password TOTALLYSECRET --select io --sessionfiledir /var/spool/icinga2/tmp --sslport 443 --timeout 90 --username monitoring
OK: I/O commands aborted=0 - I/O bus resets=0 - I/O read=106 KB/sec. - I/O read latency=0 ms - I/O write=561 KB/sec.I/O write latency=1 ms - I/O usage=1782 KB/sec. - I/O kernel latency=0 ms - I/O device latency=0 ms - I/O queue latency=0 ms - I/O total latency=0 ms|'io_aborted'=0;;;; 'io_busresets'=0;;;; 'io_read'=106KB;;;; 'io_read_latency'=0ms;;;; 'io_write'=561KB;;;; 'io_write_latency'=1ms;;;; 'io_usage'=1782KB;;; 'io_kernel_latency'=0ms;;;; 'io_device_latency'=0ms;;;; 'io_queue_latency'=0ms;;;; 'io_total_latency'=0ms;;;;

ant0nwax commented 6 years ago

I tried to upgrade from Perl 5.10 (2009), the system Perl on CentOS 6.9, to Perl 5.28, but I couldn't get it working. Do you have an easy way for me to use a modern Perl with this script? I spent 3 hours fiddling with @INC and modules and did not succeed...

We will also update to ESXi 6.5 soon, so maybe the issue will disappear.

BaldMansMojo commented 6 years ago

Unfortunately not. I always prefer the packages that come with the OS. Have a look at pkgs.org: https://centos.pkgs.org/6/centos-sclo-rh/ (packages from the Software Collections). By the way, I haven't forgotten this and the other issues; I just haven't had enough time in the last few months. But a new version should come out soon.

banholzer commented 6 years ago

I didn't realize this issue was still open. The hint not to use the internal Perl interpreter solved it for me back then. Shouldn't ant0nwax's problem be a new issue? Should this one be closed?

ant0nwax commented 6 years ago

Thanks for the hint, Patrick.

Regarding the internal Perl: could you please tell me which version of Perl worked for you? I tried to run Martin's script on Perl 5.28.x and it did not work; it crashed with an error. We are on CentOS 6.9 and run Perl 5.10 (the system default).

Thanks again for any answer.


BaldMansMojo commented 5 years ago

I run it on CentOS 6.9 with SDK 5.5 and on CentOS 7.4 with SDK 6.5. Everything works well. And please forget the internal Perl interpreter. It's real big B U L L S H I T. It's a relic from a time when servers were small and resources very limited.

I will close this issue. If you need it, please reopen it. Regards, Martin
