BaldMansMojo / check_vmware_esx

check_vmware_esx: Fork of check_vmware_api.pl
GNU General Public License v2.0

Check_vmware_esx.pl with VMWare SDK 6.x doesn't work with vcenter 5.1 and 5.5 #91

Closed: tjyang closed this issue 8 years ago

tjyang commented 8 years ago

Hi there, I'd like to log this issue. This is a CentOS 7.2 instance with VMware-vSphere-Perl-SDK-6.0.0-3561779.x86_64.tar.gz. check_vmware_esx.pl can talk to vCenter 6.x and all ESXi versions, but it can't talk to vCenter 5.1/5.5 when queried from Nagios. It errors out with "(Return code of 255 is out of bounds)".

The interesting part is that interacting with vc5.1/5.5 works fine from a direct command-line query:

 /usr/lib64/nagios/plugins/contrib/check_vmware_esx  -D vc01.test.com  -u account  -p password   --gigabyte -w 10 -c 5 --select=volumes

Only when Nagios calls this script does the "(Return code of 255 is out of bounds)" issue appear. Running the exact same command as the nagios user (or any other user) returns no error code.
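For context, Nagios only understands plugin exit codes 0 to 3 (OK, WARNING, CRITICAL, UNKNOWN); any other value, such as the 255 a Perl script often exits with when it dies, is reported as "out of bounds". The sketch below maps an exit code to the state Nagios would derive from it. The hypothetical helper `run_check` runs a command with a stripped-down environment (`env -i` plus an assumed minimal `PATH`), which is only a rough stand-in for the daemon's environment, not an exact copy, and can help compare daemon-style runs with interactive ones.

```shell
#!/bin/sh
# Map a plugin exit code to the service state Nagios derives from it.
# Nagios only defines 0-3; anything else is "out of bounds".
plugin_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "OUT_OF_BOUNDS($1)" ;;
  esac
}

# Hypothetical helper: run a check command with an empty environment
# (except for an assumed minimal PATH) and print the resulting state.
run_check() {
  env -i PATH=/usr/bin:/bin "$@" >/dev/null 2>&1
  plugin_state "$?"
}

# Example (plugin path taken from the command above):
# run_check /usr/lib64/nagios/plugins/contrib/check_vmware_esx -D vc01.test.com ...
```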

BaldMansMojo commented 8 years ago

SDK 6 is a pretty little thing of bull..t. You don't need it, and it doesn't interact with version 5.x correctly. See some older (partly closed) issues. SDK 5.x is enough. I'm running CentOS 6.x (a cluster with Nagios) and there is no chance of getting SDK 6.x up and running there. But, as mentioned, it is not needed. All of my ESX 6.x hosts are monitored with SDK 5.x. Runs perfectly.

SDK 6.x will be needed for monitoring new features of ESX 6.x (or vCenter). As long as the API doesn't change and my plugin doesn't monitor anything new, SDK 5.x is enough (and stable).

tjyang commented 8 years ago

I reverted from SDK 6 back to SDK 5.5, and check_vmware_esx started working with vc 5.1/5.5 again. Good to know it is SDK 6's fault and that we should stay away from it for now. But in the long run, I think we should point out exactly how SDK 6 interacts incorrectly with vc 5.1/5.5, so that those who have a VMware SDK support contract may be able to request a bug fix.

BaldMansMojo commented 8 years ago

I'm using the SDK. I'm not debugging it. There seems to be an issue with the generated URL; it differs between 5 and 6. But in my environment I unfortunately have no time to play around with CentOS 7. We use RHEL and CentOS 7 here, but for monitoring I will stay on 6.x. If you are running your monitoring on a cluster, you have to ensure certain system service dependencies, so that nothing harms your environment when you type /etc/init.d/nagios restart|reload on the wrong node. This can easily be done within the init scripts, but it runs against the damned systemd philosophy and can hardly be done with systemd configuration. systemd is a nightmare in a complex environment, but unfortunately you have no choice.
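A minimal sketch of such an init-script guard, assuming the shared Nagios data lives on a DRBD resource. `drbdadm role <resource>` reports the local role first (for example `Primary/Secondary` on DRBD 8.x, or just `Primary` on DRBD 9), so the guard only lets the script proceed on the node where the local role is Primary. The function name and the resource name `r0` are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the local DRBD role is Primary.
# $1 is the output of `drbdadm role <resource>`, e.g. "Primary/Secondary"
# (DRBD 8.x) or "Primary" (DRBD 9).
is_local_primary() {
  case "$1" in
    Primary*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use at the top of the start|restart|reload branch
# (resource name "r0" is an assumption):
# if ! is_local_primary "$(drbdadm role r0 2>/dev/null)"; then
#   echo "Not the active node, refusing to touch Nagios" >&2
#   exit 1
# fi
```

Passing the role string in as an argument keeps the decision logic separate from the `drbdadm` call, so it can be checked without a running DRBD setup.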

tjyang commented 8 years ago

I am also using Nagios 3.5.1 on CentOS 6.x, but I'm in the process of moving to CentOS 7.2 + Nagios 4.1.1, using pacemaker+drbd to form a two-node active/passive cluster. The services (mysql, nagios, httpd, shared DRBD drive, shared FS) are all controlled by pacemaker. The DRBD drive also syncs across two datacenters over the WAN. I agree with your point on using vs. debugging; so far we don't have to use SDK 6. The SDK 6 install instructions on the VMware website were also incorrect. I sent them (VMware ESX support, not SDK support) the correct install procedure for SDK 6.0, but unfortunately it looks like the issue is on the SDK 6 side. Thanks for sharing. Please close this issue if you like.

BaldMansMojo commented 8 years ago

Don't use Nagios 4.x. Background: Andreas Ericsson from op5 was the main developer of Nagios 4 and wrote about 95% of the code. Nagios 4 was ready for a long time but wasn't released, because Nagios XI was based on Nagios 3.x. Shortly after the release of 4.x, and immediately after development of 5.x began, Nagios Enterprises fired Andreas, saying that nobody who isn't an employee of Nagios Enterprises can be head developer.

Immediately after that, I visited the Open Source Monitoring Conference in Nuremberg (Germany), as I do every year. Instead of presenting Nagios 5.x and its roadmap, Andreas told us about the events above and presented Naemon.

So Nagios 4.x is kind of a dead cow. The 4.x developers mainly work on Naemon today, and every important new technical development happens in Naemon rather than Nagios, because there are no (or nearly no) developers at Nagios Enterprises able to maintain the code and develop a version 5.

tjyang commented 8 years ago

I really appreciate the background info on Nagios 4.x. But I need to upgrade to escape a 3.5.1 daemon crash issue. I had heard of Naemon, but I didn't know Andreas was the main Nagios 4 developer. Thanks for the advice and the history.

BaldMansMojo commented 8 years ago

Tell me about the crash. I use 3.5.1 too, but I know there was a small bug, so I rebuilt the package (rpmbuild -ba etc.). There is a bug involving MK Livestatus from Mathias Kettner (I know him well; we both live in Munich/Bavaria/Germany). This bug crashes Nagios. There is a patch available from Mathias, and I integrated it.

Since then I have had absolutely no problems. I use it on a CentOS 6 cluster (DRBD etc.) with mod_gearman and currently have over 1,900 hosts with nearly 20,000 services. Around 18,000 of these are active checks. I can send you the source RPM and the binaries by email.

I've seen you're from Chicago. Man - I love Chicago Blues style.

tjyang commented 8 years ago

See http://tracker.nagios.org/view.php?id=455. My workaround is to restart it with a cron job that runs every minute. The pain is that I can't keep much log history to go back to for reference.

I would be interested to see the patch for the above bug (tjyang2001 AT GMAIL dot COM).

Do you use just a shared disk (using DRBD), or do you also have heartbeat/pacemaker? If you are using the pacemaker suite, there is a nagios OCF resource agent. I am using it on CentOS 7 + Nagios 4.1.1; it should also work with CentOS 6 + Nagios 3.5.1.
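A cron-driven workaround like this can be sketched as a watchdog that only starts Nagios when its status check fails, rather than restarting it unconditionally every minute. This is a hypothetical sketch: the function name, the script path in the crontab comment, and the use of `/etc/init.d/nagios status` are assumptions, and it relies on the status action returning non-zero when the daemon is down (as LSB-style init scripts do).

```shell
#!/bin/sh
# Hypothetical watchdog: start Nagios only if its status check fails.
# The status and start commands are passed in as strings; on a SysV
# system they might be "/etc/init.d/nagios status" and
# "/etc/init.d/nagios start".
restart_if_down() {
  status_cmd=$1
  start_cmd=$2
  if sh -c "$status_cmd" >/dev/null 2>&1; then
    echo "running"
  else
    # Only report "started" if the start command itself succeeded.
    sh -c "$start_cmd" && echo "started"
  fi
}

# Example crontab entry (script path is an assumption):
# * * * * * /usr/local/sbin/nagios-watchdog.sh
# where nagios-watchdog.sh defines this function and calls:
# restart_if_down "/etc/init.d/nagios status" "/etc/init.d/nagios start"
```

Leaving a running daemon untouched avoids needless restarts on every cron tick.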

I was not aware of "Chicago Blues"; thanks for bringing the term up.