BaldMansMojo / check_vmware_esx

check_vmware_esx Fork of check_vmware_api.pl
GNU General Public License v2.0

Plugin not working (stalls)? #93

Closed. Napsty closed this issue 8 years ago

Napsty commented 8 years ago

I read through other issues and uninstalled SDK 6.0 U2 and installed the Perl SDK 5.5 (5.5.0-2043780). I'm able to connect successfully with esxcli:

# esxcli -s esx001 -u root -p secret hardware cpu global get   
CPU Packages: 2
CPU Cores: 12
CPU Threads: 24
Hyperthreading Active: true
Hyperthreading Supported: true
Hyperthreading Enabled: true
HV Support: 3
HV Replay Capable: true
HV Replay Disabled Reasons: 

However not with the plugin:

# time ./check_vmware_esx.pl -H esx001 -u root -p secret -S cpu
^CUNKNOWN: Script killed by monitor.

real    1m31.518s
user    0m0.199s
sys 0m0.019s

As you can see, nothing happened and I killed the execution with CTRL-C. Any idea where this stalls?

VMware Perl SDK: VMware-vSphere-Perl-SDK-5.5.0-2043780.x86_64
OS: Ubuntu 14.04 x64
ESX Host: 5.1.0
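
When reproducing a hang like this, it can help to bound the run instead of waiting and pressing CTRL-C. A minimal sketch, assuming GNU coreutils `timeout` is installed; the `run_bounded` helper name and the 30-second limit in the example are illustrative and not part of the plugin:

```shell
#!/bin/sh
# run_bounded: hypothetical helper that runs a command with a time limit.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    # GNU timeout exits with status 124 when the limit was reached.
    if [ "$rc" -eq 124 ]; then
        echo "command did not finish within ${limit}s" >&2
    fi
    return "$rc"
}

# Example (host and credentials as in the report above):
# run_bounded 30 ./check_vmware_esx.pl -H esx001 -u root -p secret -S cpu
```

An exit status of 124 then distinguishes "the plugin stalled" from an ordinary plugin error.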

BaldMansMojo commented 8 years ago

Gruezi ;-),

This should not be a big deal. I think you have a problem with the session files. There are two possible scenarios:

a) You have a problem with the permissions on the directory that stores the session files and the session lock files. Use the option --nosession for testing. With this option no session file is used. Don't use it in production, because every check will then log in and log out, and your VMware logs will fill up with noise.

b) You have a rotten session (lock) file. This can happen while testing. Delete the files.

I've seen that you use root for checking. Bad idea. It is better to have a local monitoring user on each ESX host with the right to see everything but change nothing. I did it "the classical way": logging in to the command line with ssh and adding the user on the Linux base (/etc/passwd etc.). After that you have to connect to the host with the vSphere Client. The web GUI does not work here because you have to connect directly to the ESX host. The web GUI only talks to the vCenter, and you can't add local users through the vCenter. With the client I connected to the host and assigned the rights to the user.

According to our VMware admins this should also work: https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.security.doc%2FGUID-670B9B8C-3810-4790-AC83-57142A9FE16F.html

See the closed issues, because we had this problem with the user and the missing rights in the past.

Don't test as root unless your Nagios/Icinga/Shinken/Naemon... is running as root.
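
The two session-file scenarios above can be checked quickly from the shell. A rough sketch, assuming the session and lock files have `_session` in their names and live in a directory such as /tmp (both are assumptions, adjust them to your setup); `list_session_files` is a hypothetical helper, not something the plugin provides:

```shell
#!/bin/sh
# list_session_files: hypothetical helper, not part of the plugin.
# Warns if DIR is not writable by the current user, then lists files
# directly inside DIR whose names contain "_session" (assumed pattern).
list_session_files() {
    dir=${1:-/tmp}
    if [ ! -w "$dir" ]; then
        echo "WARNING: $dir is not writable by $(id -un)" >&2
    fi
    # -maxdepth is supported by GNU and BSD find.
    find "$dir" -maxdepth 1 -type f -name '*_session*' -exec ls -l {} +
}

# Example: run as the user your monitoring system uses.
# list_session_files /tmp
```

Run it as the monitoring user: files owned by someone else (for example root, left over from testing) are candidates for scenario b).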

Regards Martin

Napsty commented 8 years ago

Hi Martin

I tried it with the --nosession option and I also deleted the existing _session files in /tmp. But still the same problem:

time ./check_vmware_esx.pl -H esxz001 -u root -p "secret" -S cpu -s info --nosession
^CUNKNOWN: Script killed by monitor.
Use of uninitialized value $sessionlockfile in unlink at ./check_vmware_esx.pl line 2489.

real    1m30.359s
user    0m0.206s
sys 0m0.020s

Nothing happens or at least nothing I can see happens.

Username and password were verified and work with esxcli.

tcpdump while executing the plugin:

08:10:56.068671 IP icinga.48761 > esx001.443: Flags [S], seq 3858533326, win 29200, options [mss 1460,sackOK,TS val 841360147 ecr 0,nop,wscale 7], length 0
08:10:56.068843 IP esx001.443 > icinga.48761: Flags [S.], seq 926179508, ack 3858533327, win 65535, options [mss 1460,nop,wscale 9,sackOK,TS val 3971863747 ecr 841360147], length 0
08:10:56.068863 IP icinga.48761 > esx001.443: Flags [.], ack 1, win 229, options [nop,nop,TS val 841360147 ecr 3971863747], length 0
08:10:56.068987 IP icinga.48761 > esx001.443: Flags [P.], seq 1:296, ack 1, win 229, options [nop,nop,TS val 841360147 ecr 3971863747], length 295
08:10:56.069601 IP esx001.443 > icinga.48761: Flags [P.], seq 1:1092, ack 296, win 130, options [nop,nop,TS val 3971863748 ecr 841360147], length 1091
08:10:56.069628 IP icinga.48761 > esx001.443: Flags [.], ack 1092, win 251, options [nop,nop,TS val 841360147 ecr 3971863748], length 0
08:10:56.069995 IP icinga.48761 > esx001.443: Flags [P.], seq 296:622, ack 1092, win 251, options [nop,nop,TS val 841360147 ecr 3971863748], length 326
08:10:56.079414 IP esx001.443 > icinga.48761: Flags [P.], seq 1092:1151, ack 622, win 130, options [nop,nop,TS val 3971863749 ecr 841360147], length 59
08:10:56.101466 IP icinga.48761 > esx001.443: Flags [P.], seq 622:787, ack 1151, win 251, options [nop,nop,TS val 841360155 ecr 3971863749], length 165
08:10:56.102900 IP esx001.443 > icinga.48761: Flags [P.], seq 1151:1929, ack 787, win 130, options [nop,nop,TS val 3971863751 ecr 841360155], length 778
08:10:56.103782 IP icinga.48761 > esx001.443: Flags [F.], seq 787, ack 1929, win 268, options [nop,nop,TS val 841360155 ecr 3971863751], length 0
08:10:56.103962 IP esx001.443 > icinga.48761: Flags [.], ack 788, win 130, options [nop,nop,TS val 3971863751 ecr 841360155], length 0
08:10:56.104243 IP esx001.443 > icinga.48761: Flags [F.], seq 1929, ack 788, win 130, options [nop,nop,TS val 3971863751 ecr 841360155], length 0
08:10:56.104266 IP icinga.48761 > esx001.443: Flags [.], ack 1930, win 268, options [nop,nop,TS val 841360155 ecr 3971863751], length 0
08:10:56.149540 IP icinga.48762 > esx001.443: Flags [S], seq 26454409, win 29200, options [mss 1460,sackOK,TS val 841360167 ecr 0,nop,wscale 7], length 0
08:10:56.149693 IP esx001.443 > icinga.48762: Flags [S.], seq 1141261788, ack 26454410, win 65535, options [mss 1460,nop,wscale 9,sackOK,TS val 3971863756 ecr 841360167], length 0
08:10:56.149712 IP icinga.48762 > esx001.443: Flags [.], ack 1, win 229, options [nop,nop,TS val 841360167 ecr 3971863756], length 0
08:10:56.149805 IP icinga.48762 > esx001.443: Flags [P.], seq 1:296, ack 1, win 229, options [nop,nop,TS val 841360167 ecr 3971863756], length 295
08:10:56.150361 IP esx001.443 > icinga.48762: Flags [P.], seq 1:1092, ack 296, win 130, options [nop,nop,TS val 3971863756 ecr 841360167], length 1091
08:10:56.150381 IP icinga.48762 > esx001.443: Flags [.], ack 1092, win 251, options [nop,nop,TS val 841360167 ecr 3971863756], length 0
08:10:56.150677 IP icinga.48762 > esx001.443: Flags [P.], seq 296:622, ack 1092, win 251, options [nop,nop,TS val 841360167 ecr 3971863756], length 326
08:10:56.159859 IP esx001.443 > icinga.48762: Flags [P.], seq 1092:1151, ack 622, win 130, options [nop,nop,TS val 3971863757 ecr 841360167], length 59
08:10:56.160195 IP icinga.48762 > esx001.443: Flags [P.], seq 622:1299, ack 1151, win 251, options [nop,nop,TS val 841360169 ecr 3971863757], length 677
08:10:56.162274 IP esx001.443 > icinga.48762: Flags [.], seq 1151:2599, ack 1299, win 130, options [nop,nop,TS val 3971863757 ecr 841360169], length 1448
08:10:56.162287 IP esx001.443 > icinga.48762: Flags [.], seq 2599:3750, ack 1299, win 130, options [nop,nop,TS val 3971863757 ecr 841360169], length 1151
08:10:56.162293 IP icinga.48762 > esx001.443: Flags [.], ack 3750, win 296, options [nop,nop,TS val 841360170 ecr 3971863757], length 0
08:10:56.162440 IP esx001.443 > icinga.48762: Flags [P.], seq 3750:3993, ack 1299, win 130, options [nop,nop,TS val 3971863757 ecr 841360170], length 243
08:10:56.200545 IP icinga.48762 > esx001.443: Flags [.], ack 3993, win 319, options [nop,nop,TS val 841360180 ecr 3971863757], length 0

Napsty commented 8 years ago

Wow, it actually seems to work. But it's so slow that it's not usable:

$ time /tmp/check_vmware_esx.pl -H esx001 -u root -p secret --select=soap --trace=4
OK: Successfully connected to the VMWare SOAP API.

real    0m47.565s
user    0m0.173s
sys     0m0.029s

And I also get a SOAP request error:

$ time /tmp/check_vmware_esx.pl -H esx001 -u root -p secret --select=runtime
SOAP request error - possibly a protocol issue: <?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"  xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
[...]
Power Supply 2 Status 0: Failure status - Dea

real    3m46.739s
user    0m0.199s
sys     0m0.022s

I suspect a problem in the Perl SDK though. Shouldn't trace show some more information, too?

Napsty commented 8 years ago

This seems to be the source of the problem:
https://communities.vmware.com/message/2298661
https://bclary.com/blog/2014/04/17/how-to-work-around-soap-request-protocol-error-with-vmware-vsphere-perl-sdk-5-5-0/

The SDK doesn't work well with "newer" versions of liblwp and libwww.
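
To see which libwww-perl version is actually installed before comparing it with the linked posts, something like the following can be used. A small sketch: `perl_module_version` is a hypothetical helper, and choosing the `LWP` and `LWP::UserAgent` modules as the ones to inspect is an assumption:

```shell
#!/bin/sh
# perl_module_version: hypothetical helper that prints the installed version
# of a Perl module, "not installed" if it cannot be loaded, or
# "perl not available" if perl itself cannot be run.
perl_module_version() {
    module=$1
    perl -e '
        my $m = shift;
        if (eval "require $m; 1") {
            my $v = $m->VERSION;
            print defined $v ? $v : "unknown", "\n";
        } else {
            print "not installed\n";
        }
    ' "$module" 2>/dev/null || echo "perl not available"
}

perl_module_version LWP
perl_module_version LWP::UserAgent
```

This only reports versions; which version the SDK accepts is described in the links above.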

Napsty commented 8 years ago

Hi Martin,

I solved it and documented it here: http://www.claudiokuenzler.com/blog/650/slow-vmware-perl-sdk-soap-request-error-libwww-version

Maybe you can refer to this solution somewhere in your README as the problem comes from within the SDK.

cheers, Claudio