BaldMansMojo / check_vmware_esx

check_vmware_esx Fork of check_vmware_api.pl
GNU General Public License v2.0
124 stars 67 forks

High Load #43

Closed djnews24 closed 10 years ago

djnews24 commented 10 years ago

Hi, since the last git pull I see this error: Can't kill a non-numeric process ID at /usr/lib/nagios/plugins/check_vmware_esx/check_vmware_esx.pl line 1550.

The load on my icinga2 server is now at 95 (Ubuntu LTS 14).

Is anything wrong here? A reboot etc. has no effect.

best, Dennis

BaldMansMojo commented 10 years ago

It seems you have some junk in the session lock files. When the plugin starts, it places its PID in the lock file. The PID is numeric. See:

     # Second get the old PID
     while(<SESSION_LOCK_FILE>)
          {
          $PID_old = $_;
          }
     close (SESSION_LOCK_FILE);    

     # Third - check for the process which wrote the lock file the last time
     $PID_exists = kill 0, $PID_old;

A kill 0 doesn't kill the process; it only checks whether the process exists. If the process is no longer running, the plugin removes the lock file and creates a new one. This ensures that there is no lock file without a process. For example, if a check is killed, we must be sure that the remaining lock file will be removed. So please look into your lock files. The only content must be a number: the PID.
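As a rough illustration of the check described above, here is a minimal sketch that validates the lock file content before calling kill 0. This is not the plugin's actual code: the helper names read_lock_pid and lock_is_held are hypothetical, and the sketch assumes the lock file holds nothing but the PID on its first line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of check_vmware_esx): read a lock file and
# return the PID only if the content is a plain positive integer.
sub read_lock_pid {
    my ($lock_file) = @_;

    open(my $fh, '<', $lock_file) or return;    # unreadable: treat as "no PID"
    my $content = <$fh>;
    close($fh);

    return unless defined $content;             # empty file
    chomp $content;
    $content =~ s/^\s+|\s+$//g;                 # strip stray whitespace

    return $content =~ /^[1-9][0-9]*$/ ? $content : undef;
}

# Hypothetical helper: returns 1 if the lock file names a process that can
# be signalled, 0 otherwise (including when the content is not a PID).
sub lock_is_held {
    my ($lock_file) = @_;

    my $pid = read_lock_pid($lock_file);
    return 0 unless defined $pid;

    # Signal 0 delivers nothing; kill only reports whether the process could
    # be signalled. It also returns false if the process exists but belongs
    # to another user.
    return kill(0, $pid) ? 1 : 0;
}
```

With a guard like this, a lock file containing anything other than a number would be treated as stale instead of triggering "Can't kill a non-numeric process ID".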

martin

djnews24 commented 10 years ago

Hi Martin,

I already removed everything there, but I still get these errors.

BaldMansMojo commented 10 years ago

Look into a lock file. What's in it? I'm using the plugin too (like many others), and no one has reported such an error.

djnews24 commented 10 years ago

Hi Martin, sure, I have been using the plugin for months too. But I saw that I have too many ESX hosts. I have now tweaked the intervals to retry later on error, and the load is going down. So for me this can be closed.

BaldMansMojo commented 10 years ago

Hi Dennis, load is an issue with monitoring. I have 2 Quad dual core Opteron servers here: a failover cluster with one active node and around 1800 hosts with nearly 18000 services. It still works well, but don't try reporting for more than one day :-)) Martin