dpiquet / pve-monitor

pve-monitor is a tool to monitor hypervisors and virtual machines in a Proxmox cluster

pve-monitoring.cfg prevents nagios from startup #18

Closed doque closed 9 years ago

doque commented 9 years ago

I've set up a config file almost exactly as the given sample:

#define nodes
    node farm4 {
        address 192.168.178.6
        port 8006 # Optional, default is 8006
        monitor_account root
        monitor_password pass
    #    realm pve # Optional, default is pam
        mem 80 90 # optional, not monitored if not defined
        cpu 80 95 # optional
        disk 80 90 # optional
    }

    node farm5 {
        address 192.168.178.7
        port 8006 # Optional, default is 8006
        monitor_account root
        monitor_password pass
    #    realm pve # Optional, default is pam
        mem 80 90 # optional, not monitored if not defined
        cpu 80 95 # optional
        disk 80 90 # optional
    }

    node farm5 {
        address 192.168.178.8
        port 8006 # Optional, default is 8006
        monitor_account root
        monitor_password pass
    #    realm pve # Optional, default is pam
        mem 80 90 # optional, not monitored if not defined
        cpu 80 95 # optional
        disk 80 90 # optional
    }

It works fine when running pve-monitor from the command line:

root@proxmox-monitoring:/usr/lib/nagios/plugins# ./pve-monitor.pl --conf /usr/local/nagios/etc/configs/pve-monitor.cfg --nodes
NODES OK  3 / 3 working nodes
NODE farm4 OK : cpu OK (0.88%), mem OK (40.47%), disk OK (2.47%) uptime 145405
NODE farm5 OK : cpu OK (0.02%), mem OK (2.85%), disk OK (1.05%) uptime 78381
NODE farm5 OK : cpu OK (0.02%), mem OK (2.85%), disk OK (1.05%) uptime 78381

However, running /etc/init.d/nagios start fails with this error:

Error: Unexpected token or statement in file '/usr/local/nagios/etc/configs/pve-monitor.cfg' on line 2.

Thanks for all help.

dpiquet commented 9 years ago

Pve-monitor's config file must not be loaded by Nagios itself but only by the plugin.

I don't know your Nagios configuration, but I guess all files in the 'configs' directory are parsed by Nagios. Try moving pve-monitor's config file somewhere else (maybe /usr/local/nagios/etc/?).
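One way to check which files Nagios will try to parse is to list the cfg_file and cfg_dir entries of the main config. This is only a minimal sketch: it assumes the main config is /usr/local/nagios/etc/nagios.cfg, as in this thread, and list_cfg_sources is a made-up helper name, not part of Nagios or pve-monitor.

```shell
#!/bin/sh
# List every uncommented cfg_file= / cfg_dir= entry from a Nagios main config.
# list_cfg_sources is a hypothetical helper name used for illustration only.
list_cfg_sources() {
    # Default path is an assumption taken from this thread's install layout.
    main_cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Keep cfg_file/cfg_dir lines, drop leading blanks, normalise spaces around '='.
    grep -E '^[[:space:]]*cfg_(file|dir)[[:space:]]*=' "$main_cfg" \
        | sed -E 's/^[[:space:]]+//; s/[[:space:]]*=[[:space:]]*/=/'
}
```

Calling list_cfg_sources with no argument reads the default path. Any directory reported as cfg_dir is scanned for files ending in .cfg, so the pve-monitor node file should live outside of it.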

doque commented 9 years ago

That solves the issue, but how do I proceed from there? In the Nagios web gui, I can't see anything related to the nodes I set up, or PVE in general.

dpiquet commented 9 years ago

Did you declare the host in the Nagios config? This is covered by this part of the doc: https://github.com/dpiquet/pve-monitor/wiki#define-your-cluster-in-nagios-configuration You may have to adapt it to your own config, as you're on a custom install of Nagios.

Also check the owners and permissions of the files. Insufficient filesystem permissions are the most common problem I have had to solve so far, although that does not seem to be the case for you.

PS: I noticed the same name (farm5) appears twice in your configuration file. I guess this is just a typo, but don't forget to correct it, as pve-monitor uses this name to identify the nodes.

doque commented 9 years ago

I'm unsure what the equivalent of /etc/nagios3/conf.d is in my setup. Which files would exist in this directory? That might help me locate the correct directory.

For now, I've changed cfg_dir in /usr/local/nagios/etc/nagios.cfg to /usr/local/nagios/etc/configs and I've placed the pve-cluster.cfg file there.

Running nagios now yields this output:

root@proxmox-monitoring:/usr/local/nagios/etc# /etc/init.d/nagios restart
Running configuration check...

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Error: Service check command 'check_pve_cluster_openvz' specified in service 'OpenVZ VMs' for host 'pve-cluster' not defined anywhere!
Error: Service check command 'check_pve_cluster_qemu' specified in service 'Qemu VMs' for host 'pve-cluster' not defined anywhere!
Error: Service check command 'check_pve_cluster_storage' specified in service 'Storages' for host 'pve-cluster' not defined anywhere!
    Checked 11 services.
Error: Host check command 'check_pve_cluster_nodes' specified for host 'pve-cluster' is not defined anywhere!
Warning: Host 'pve-cluster' has no default contacts or contactgroups defined!
    Checked 2 hosts.
    Checked 1 host groups.
    Checked 0 service groups.
    Checked 1 contacts.
    Checked 1 contact groups.
    Checked 24 commands.
    Checked 5 time periods.
    Checked 0 host escalations.
    Checked 0 service escalations.
Checking for circular paths...
    Checked 2 hosts
    Checked 0 service dependencies
    Checked 0 host dependencies
    Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 1
Total Errors:   4

What's curious, it's checking 2 hosts - I have not set up any other besides PVE, though. I'm guessing the other one is localhost, which is displayed in the web GUI. So what's the missing piece here...?

/edit: I've also put pve-monitor.cfg and pve-cluster.cfg in the /etc/nagios-plugins/config directory.

dpiquet commented 9 years ago

This means the command definitions are not loaded.

This part of the doc covers it: https://github.com/dpiquet/pve-monitor/wiki#first-define-the-command I guess you can simply define those commands in a cfg file inside the /usr/local/nagios/etc/configs directory.

The most important thing is that Nagios loads it. The directory where we drop the files matters more to administrators, so they can easily find what they want in the configuration, especially as it becomes more complex.

doque commented 9 years ago

It seems that nagios is reading the pve-monitor.cfg file from the /usr/local/nagios/etc/configs directory, but somehow not registering the commands. If the file is in the directory, starting nagios yields the error described above. If I move the file out of the directory, nagios starts just fine (but I don't see anything related to PVE in the web gui).

I'm running all commands as root, that would rule out filesystem permission problems, right?

dpiquet commented 9 years ago

Does your pve-monitor.cfg file contain 'command' definitions like this one?

define command{
    command_name    check_pve_cluster_nodes
    command_line    /usr/bin/perl /usr/lib/nagios/plugins/pve-monitor.pl --conf /etc/nagios3/pve-monitor.conf --nodes
}

The error given by your Nagios means that these directives are not loaded. You might have forgotten them, or they are in a file which is not loaded.
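To adapt the definition above to a custom install, a small generator can print it with your own paths. This is a sketch, not the project's method: pve_command_def is a hypothetical helper, the plugin path is the one used on the command line earlier in this thread, and the --conf path you pass in is whatever location you chose for the node file.

```shell
#!/bin/sh
# Print one Nagios command definition that wraps pve-monitor.pl.
# pve_command_def is a hypothetical helper name used for illustration only.
pve_command_def() {
    name="$1"   # Nagios command_name, e.g. check_pve_cluster_nodes
    flag="$2"   # pve-monitor.pl mode flag, e.g. --nodes
    conf="$3"   # path to the pve-monitor node config (kept outside cfg_dir)
    # Plugin path is an assumption taken from this thread; override with a 4th argument.
    plugin="${4:-/usr/lib/nagios/plugins/pve-monitor.pl}"
    cat <<EOF
define command{
    command_name    $name
    command_line    /usr/bin/perl $plugin --conf $conf $flag
}
EOF
}
```

For example, `pve_command_def check_pve_cluster_nodes --nodes /usr/local/nagios/etc/pve-monitor.cfg` prints the nodes command; redirect the output into a file that Nagios actually loads. The other three commands from the error output (check_pve_cluster_openvz, check_pve_cluster_qemu, check_pve_cluster_storage) need their own definitions, with the mode flags taken from the wiki.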

Running /etc/init.d/nagios or service nagios as root does not mean Nagios will run as root too. Services often switch to another user at startup for security reasons. However, I don't think this is the problem you're facing right now.
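If you want to rule permissions out anyway, you can look up the account Nagios switches to and run the plugin as that account. A minimal sketch, assuming the main config is /usr/local/nagios/etc/nagios.cfg and that it sets the nagios_user directive; nagios_run_user is a made-up helper name.

```shell
#!/bin/sh
# Print the account Nagios drops privileges to, as set by nagios_user= in the main config.
# nagios_run_user is a hypothetical helper name used for illustration only.
nagios_run_user() {
    # Default path is an assumption taken from this thread's install layout.
    main_cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Print the value of the last uncommented nagios_user= line (empty if none is set).
    sed -n -E 's/^[[:space:]]*nagios_user[[:space:]]*=[[:space:]]*([^[:space:]]+).*$/\1/p' "$main_cfg" \
        | tail -n 1
}
```

With sudo available (an assumption), something like `sudo -u "$(nagios_run_user)" /usr/lib/nagios/plugins/pve-monitor.pl --conf /usr/local/nagios/etc/pve-monitor.cfg --nodes` then shows whether that account can read the node file and run the plugin. The --conf path here is only the location suggested earlier in the thread.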

doque commented 9 years ago

I got it to work - the sample configuration file for the nodes had the same name (pve-monitor.cfg) as the one defining the commands. I've fiddled a bit with nagios.cfg and manually specified the to-be-included cfg files, now it's running.
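For anyone hitting the same mix-up, a quick check is to look for .cfg files in the Nagios config directory that contain no define blocks, since those are probably not Nagios object files (the pve-monitor node file is one example). This is a rough sketch: cfgs_without_define is a hypothetical helper, and the default directory is the one from this thread.

```shell
#!/bin/sh
# List *.cfg files under a directory that contain no "define <type>" statement.
# cfgs_without_define is a hypothetical helper name used for illustration only.
cfgs_without_define() {
    # Default directory is an assumption taken from this thread's install layout.
    dir="${1:-/usr/local/nagios/etc/configs}"
    find "$dir" -type f -name '*.cfg' | sort | while IFS= read -r f; do
        # grep -q fails when no line starts with "define <word>";
        # commented lines such as "#define nodes" do not count.
        if ! grep -q -E '^[[:space:]]*define[[:space:]]+[a-z]+' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Any file it prints should be moved out of the directory or given a name that does not clash with the file holding the command definitions.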

Thank you very much for your patience!