ceph / ceph-nagios-plugins

Nagios plugins for Ceph
Apache License 2.0

check_ceph_osd not working with Proxmox/Ceph Nautilus #61

Open 7thsonch opened 4 years ago

7thsonch commented 4 years ago

I've just upgraded our Proxmox cluster to 6.2, including the upgrade from Ceph Luminous to Nautilus. All other checks (check_ceph_health, check_ceph_mon, check_ceph_df) still work as expected, but check_ceph_osd refuses to work.

I'm using the following command: /usr/lib/nagios/plugins/check_ceph_osd -i nagios --key /var/lib/nagios/ceph.client.nagios.keyring --host 10.55.0.1 --out

which fails with this error:

OSD ERROR: 2020-08-26 09:32:03.862 7fbe53968700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.nagios.keyring: (2) No such file or directory
2020-08-26 09:32:03.862 7fbe53968700 -1 AuthRegistry(0x7fbe4c081ff8) no keyring found at /etc/pve/priv/ceph.client.nagios.keyring, disabling cephx

I don't know why Ceph is looking for a key in /etc/pve/priv/ceph.client.nagios.keyring. If I copy my key from /var/lib/nagios/ceph.client.nagios.keyring to /etc/pve/priv/ceph.client.nagios.keyring, the command works as expected, but only as root. In Proxmox, /etc/pve/priv is a special cluster file system where all files are owned by root with no read permissions for any other user. Of course, I would like to avoid running the check as root.
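To see where such a path comes from, it can help to list the keyring entries defined in the cluster's ceph.conf. A minimal bash sketch, assuming a plain INI layout with one "key = value" per line; the function name list_keyring_settings is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: print each "keyring = ..." entry of a ceph.conf-style file,
# prefixed by the [section] it appears in. Assumes one "key = value" per line.
list_keyring_settings() {
  local conf="$1"
  awk '
    /^[ \t]*\[/ {                      # section header such as [global] or [client]
      section = $0
      gsub(/^[ \t]*\[|\][ \t]*$/, "", section)
      next
    }
    /^[ \t]*keyring[ \t]*=/ {          # a keyring setting
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)  # keep only the part after "="
      print section ": " value
    }
  ' "$conf"
}
```

Example: list_keyring_settings /etc/pve/ceph.conf. Any entry pointing into /etc/pve/priv would explain why the nagios user ends up looking there.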

The keyring was created with ceph auth get-or-create client.nagios mon 'allow r' osd 'allow r' > /var/lib/nagios/ceph.client.nagios.keyring

Maybe that's the same problem as in issue #30, but it worked in Luminous?

valerytschopp commented 4 years ago

Did you try with the parameter -k KEYRING_FILE or --keyring KEYRING_FILE ?

7thsonch commented 4 years ago

Yes I did.

As user "nagios":

nagios@host:~$ /usr/lib/nagios/plugins/check_ceph_osd -i nagios -k /var/lib/nagios/ceph.client.nagios.keyring --host 10.55.0.3 --out
OSD ERROR: 2020-08-26 10:18:32.711 7f2bfead3700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.nagios.keyring: (13) Permission denied

As root:

root@host:~# /usr/lib/nagios/plugins/check_ceph_osd -i nagios -k /var/lib/nagios/ceph.client.nagios.keyring --host 10.55.0.1 --out
OSD OK
Up OSDs: osd.0 osd.1
Down+In OSDs:
Down+Out OSDs:
| 'osd_up'=2 'osd_down_in'=0;;2 'osd_down_out'=0;;2

valerytschopp commented 4 years ago

It is just a permission issue with the keyring file; it should be readable by the nagios user:

unable to find a keyring on /etc/pve/priv/ceph.client.nagios.keyring: (13) Permission denied

To test, run this as the nagios user: cat /etc/pve/priv/ceph.client.nagios.keyring
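The same check can be run over both keyring locations at once. A small bash sketch; the function name check_readable is hypothetical. Run it as the monitoring user, for example in a shell started with sudo -u nagios bash, since as root the [ -r ] test succeeds on nearly everything and the result would say nothing:

```shell
#!/usr/bin/env bash
# Sketch only: report whether the *current* user can read each given file.
# Returns non-zero if at least one file is not readable.
check_readable() {
  local f status=0
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "readable:     $f"
    else
      echo "NOT readable: $f"
      status=1
    fi
  done
  return "$status"
}
```

Example: check_readable /etc/pve/priv/ceph.client.nagios.keyring /var/lib/nagios/ceph.client.nagios.keyring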

7thsonch commented 4 years ago

As I said in my initial comment, /etc/pve/priv/ is a special cluster file system where all files are owned by root with no read permissions for any other user, so there is no chance that the nagios user can read /etc/pve/priv/ceph.client.nagios.keyring. That's why I placed the keyring at /var/lib/nagios/ceph.client.nagios.keyring, but even when I point to this location with the --keyring or -k parameter, it still tries to use /etc/pve/priv/ceph.client.nagios.keyring.

valerytschopp commented 4 years ago

Can you try the following as the nagios user:

/usr/bin/ceph --id nagios --keyring /var/lib/nagios/ceph.client.nagios.keyring osd status

That is what the Nagios plugin runs internally.

7thsonch commented 4 years ago

Similar error:

nagios@host:~$ /usr/bin/ceph --id nagios --keyring /var/lib/nagios/ceph.client.nagios.keyring osd status
2020-08-26 14:48:36.801 7f0c73727700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.nagios.keyring: (13) Permission denied
Error EACCES: access denied: does your client key have mgr caps? See http://docs.ceph.com/docs/master/mgr/administrator/#client-authentication

So ceph still tries to use /etc/pve/priv/ceph.client.nagios.keyring :-(

Well, I think the easiest, and maybe the only, solution will be to run the command with sudo.

valerytschopp commented 4 years ago

Is the nagios user able to read ceph.conf?

A possible workaround is to define the keyring in a Nagios-specific ceph.conf and pass that file to the client.

nagios.ceph.conf (a modified copy of your normal ceph.conf):

[global]
keyring = /var/lib/nagios/ceph.client.nagios.keyring
fsid = ...

and then test with:

/usr/bin/ceph -c /var/lib/nagios/nagios.ceph.conf --id nagios osd status
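A minimal bash sketch of this workaround, assuming the cluster's ceph.conf uses plain one-line "key = value" entries; the function name make_nagios_conf is hypothetical. Because Ceph applies the most specific section that sets an option ([client.nagios] over [client] over [global]), a keyring line already present under [client] would still override one added to [global]. The sketch therefore rewrites every existing keyring line and only falls back to adding one under [global] when there is none:

```shell
#!/usr/bin/env bash
# Sketch only: build a Nagios-specific copy of ceph.conf whose keyring
# setting points at a file the nagios user can read.
#   $1 = source ceph.conf, $2 = destination file, $3 = keyring path
make_nagios_conf() {
  local src="$1" dst="$2" keyring="$3"
  if grep -Eq '^[[:space:]]*keyring[[:space:]]*=' "$src"; then
    # Rewrite every existing keyring line, whatever section it is in.
    awk -v kr="$keyring" '
      /^[ \t]*keyring[ \t]*=/ { print "keyring = " kr; next }
      { print }
    ' "$src" > "$dst"
  else
    # No keyring line yet: add one directly under the [global] header.
    awk -v kr="$keyring" '
      { print }
      /^[ \t]*\[global\][ \t]*$/ { print "keyring = " kr }
    ' "$src" > "$dst"
  fi
}
```

Example: make_nagios_conf /etc/ceph/ceph.conf /var/lib/nagios/nagios.ceph.conf /var/lib/nagios/ceph.client.nagios.keyring. Since the result is a copy, it would need to be regenerated whenever the cluster's ceph.conf changes (for example when monitors are added).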

valerytschopp commented 4 years ago

If it works, you can then:

/usr/lib/nagios/plugins/check_ceph_osd -c /var/lib/nagios/nagios.ceph.conf -i nagios --host 10.55.0.1 --out

If it doesn't work, I'm out of ideas, and you should use sudo.

rfpronk commented 3 years ago

I ran into the same after upgrading the checks from 1.5.5 to 1.5.6.

The keyring file defined in the given ceph.conf now takes precedence over the keyring defined with the -k parameter. In my case, the one defined in ceph.conf (/etc/pve/priv/$cluster.$name.keyring) didn't exist for the nagios user, and it errored on that without trying the file specified by -k (which did exist with proper permissions).
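The path in the error messages above follows from Ceph's config metavariables: $cluster expands to the cluster name (ceph by default) and $name to the entity name, here client.nagios. A small bash sketch of that expansion, handling only $cluster, $type, $id and $name and assuming a client entity; the function name expand_keyring_path is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: expand $cluster, $type, $id and $name in a keyring path
# template for a client entity, to show which file Ceph would try to open.
# Other metavariables ($host, $pid, ...) are left untouched.
#   $1 = template, $2 = client id (e.g. nagios), $3 = cluster name (default: ceph)
expand_keyring_path() {
  local template="$1" id="$2" cluster="${3:-ceph}"
  local type="client" name="client.$id"
  local path="$template"
  path="${path//\$cluster/$cluster}"
  path="${path//\$name/$name}"
  path="${path//\$type/$type}"
  path="${path//\$id/$id}"
  printf '%s\n' "$path"
}
```

Example: expand_keyring_path '/etc/pve/priv/$cluster.$name.keyring' nagios prints /etc/pve/priv/ceph.client.nagios.keyring, the file from the original error.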

I changed my setup to simply put the correct path to the Nagios keyring into the ceph.conf used by the checks, and removed the -k flag. I guess it's cleaner that way anyway.

Cylindric commented 3 years ago

@rfpronk Hi there, I'm having exactly this problem right now with Prox+Ceph. Can you share what exactly you put into your /etc/pve/ceph.conf to make this work? Do you still need the nagios keyring file at all? I can't seem to work out what to put into ceph.conf to get it working. Thanks!

stopcap commented 3 years ago

What I did was create a copy of ceph.conf in /etc/ceph and call it, for example, ceph_icinga.conf. In this file, change the path of the keyring file to a different directory, for example /etc/icinga2/:

ceph_icinga.conf:

...

[client]
keyring = /etc/icinga2/ceph.client.nagios.keyring

...

The file has to be owned by nagios and be readable by the nagios user.
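To confirm ownership and read permission on each node, something like the following bash sketch could be used. It assumes GNU coreutils stat (as on Debian-based Proxmox); the function name check_keyring_owner is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: check that a file is owned by the expected user and that the
# owner has read permission. Uses GNU stat -c; BSD stat takes other flags.
#   $1 = file, $2 = expected owner (e.g. nagios)
check_keyring_owner() {
  local file="$1" want_owner="$2" owner mode owner_bits
  owner=$(stat -c '%U' -- "$file") || return 2
  mode=$(stat -c '%a' -- "$file")
  owner_bits=${mode: -3:1}             # owner digit of e.g. "640" -> 6
  if [ "$owner" != "$want_owner" ]; then
    echo "FAIL: $file is owned by $owner, expected $want_owner"
    return 1
  fi
  if (( (owner_bits & 4) == 0 )); then # read bit (4) missing for the owner
    echo "FAIL: $file is not readable by its owner (mode $mode)"
    return 1
  fi
  echo "OK: $file owned by $owner, mode $mode"
}
```

Example: check_keyring_owner /etc/icinga2/ceph.client.nagios.keyring nagios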

Obviously, you have to repeat these steps on each node of your Proxmox cluster where Ceph is running.

On the Icinga server, I did the following:

In /etc/icinga2/zones.d/globale-template/commands.conf, I created an object for each OSD I want to monitor. Adjust the -k and -c parameters accordingly:

object CheckCommand "pve_ceph_osd5" {
  import "plugin-check-command"
  command = [ "/usr/lib/nagios/plugins/check_ceph_osd" ]
  arguments = {
    "-i" = "nagios"
    "-e" = "/usr/bin/ceph"
    "-k" = "/etc/icinga2/ceph.client.nagios.keyring"
    "-c" = "/etc/ceph/ceph_icinga.conf"
    "-H" = "..."
    "-I" = "5"
  }
}
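With several OSDs, writing these objects by hand gets repetitive. A bash sketch that prints one object per OSD id in the same shape as the one above; the function name gen_osd_commands is hypothetical, and the host address is passed in rather than filled in here:

```shell
#!/usr/bin/env bash
# Sketch only: print one Icinga 2 CheckCommand object per OSD id.
#   $1 = Ceph host address, remaining arguments = OSD ids
gen_osd_commands() {
  local host="$1"; shift
  local id
  for id in "$@"; do
    cat <<EOF
object CheckCommand "pve_ceph_osd${id}" {
  import "plugin-check-command"
  command = [ "/usr/lib/nagios/plugins/check_ceph_osd" ]
  arguments = {
    "-i" = "nagios"
    "-e" = "/usr/bin/ceph"
    "-k" = "/etc/icinga2/ceph.client.nagios.keyring"
    "-c" = "/etc/ceph/ceph_icinga.conf"
    "-H" = "${host}"
    "-I" = "${id}"
  }
}
EOF
  done
}
```

Example: gen_osd_commands 10.55.0.1 0 1 5 > osd_commands.conf. Icinga 2 can also pass the OSD id to a single CheckCommand through a custom variable macro, which may be easier to maintain; this generator simply mirrors the one-object-per-OSD layout described above.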