NagiosEnterprises / nrpe

NRPE Agent
GNU General Public License v2.0

check_disk gives /tmp and /var/tmp as results, even when they do not exist #211

Closed Endemoniada closed 4 years ago

Endemoniada commented 5 years ago

A check can be defined like this:

command[check_disk_all_local]=/usr/lib/nagios/plugins/check_disk -w 10% -W 20% -c 5% -K 10% -A -I '.*docker.*' -I '/snap/.*' --exclude-type=squashfs --exclude-type=tracefs --exclude-type=overlayfs --exclude-type=tmpfs --exclude-type=nsfs --exclude-type=devtmpfs --exclude-type=none

Running the command locally gives the following:

/usr/lib/nagios/plugins/check_disk -w 10% -W 20% -c 5% -K 10% -A -I '.*docker.*' -I '/snap/.*' --exclude-type=squashfs --exclude-type=tracefs --exclude-type=overlayfs --exclude-type=tmpfs --exclude-type=nsfs --exclude-type=devtmpfs --exclude-type=none
DISK OK - free space: / 36102 MB (93% inode=96%); /boot 319 MB (71% inode=99%);| /=2515MB;36637;38672;0;40708 /boot=127MB;423;447;0;471

Note that only / and /boot are showing, as they are actual mounted devices.

However, when running it through NRPE it gives the following instead:

/usr/lib/nagios/plugins/check_nrpe -H localhost -c check_disk_all_local -4
DISK OK - free space: / 36102 MB (93% inode=96%); /boot 319 MB (71% inode=99%); /tmp 36102 MB (93% inode=96%); /var/tmp 36102 MB (93% inode=96%);| /=2515MB;36637;38672;0;40708 /boot=127MB;423;447;0;471 /tmp=2515MB;36637;38672;0;40708 /var/tmp=2515MB;36637;38672;0;40708

Note that it has added /tmp and /var/tmp to the output, with the same data as /. These devices are not present on the system. It does the same thing on many of our systems, regardless of distribution. It does not matter if we query nrpe locally or from our nagios server.

This is especially problematic because we use the output to automatically create individual disk service checks in nagios. That means we create lots of disk checks for /tmp and /var/tmp on servers that don't have those devices.
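As a stopgap on the consuming side, the mount points can be read from the perfdata part of the plugin output and the unwanted entries dropped before any service checks are generated. This is a minimal sketch, assuming the perfdata labels are the mount points and contain no spaces (as in the output above). The function name `list_mounts` and the skip list are illustrative and not part of NRPE or check_disk:

```shell
#!/bin/sh
# list_mounts: read one line of check_disk output on stdin and print the
# mount points found in the perfdata (everything after the first "|"),
# skipping /tmp and /var/tmp. Hypothetical helper, for illustration only.
list_mounts() {
    # Keep only the perfdata part; lines without a "|" are dropped.
    sed -n -e 's/^[^|]*|//p' |
    # One "label=value;warn;crit;min;max" token per line.
    tr -s ' ' '\n' |
    # The label is everything before the "=".
    sed -n -e 's/=.*$//p' |
    # Drop the mount points we do not want checks for.
    grep -v -x -e '/tmp' -e '/var/tmp'
}
```

Piping the check_nrpe output shown above through `list_mounts` leaves only / and /boot. This hides the symptom on the Nagios side; it does not change what check_disk sees.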

The issue seems to be confirmed by multiple people, as documented here:
https://support.nagios.com/forum/viewtopic.php?f=7&t=47095
https://support.nagios.com/forum/viewtopic.php?f=7&t=40247

jobrik commented 4 years ago

Hi, this is not a bug. I ran into the same issue, and here is my resolution:

TL;DR

Add the following content to /etc/systemd/system/nagios-nrpe-server.service.d/local.conf (you can also edit the service file directly, but it may be overwritten):

[Service]
PrivateTmp=false

Restart the nrpe service and everything should be fine again. I mean working as expected, but... what's expected?! :)
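The override above can be put in place roughly as follows. This is a sketch, assuming the Debian unit name `nagios-nrpe-server.service` used in this thread and root privileges. `write_dropin` is a hypothetical helper, and the target directory is a parameter so it can be tried against a scratch directory first:

```shell
#!/bin/sh
# write_dropin DIR: create DIR/local.conf with a PrivateTmp override.
# Hypothetical helper; DIR is normally
# /etc/systemd/system/nagios-nrpe-server.service.d
write_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    # A drop-in only needs the section header and the setting to override.
    printf '[Service]\nPrivateTmp=false\n' > "$dir/local.conf"
}

# Typical use, as root (not run here):
#   write_dropin /etc/systemd/system/nagios-nrpe-server.service.d
#   systemctl daemon-reload                  # re-read unit files and drop-ins
#   systemctl restart nagios-nrpe-server     # apply the new setting
#   systemctl show -p PrivateTmp nagios-nrpe-server   # inspect the effective value
```

Using a drop-in rather than editing the packaged unit keeps the change across package upgrades, which is the concern raised above about the service file being overwritten.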

Long version

We have used check_disk and other checks for a long time with fixed disk strings or path specifications. Last week I started switching to a more modern, dynamic approach (no disk strings at all, only filters), and voilà, this problem suddenly appeared: /tmp and /var/tmp were populated and shown in the check_disk results. They showed up every time, regardless of how I ran or debugged the check and nrpe (trace, gdb), with one exception: a bare exec without systemd. Further investigation showed that the systemd service file for nrpe sets PrivateTmp=true, probably added by the package maintainers. It is there for a good reason: it isolates the temporary file systems used by the checks. It is up to you whether to leave it as is and live with the extra entries, use disk filters, or switch it off.
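One way to see the isolation at work is to look at the mount table of the running NRPE process rather than that of your own shell. This is a sketch, assuming a Linux /proc layout in which field 4 of a mountinfo line is the source directory within the filesystem and field 5 is the mount point. `private_tmp_mounts` is a made-up helper name:

```shell
#!/bin/sh
# private_tmp_mounts FILE: print "mountpoint <- root" for every /tmp or
# /var/tmp entry in a mountinfo file (e.g. /proc/<pid>/mountinfo).
private_tmp_mounts() {
    awk '$5 == "/tmp" || $5 == "/var/tmp" { print $5 " <- " $4 }' "$1"
}

# Example (the PID lookup is an assumption; adjust to how nrpe runs on your host):
#   private_tmp_mounts "/proc/$(pgrep -o -x nrpe)/mountinfo"   # view from inside the service
#   private_tmp_mounts /proc/self/mountinfo                    # view from your shell
```

With PrivateTmp=true, the first call would be expected to list both directories as bind mounts from a systemd-private directory on the root filesystem, while the second lists nothing on a host without separate /tmp mounts. That would also explain why check_disk reports them with the same numbers as /.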

From the perspective of NRPE: there is no bug, everything is working perfectly; the behavior does not originate in NRPE.
From the perspective of check_disk: everything is fine here too; in these situations check_disk sees additional mount points, so it reports them.
From the perspective of the package maintainer: there should, IMHO, be (and maybe there is :) a clear note covering all the implications.

I've tested filters and service modification. So far I'm choosing the second way.

Distros affected:
Debian 8: not affected -> only sysV script
Debian 9: affected -> systemd service script

Snippet:

Takes a boolean argument. If true, sets up a new file system namespace for the executed processes and mounts private /tmp and /var/tmp directories inside it that is not shared by processes outside of the namespace. This is useful to secure access to temporary files of the process, but makes sharing between processes via /tmp or /var/tmp impossible

Refs:

https://www.freedesktop.org/software/systemd/man/systemd.exec.html
https://stackoverflow.com/questions/51963195/nrpe-python-script-output-bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=887498

sawolf commented 4 years ago

Nice catch, @jobrik. I'm going to leave this issue open since we keep those service files in version control. It looks like PrivateTmp is explicitly set to True, but there's no rationale for why that happens.

Edit: Now that I've had more time to become familiar with the project, I agree that this doesn't require any code changes.