canonical / charm-nrpe

A subordinate charm used to configure nrpe (Nagios Remote Plugin Executor)
Apache License 2.0

nrpe check_disk should ignore /snap mountpoints #131

Closed sudeephb closed 8 months ago

sudeephb commented 8 months ago

As we start installing snaps into our environments for things such as Prometheus exporters, we're finding that the disk_root config in the nrpe charm needs /snap added to the -i ignore list. Otherwise the check raises CRITICAL alerts even when the root disk is only 6% utilized, because every snap loop mount reports 100% use, as below:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             28G     0   28G   0% /dev
tmpfs            51G  4.1G   47G   9% /run
/dev/sda1       2.0T  104G  1.8T   6% /
tmpfs           252G  4.0K  252G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           252G     0  252G   0% /sys/fs/cgroup
/dev/sde        3.7T  328M  3.7T   1% /srv/node/sde
/dev/sdf        3.7T  327M  3.7T   1% /srv/node/sdf
/dev/sdg        3.7T  325M  3.7T   1% /srv/node/sdg
/dev/bcache2    3.7T   37G  3.7T   1% /srv/ceph/ceph3
/dev/bcache3    3.7T   41G  3.6T   2% /srv/ceph/ceph2
/dev/bcache1    3.7T   41G  3.6T   2% /srv/ceph/ceph1
/dev/bcache0    1.7T  272M  1.7T   1% /srv/nova/instances
cgmfs           100K     0  100K   0% /run/cgmanager/fs
/dev/loop0       81M   81M     0 100% /snap/core/2381
/dev/loop1      5.5M  5.5M     0 100% /snap/prometheus-ceph-exporter/12
/dev/loop2       81M   81M     0 100% /snap/core/2462
tmpfs            51G     0   51G   0% /run/user/1001

Our typical config value for disk_root: -u GB -w 25% -c 20% -K 5% -A -i '/dev/pts|/run|/sys/fs|udev|/boot/efi|/sys/kernel/debug/tracing'

I'd suggest perhaps that many of these ignores should be part of the check_disk exclusion for %util checks.

The workaround at the moment is adding |/snap to the end of the -i flag on the charm config.
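
As a rough illustration of that workaround, the sketch below appends /snap to the -i pattern of a disk_root option string. The helper name add_ignore_path is hypothetical and not part of the charm. It assumes the -i value is a plain |-separated alternation like the one above, so it would mishandle a regex containing groups. Since -i takes a regular expression, /snap ignores every path that matches it, not just the snap mountpoints.

```python
import shlex


def add_ignore_path(disk_root: str, path: str = "/snap") -> str:
    """Return disk_root with `path` appended to the -i alternation.

    Hypothetical helper for illustration only; assumes the -i value is a
    simple '|'-separated list of patterns.
    """
    args = shlex.split(disk_root)
    if "-i" in args:
        idx = args.index("-i") + 1
        if idx >= len(args):
            raise ValueError("-i was given without a value")
        patterns = args[idx].split("|")
        if path not in patterns:  # keep the operation idempotent
            patterns.append(path)
        args[idx] = "|".join(patterns)
    else:
        # No ignore list yet: add one containing only the new path.
        args += ["-i", path]
    # shlex.join re-quotes the pattern because it contains '|'.
    return shlex.join(args)


current = (
    "-u GB -w 25% -c 20% -K 5% -A "
    "-i '/dev/pts|/run|/sys/fs|udev|/boot/efi|/sys/kernel/debug/tracing'"
)
updated = add_ignore_path(current)
print(updated)
```

The resulting string is what would then be set as the disk_root value on the charm config.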

As we expect snaps to become a universal packaging mechanism for many charms as we go forward, we should ensure that our operational tooling understands them and treats them in the proper manner by default.

This may be something to resolve upstream into the monitoring-plugins-basic package itself for the /usr/lib/nagios/plugins/check_disk executable.


Imported from Launchpad using lp2gh.

sudeephb commented 8 months ago

(by hloeung) I had a look into this and can't see it in the charm itself:

https://git.launchpad.net/nrpe-charm/tree/config.yaml#n48
https://git.launchpad.net/nrpe-charm/tree/hooks/nrpe_helpers.py#n346

Is this a custom config overriding the default? Or is some other charm in use?

sudeephb commented 8 months ago

(by afreiberger) This is a custom config that we set on disk_root.

I think this should actually be filed against the monitoring-plugins-basic package, as it's the .deb that provides /usr/lib/nagios/plugins/check_disk.

I know that check_disk ignores things like sysfs and similar. It should also ignore loop mounts (whether snap, ISO, or other loops), perhaps unless the filesystem type is a valid read/write one.
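
To make the proposed rule concrete, here is a minimal sketch of the filtering logic. It is not check_disk's actual behavior or code. It assumes input in the /proc/mounts field layout (device, mountpoint, type, options). The sample lines are illustrative: the mount options shown and the /dev/loop3 entry are made up, and the set of read-only filesystem types is an assumption.

```python
# Filesystem types treated as inherently read-only (an assumed list).
READ_ONLY_FS_TYPES = {"squashfs", "iso9660"}


def should_check(device: str, fstype: str, options: str) -> bool:
    """Skip loop mounts unless they look like a read/write filesystem."""
    is_loop = device.startswith("/dev/loop")
    read_only = "ro" in options.split(",") or fstype in READ_ONLY_FS_TYPES
    return not (is_loop and read_only)


def checked_mountpoints(mounts_text: str) -> list:
    """Return the mountpoints that would still be checked for disk usage."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if should_check(device, fstype, options):
            result.append(mountpoint)
    return result


# Illustrative data in /proc/mounts format; /dev/loop3 is hypothetical.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/loop0 /snap/core/2381 squashfs ro,nodev,relatime 0 0
/dev/loop1 /snap/prometheus-ceph-exporter/12 squashfs ro,nodev,relatime 0 0
/dev/loop3 /mnt/image ext4 rw,relatime 0 0
"""
print(checked_mountpoints(SAMPLE))  # ['/', '/mnt/image']
```

Under this rule the snap mounts drop out because they are read-only squashfs on loop devices, while a writable filesystem on a loop device is still checked.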

sudeephb commented 8 months ago

(by janitor) Status changed to 'Confirmed' because the bug affects multiple users.