NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

Feature request: Support ZFS filesystem checks natively #1076

Closed: pittagurneyi closed this issue 6 months ago

pittagurneyi commented 6 months ago

ZFS is the main filesystem on nearly all my systems.

Currently, NCPA only exposes what is accessible via psutil, i.e. mount points.

What I would like to see is native support for:

ZFS Pool health state (extracted from zpool status)

ZFS Pool available space (extracted from zfs list -o space)
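A minimal sketch of the pool-health part as a bash plugin, assuming it is fed by zpool list -H -o name,health (scripted mode: no header, tab-separated) instead of parsing the longer zpool status output. The function name check_pool_health and the mapping of DEGRADED to WARNING and every other non-ONLINE state to CRITICAL are illustrative choices; the return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```bash
#!/bin/bash
# check_pool_health: reads "name<TAB>health" lines, as printed by
# `zpool list -H -o name,health`, on stdin and prints a Nagios-style
# status line. Returns 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_pool_health() {
  local name health status=0 count=0 problems=""
  while IFS=$'\t' read -r name health; do
    [ -n "$name" ] || continue          # skip blank lines
    count=$((count + 1))
    case "$health" in
      ONLINE) ;;                        # healthy pool, nothing to report
      DEGRADED)                         # still serving data: WARNING
        problems="${problems:+$problems, }$name=$health"
        [ "$status" -lt 1 ] && status=1 ;;
      *)                                # FAULTED, UNAVAIL, ...: CRITICAL
        problems="${problems:+$problems, }$name=$health"
        status=2 ;;
    esac
  done
  if [ "$count" -eq 0 ]; then
    echo "UNKNOWN: no ZFS pools reported"
    return 3
  fi
  case "$status" in
    0) echo "OK: $count pool(s) ONLINE" ;;
    1) echo "WARNING: $problems" ;;
    *) echo "CRITICAL: $problems" ;;
  esac
  return "$status"
}
```

Usage: zpool list -H -o name,health | check_pool_health; exit $?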

Sometimes, for example on backup systems, the ZFS datasets that actually contain the data are intentionally left unmounted because of how zfs send/recv works, so there is no way to get their free space via psutil. It just shows 0% usage for the pool's mount point, even though the pool might be 80% full.

ericloyd commented 6 months ago

There is a ZFS plugin at https://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Solaris/check_zfs/details that you might be interested in. The use cases for building ZFS monitoring into NCPA are limited, so I imagine it will be faster to deploy that plugin than to wait for NCPA to be updated with that functionality.

pittagurneyi commented 6 months ago

@ericloyd Thank you for the quick reply.

I'm aware of that script and others I've found online. Sadly, most of them are no longer maintained, and some make I/O-heavy calls such as an unrestricted zfs list. On systems with thousands of datasets, running that every few minutes is a problem. It could be fixed by running zfs list -Hp -o name -s name instead, since that lists only the dataset names, sorted by name, which keeps the I/O low.

EDIT: When I mentioned zfs list -o space, I of course meant something parseable and optimized like zfs list -Hp -o space -s name.
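A minimal sketch of consuming that parseable output in a single call, assuming bash and awk: with -H the columns are tab-separated and with -p the sizes are exact byte counts, so a percent-used figure can be computed per dataset as used / (used + avail). The function name zfs_space_report and that percentage formula are illustrative assumptions rather than an established ZFS metric. Because zfs list reads the dataset properties directly, this also covers the unmounted datasets described above.

```bash
#!/bin/bash
# zfs_space_report: reads `zfs list -Hp -o space -s name` output on stdin.
# -H gives tab-separated columns without a header, -p gives exact byte
# counts. Column order for -o space: name, avail, used, usedsnap, usedds,
# usedrefreserv, usedchild. Prints "<dataset> <percent used>" per line.
zfs_space_report() {
  awk -F'\t' '
    # Only handle rows where avail ($2) and used ($3) are plain integers.
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      total = $2 + $3
      pct = (total > 0) ? ($3 * 100 / total) : 0
      printf "%s %.1f\n", $1, pct
    }'
}
```

Usage: zfs list -Hp -o space -s name | zfs_space_report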

In general, I'd like to see something officially supported that I can trust to be well maintained. ZFS is now used on many systems, so a more automated integration into NCPA would be great.

EDIT: Also, it would be great if NCPA could cache the current output of zfs list and zpool status, so that not every call from Nagios to NCPA has to rerun zfs list and the other commands. The output of zfs list doesn't change every few seconds, at least not in a way that typically matters. However, cached zpool status data should probably never be older than about a minute.
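Until something like that exists in NCPA, a plugin can approximate it on its own. A minimal sketch, assuming bash with GNU stat (stat -c %Y; BSD/macOS would need stat -f %m): the hypothetical helper cached_run reruns a command only when its cache file is missing or older than a given number of seconds, and otherwise serves the cached output. New output goes to a temp file in the same directory and is renamed into place, so a concurrent reader never sees a half-written cache.

```bash
#!/bin/bash
# cached_run: print the stdout of a command, rerunning the command only
# when the cache file is missing or older than max_age seconds.
# Usage: cached_run <cache_file> <max_age_seconds> <command> [args...]
# Returns 1 without output if a refresh is needed and the command fails.
cached_run() {
  local cache=$1 max_age=$2 now mtime tmp
  shift 2
  now=$(date +%s)
  # GNU stat; on BSD/macOS use `stat -f %m` instead. A missing file gives 0.
  mtime=$(stat -c %Y "$cache" 2>/dev/null) || mtime=0
  if [ $((now - mtime)) -ge "$max_age" ]; then
    # Write to a temp file in the same directory, then rename it into
    # place so readers never see a half-written cache.
    tmp=$(mktemp "${cache}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
      mv -f "$tmp" "$cache"
    else
      rm -f "$tmp"
      return 1
    fi
  fi
  cat "$cache"
}
```

For example, cached_run /var/tmp/zpool_status.cache 60 zpool status -x keeps pool status at most a minute old, while cached_run /var/tmp/zfs_space.cache 900 zfs list -Hp -o space -s name refreshes space data less often (the paths and the 900-second value are only illustrative).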

ericloyd commented 6 months ago

I agree with you, but the overall adoption of ZFS is quite small, so including it in the stock NCPA install means shipping it to a large share of installs where it won't actually be used. That aside, the bigger issue is that NCPA does every check, every time, using tools available to Python. If what you want is the output of zfs list -o space, why not just write a quick plugin yourself that does that:

One-liner:

#!/bin/bash
# -H drops the header row; in -o space output, column 2 is AVAIL and column 3 is USED.
zfs list -H -o space "$1" | awk -F'\t' '{printf("%s: Used: %s, Free: %s\n", $1, $3, $2)}'

Usage:

<scriptpath> /mnt/tank/filesystem
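The same idea can be extended into a Nagios-style check with thresholds. A minimal sketch, assuming bash: the hypothetical check_zfs_space takes a dataset (or path), a warning percentage and a critical percentage, reads avail and used in exact bytes via zfs list -Hp -o avail,used, and returns the standard Nagios plugin codes along with a perfdata string. Integer percentages keep the arithmetic in plain bash.

```bash
#!/bin/bash
# check_zfs_space: Nagios-style space check for a single dataset.
# Usage: check_zfs_space <dataset|path> <warn_pct> <crit_pct>
# Returns 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN and prints perfdata.
check_zfs_space() {
  local dataset=$1 warn=$2 crit=$3 line avail used pct perf
  # -H: no header, tab-separated; -p: exact byte values.
  if ! line=$(zfs list -Hp -o avail,used "$dataset" 2>/dev/null); then
    echo "UNKNOWN: cannot read space for $dataset"
    return 3
  fi
  IFS=$'\t' read -r avail used <<< "$line"
  if ! [[ $avail =~ ^[0-9]+$ && $used =~ ^[0-9]+$ ]] || [ $((avail + used)) -eq 0 ]; then
    echo "UNKNOWN: unexpected zfs output for $dataset"
    return 3
  fi
  pct=$(( used * 100 / (avail + used) ))   # integer percent used
  perf="used_pct=${pct}%;${warn};${crit}"
  if [ "$pct" -ge "$crit" ]; then
    echo "CRITICAL: $dataset ${pct}% used | $perf"; return 2
  elif [ "$pct" -ge "$warn" ]; then
    echo "WARNING: $dataset ${pct}% used | $perf"; return 1
  fi
  echo "OK: $dataset ${pct}% used | $perf"
  return 0
}
```

Usage: check_zfs_space tank/backup 80 90; exit $?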

MrPippin66 commented 6 months ago

I would echo Eric's statement. Even so, whether NCPA can support that technology really depends on the upstream modules NCPA uses to obtain that information. In other words, if "psutil" adopted it, reporting it in NCPA would be much simpler.

So I would recommend filing that request with the "psutil" maintainers, but as Eric has already stated, ZFS adoption is limited across the overall landscape that NCPA and "psutil" cover.

pittagurneyi commented 6 months ago

I understand. Given the opinions expressed here, I wrote everything I needed yesterday and am already using it. I'll close this issue.

ericloyd commented 6 months ago

Great idea - go bug the Python people to add support! :-)