NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

Added preliminary API code for SMART disk monitoring. #988

Closed ne-jmichaelson closed 4 months ago

ne-jmichaelson commented 9 months ago

One caveat at the moment: the smartctl binary has to be executable by the nagios user that NCPA runs as. The easiest way to do this is to add the nagios user to a group, give that group execute permission on /usr/sbin/smartctl, and set the setuid bit on the executable.

HunnyPuns commented 9 months ago

A couple of things so far:

  1. Not available on Windows. I think this is fine, though we should figure out Windows support. Until then, the smartmon API endpoint shouldn't show up on Windows installations. There's an example of the reverse, where an endpoint is available on Windows but not on Linux: the logs endpoint only shows up on Windows. Could use that as an example.
  2. /usr/sbin/smartctl is hardcoded, but that path only holds for Debian and Debian-based distros. CentOS has it at /sbin/smartctl.
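Both points could be handled with something along these lines. This is only a rough sketch, not the PR's code: `find_smartctl`, `smartmon_available`, and the `FALLBACK_DIRS` list are made-up names for illustration, and how the result would be wired into NCPA's endpoint registration is left open.

```python
import os
import shutil
import sys

# Hypothetical fallback locations; sbin directories are often missing from
# the PATH of an unprivileged service account such as nagios.
FALLBACK_DIRS = ("/usr/sbin", "/sbin", "/usr/local/sbin")


def find_smartctl(extra_dirs=FALLBACK_DIRS):
    """Return the absolute path to smartctl, or None if it cannot be found."""
    found = shutil.which("smartctl")
    if found:
        return found
    # Second attempt: search the fallback directories explicitly.
    return shutil.which("smartctl", path=os.pathsep.join(extra_dirs))


def smartmon_available(platform=sys.platform):
    """Decide whether the smartmon endpoint should be exposed at all."""
    if platform.startswith("win"):
        return False  # no Windows support yet, so hide the endpoint
    return find_smartctl() is not None
```

With this, the endpoint would simply not be registered when `smartmon_available()` returns False, and the resolved path would replace the hardcoded one.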

I was having doubts about requiring an external application, since NCPA ships with "everything you need" to monitor whatever NCPA gives you, and this would break from that. On top of that, permissions have to be changed so that the nagios user can run smartctl. It's not so much that it needs permission to run smartctl; it needs the setuid bit (which you did call out) so it can access the disks. But, at least on CentOS, the default permissions for smartctl let everyone read and execute it. Without changing that, we'd effectively be giving anyone on the system permission to become root for running that one application, which seems scary to me.

I don't want to dump on this idea. So a possible solution to all 3 issues here might be to use pySMART (insert "shop S-Mart" joke here). The project looks to be backed by the TrueNAS team, so it will likely continue to receive updates. Assuming it is not just going out and executing smartctl wherever it is found, it might give us an OS-agnostic way to get SMART metrics. https://pypi.org/project/pySMART/

HunnyPuns commented 8 months ago

Tests on a physical host look good. A test VM didn't show any SMART data, but that's not too surprising. We had talked IRL regarding wanting to add a few more features to this:

Did you want to add those as part of this PR, or add those features in a following PR?

pittagurneyi commented 7 months ago

https://github.com/NagiosEnterprises/ncpa/pull/988/commits/8536b3e007a49cd88f9e2ac9f3ecc6d95b7b383c#diff-26ab99d8f7c7317c7ab1866f5adfeae76389ddf37d3b14eb98a76dca8300cc69R41

    output = subprocess.check_output("ls -l  /dev/disk/by-id/ |  grep -v wwn | grep -v '\-part' | tr -s ' '  | sed 's/\.\.\///g'", shell=True, universal_newlines=True).split('\n')

This is a bit problematic, don't you think?

You'll have to find a better way to filter for physical disks, otherwise you'll catch device-mapper devices, like LUKS decrypted devices, in the list.
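As a rough sketch of one possible filter (not what the PR does): as far as I know, entries under /sys/block that are backed by a device on a bus (sda, nvme0n1, ...) have a `device` link, while device-mapper, loop, and md entries do not. The function name `hardware_backed_disks` is made up, and note that this would still include things like optical drives and virtio disks.

```python
import os


def hardware_backed_disks(sys_block="/sys/block"):
    """List block devices that have a 'device' entry in sysfs.

    Assumption: dm-*, loop*, and md* entries lack this link, so LUKS
    mappings and similar virtual devices are skipped.
    """
    disks = []
    for name in sorted(os.listdir(sys_block)):
        # os.path.exists follows the symlink used by real sysfs entries.
        if os.path.exists(os.path.join(sys_block, name, "device")):
            disks.append(name)
    return disks
```

This avoids `shell=True` and the grep/sed pipeline entirely, but it is still hand-rolled sysfs parsing.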

Instead of parsing /sys yourself, maybe look into a python library that does it for you?

I just found https://github.com/truenas/py-SMART/blob/master/pySMART/smartctl.py#L191 which uses:

    $ smartctl --scan-open
    /dev/sda -d scsi # /dev/sda, SCSI device
    /dev/nvme0 -d nvme # /dev/nvme0, NVMe device
    /dev/nvme1 -d nvme # /dev/nvme1, NVMe device

Maybe implement that as backend instead?

However, what I didn't like is that they use --scan-open when --scan exists: the former opens each device, which probably brings it out of a sleep state. I haven't tested that.
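A minimal sketch of such a backend, using --scan rather than --scan-open. It assumes --scan prints lines in the same `<path> -d <type> # comment` shape as the output above; `parse_scan_output` and `scan_devices` are hypothetical names, not part of NCPA or pySMART.

```python
import subprocess


def parse_scan_output(text):
    """Turn 'smartctl --scan' output into a list of {path, type} dicts."""
    devices = []
    for line in text.splitlines():
        # Drop the trailing "# ..." comment, then split the remainder.
        spec = line.split("#", 1)[0].split()
        # Expected shape: <path> -d <type>; anything else is ignored.
        if len(spec) >= 3 and spec[1] == "-d":
            devices.append({"path": spec[0], "type": spec[2]})
    return devices


def scan_devices(smartctl="smartctl"):
    """Run 'smartctl --scan' without a shell and parse the result."""
    out = subprocess.check_output([smartctl, "--scan"], universal_newlines=True)
    return parse_scan_output(out)
```

Passing the command as a list keeps the shell out of the picture, and the parser can be tested against captured output without touching any disks.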

From the man page of smartctl:

--scan
Scans for devices and prints each device name, device type and protocol ([ATA] or [SCSI]) info. May be used in conjunction with '-d TYPE' to restrict the scan to a specific TYPE. See also info about platform specific device scan and the DEVICESCAN directive on smartd(8) man page.

--scan-open
Same as --scan, but also tries to open each device before printing device info. The device open may change the device type due to autodetection (see also '-d test').

This option can be used to create a draft smartd.conf file. All options after '--' are appended to each output line. For example: smartctl --scan-open -- -a -W 4,45,50 -m admin@work > smartd.conf

Multiple '-d TYPE' options may be specified with '--scan[-open]' to combine the scan results of more than one TYPE.