daviswr / ZenPacks.daviswr.ZFS

ZFS monitoring for Zenoss
MIT License

Failure to model a single pool in a system containing 3 #10

Open sempervictus opened 3 years ago

sempervictus commented 3 years ago

An OpenZFS 2.1 host with 3 zpools hangs forever during modeling; I had to disable the ZFS plugin to get the rest of it committed into the DB. With the plugin enabled, the modeler hangs indefinitely, even after the zpools are discovered. There is an error message in Zenoss:

stderr: `interval cannot be zero`, followed by the usage text `usage: status [-c [script1,script2,...]] [-igLpPstvxD] [-T d|u] [pool] ... [interval [count]]`
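For anyone trying to reproduce this outside Zenoss, here is a minimal sketch of running a command with a hard timeout and capturing stderr, so a usage error like the one above is reported instead of stalling the caller. It is a standalone Python 3 helper, not how the ZenPack runs its commands; `run_command` and `timeout_s` are hypothetical names chosen for illustration.

```python
import subprocess
import sys


def run_command(argv, timeout_s=30.0):
    """Run argv and return (ok, stdout, stderr).

    Hypothetical helper: never blocks longer than timeout_s, and turns
    timeouts and launch failures into an error string instead of raising.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, "", "timed out after %.1fs" % timeout_s
    except OSError as exc:  # e.g. the binary is not installed
        return False, "", str(exc)
    return proc.returncode == 0, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Demonstration with the Python interpreter itself standing in for the
    # real command, so the sketch runs on a machine without ZFS.
    ok, out, err = run_command(
        [sys.executable, "-c", "import sys; sys.exit('interval cannot be zero')"]
    )
    print(ok, err.strip())
```

On the affected host, calling it once per pool, for example `run_command(["zpool", "status", "tank"])` with `tank` replaced by each real pool name, should show which invocation returns the usage message.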

The pool which is failing to model is a raidz2 of 6 drives. There's another raidz2 in there with more disks, and both of them have faults showing. The one which works has one UNAVAIL and one FAULTED - both show up as events in Zenoss. The failing pool has a single FAULTED disk in it.

sempervictus commented 3 years ago

After a bunch of meddling, deletion, re-creation, etc., the pool is still not showing up correctly, but it is better. I can see datasets in the pool now, but the pool shows up with a `pool_` prefix and without devices or state. The other pool has since been rebuilt and shows up correctly again, but the primary data pool (backups, really) does not. The pool that is not showing up has a decent number of datasets (735) and takes a couple of seconds to `zfs list` when cold (which is most of the time). That may contribute to this, but my hunch is that this is an output-formatting problem. I really wish Brian had taken the JSON output pull request more seriously (I wish they took lots of work more seriously; LLNL just lets things rot until contributors quit in furious frustration). Some French high school put a lot of effort into that.
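To test the output-formatting hunch without JSON, one option is the scripted mode of `zpool list` (`-H` for tab-separated rows with no header, `-o` to pick columns), which is easier to parse strictly than free-form status text. The sketch below is Python 3 and only illustrative: `parse_scripted` is a hypothetical helper, the sample rows are made up, and nothing here is taken from what the ZenPack actually parses.

```python
def parse_scripted(output, columns):
    """Parse tab-separated, header-less output into a list of dicts.

    Hypothetical helper. It fails loudly on a row with an unexpected
    number of fields rather than silently producing a half-filled record.
    """
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        values = line.split("\t")
        if len(values) != len(columns):
            raise ValueError(
                "expected %d fields, got %d: %r"
                % (len(columns), len(values), line)
            )
        rows.append(dict(zip(columns, values)))
    return rows


if __name__ == "__main__":
    # Made-up sample in the shape of `zpool list -H -o name,health`.
    sample = "backup\tDEGRADED\nrpool\tONLINE\n"
    for row in parse_scripted(sample, ["name", "health"]):
        print(row["name"], row["health"])
```

A row that raises `ValueError` here would point at the pool whose output does not match the expected shape, which is a quicker check than reading the modeler logs.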