blind-oracle / zabbix-zfs

Zabbix template & script to monitor ZFS on Linux
Mozilla Public License 2.0

Script fails if spare disk is in use #2

Closed: lonnie44 closed this issue 3 years ago

lonnie44 commented 3 years ago

Hi guys,

My zpool looks as follows (device IDs replaced):

NAME                           STATE     READ WRITE CKSUM
tank                           DEGRADED     0     0     0
  raidz2-0                     DEGRADED     0     0     0
    sda                        ONLINE       0     0     0
    sdb                        ONLINE       0     0     0
    sdc                        ONLINE       0     0     0
    spare-3                    DEGRADED     0     0     0
      sdd                      FAULTED     34     0     0  too many errors
      sdf                      ONLINE       0     0     0
    sde                        ONLINE       0     0     0
spares
  sdf                          INUSE     currently in use

But the Zabbix trigger never fired, because the script crashed with this:

# /etc/zabbix/scripts/zfs.py
Traceback (most recent call last):
  File "/etc/zabbix/scripts/zfs.py", line 155, in <module>
    scrub, vdev_errors = pool_status()
  File "/etc/zabbix/scripts/zfs.py", line 78, in pool_status
    vdev_errors = {x[0]: {
  File "/etc/zabbix/scripts/zfs.py", line 79, in <dictcomp>
    'read': int(x[2]),
ValueError: invalid literal for int() with base 10: 'currently'

https://github.com/blind-oracle/zabbix-zfs/blob/18d396c9c0148881c70f506f37939ad5f8827396/zfs.py#L78-L82

The list 'r' contains the following rows, and the spare's row cannot be converted with int(): [... ['/dev/sde1', 'ONLINE', '0', '0', '0'], ['spares'], ['/dev/sdf1', 'INUSE', 'currently', 'in', 'use'] ...]

It would be great if someone could fix this.

Best regards
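To see why the conversion fails, here is a minimal reproduction. It assumes the script splits each line of the `zpool status` config section on whitespace and keeps only device paths before building `vdev_errors`. The exact filter in zfs.py at that commit is not shown in this thread, and the 'write'/'cksum' key names are guesses (only 'read' appears in the traceback).

```python
# Rows as produced by splitting the config section of `zpool status`
# on whitespace (taken from the list quoted above).
rows = [
    ['/dev/sde1', 'ONLINE', '0', '0', '0'],
    ['spares'],
    ['/dev/sdf1', 'INUSE', 'currently', 'in', 'use'],
]

error_message = None
try:
    # Keep device rows (paths starting with '/') and read the READ/WRITE/CKSUM
    # columns as integers. The 'write' and 'cksum' key names are assumptions.
    vdev_errors = {x[0]: {
        'read': int(x[2]),
        'write': int(x[3]),
        'cksum': int(x[4]),
    } for x in rows if x[0].startswith('/')}
except ValueError as exc:
    # The spare's status text lands in the READ column.
    error_message = str(exc)

print(error_message)
# invalid literal for int() with base 10: 'currently'
```

The 'spares' header row is filtered out because it is not a path, but the spare's own row is a path whose third field is the word 'currently'.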

blind-oracle commented 3 years ago

Yeah, I've never tested it with spares since I don't use them :) I'll try to adapt the script.

blind-oracle commented 3 years ago

I think I've fixed it by skipping any row whose counters cannot be converted to integers. Please check.
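Skipping unconvertible rows could look roughly like the sketch below. This is not the committed change. `parse_vdev_errors` is a hypothetical name, the 'write'/'cksum' keys are assumptions, and the five-field filter mirrors the `len(x) != 5` check quoted later in the thread.

```python
def parse_vdev_errors(rows):
    """Hypothetical helper: map device path -> error counters.

    Rows whose counter columns are not integers are skipped instead of
    crashing the whole script.
    """
    errors = {}
    for x in rows:
        # Device rows only, with exactly the five columns
        # NAME STATE READ WRITE CKSUM.
        if len(x) != 5 or not x[0].startswith('/'):
            continue
        try:
            counters = {'read': int(x[2]), 'write': int(x[3]), 'cksum': int(x[4])}
        except ValueError:
            # e.g. the spare row '/dev/sdf1 INUSE currently in use'
            continue
        errors[x[0]] = counters
    return errors


rows = [
    ['/dev/sde1', 'ONLINE', '0', '0', '0'],
    ['spares'],
    ['/dev/sdf1', 'INUSE', 'currently', 'in', 'use'],
]
print(parse_vdev_errors(rows))
# {'/dev/sde1': {'read': 0, 'write': 0, 'cksum': 0}}
```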

lonnie44 commented 3 years ago

It still fails, but differently:

Traceback (most recent call last):
  File "/etc/zabbix/scripts/zfs.py", line 170, in <module>
    'vdevs': vdev_list(vdev_errors),
  File "/etc/zabbix/scripts/zfs.py", line 106, in vdev_list
    return {x[0]: {
  File "/etc/zabbix/scripts/zfs.py", line 114, in <dictcomp>
    'errors': errors[x[0]],
KeyError: '/dev/sdd1'

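I think this is the issue: the trailing text 'too many errors' gives the sdd row more than five fields, so it gets filtered out and '/dev/sdd1' never makes it into the errors dictionary. This would have been a problem before your fix as well. https://github.com/blind-oracle/zabbix-zfs/blob/0779045794f67a41c8b7d3aa17c78c5e9c04eb29/zfs.py#L83-L85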
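Counting the fields makes the problem easy to see. In this sketch, the sample row is built from the zpool output above, and the flat `errors` dict is a simplification of the script's real structure. A strict five-field filter drops the faulted disk's row, so a later lookup by device name fails just as in the traceback.

```python
# The faulted disk's line carries a trailing message after the counters.
sdd_row = '/dev/sdd1 FAULTED 34 0 0 too many errors'.split()
print(len(sdd_row))  # 8

# A strict "exactly five fields" filter drops the row entirely...
errors = {x[0]: int(x[2]) for x in [sdd_row]
          if len(x) == 5 and x[0].startswith('/')}

# ...so looking the device up later raises KeyError, as in vdev_list().
missing = None
try:
    errors['/dev/sdd1']
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # /dev/sdd1
```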

lonnie44 commented 3 years ago

Fixed it, and it's working now. I replaced

if len(x) != 5 or not x[0].startswith('/'):

with

if len(x) < 5 or not x[0].startswith('/'):

Since all conversion errors are skipped anyway, I guess that's fine.

Thanks a lot!
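Taken together, the two changes give a parser along these lines: skip rows whose counters aren't integers, and accept rows with extra trailing words. This is a sketch only. `parse_vdev_errors` and the 'write'/'cksum' keys are hypothetical, and only the `len(x) < 5` check comes from the thread.

```python
def parse_vdev_errors(rows):
    """Hypothetical helper: map device path -> READ/WRITE/CKSUM counters."""
    errors = {}
    for x in rows:
        # Relaxed check: at least five fields, so rows with a trailing
        # message such as 'too many errors' are kept.
        if len(x) < 5 or not x[0].startswith('/'):
            continue
        try:
            counters = {'read': int(x[2]), 'write': int(x[3]), 'cksum': int(x[4])}
        except ValueError:
            # The spares-section row 'INUSE currently in use' has no counters.
            continue
        errors[x[0]] = counters
    return errors


rows = [line.split() for line in [
    '/dev/sde1 ONLINE 0 0 0',
    '/dev/sdd1 FAULTED 34 0 0 too many errors',
    '/dev/sdf1 ONLINE 0 0 0',
    'spares',
    '/dev/sdf1 INUSE currently in use',
]]
result = parse_vdev_errors(rows)
print(result)
```

Because the INUSE row raises ValueError, the spare keeps the counters from its entry under spare-3 and is not overwritten. One caveat: `zpool status` abbreviates large counters (e.g. `1.2K`) unless it is run with `-p`. With this approach, such a row would be silently skipped rather than reported.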

blind-oracle commented 3 years ago

OK, good. At some point I'll try to convert the script to use zfs program with Lua and JSON. That would probably make the implementation more stable and not dependent on the zpool/zfs output format.