OpenMediaVault-Plugin-Developers / openmediavault-autoshutdown

OpenMediaVault plugin to provide frontend for autoshutdown script

HDD IO check not working for disks in ZFS pools #132

Status: Open. Fate6174 opened this issue 2 months ago.

Fate6174 commented 2 months ago

The function _check_hddio does not take disks in ZFS pools into account. In my case, the iostat command in lines 973-975

done < <(LC_ALL=C iostat -kdyNz -o JSON |
          jq -r '.sysstat.hosts[].statistics[].disk[] |
                 "\(.disk_device) \(.kB_read) \(.kB_wrtn)"');

reports IO statistics for disks sda to sdf. Each of these disks is then looked up in the output of mount -l in line 905. None of them is found, so each one is skipped with the log message "DEBUG: Skipping as no mount point". This happens because disks in ZFS pools do not show up in the output of mount -l; only the ZFS pool (and dataset) names do. As a result, autoshutdown suspends my server even while a ZFS resilver or scrub is running, which generates more than enough disk IO.
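To make the failure mode concrete, here is a small, self-contained sketch that applies the same substring test as mount -l | grep -q "${hdd}" to a captured listing instead of the live system. The helper name has_mount_point and the sample mount -l lines are made up for illustration. On a ZFS system the mount sources are pool and dataset names (here tank), so the member disks never match.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the "Skipping as no mount point" decision for a few
# disks against a captured `mount -l` listing. The helper name and the
# sample data are hypothetical, not taken from the plugin.
has_mount_point() {
    # Same substring test as the plugin: does the disk name appear anywhere?
    grep -q -- "$1" <<<"$2"
}

# Made-up listing: root on an ordinary disk, data on a ZFS pool "tank".
mount_output='/dev/sdg1 on / type ext4 (rw,relatime)
tank on /tank type zfs (rw,xattr,noacl)
tank/media on /tank/media type zfs (rw,xattr,noacl)'

for hdd in sda sdb sdg; do
    if has_mount_point "$hdd" "$mount_output"; then
        echo "$hdd: checked"
    else
        echo "$hdd: skipped (no mount point)"
    fi
done
```

Only sdg is checked here. The pool member disks sda and sdb are skipped even though all IO to /tank goes through them.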

For now, I have simply commented out the mount point check in lines 905-907, and autoshutdown now works correctly in my setup. However, I am not sure what the best general solution would be.

Fate6174 commented 2 months ago

To answer my second point, I think the check in lines 905-907 could be extended by changing it from

! mount -l | grep -q "${hdd}" && {
        _log "DEBUG: Skipping as no mount point"
        continue; }

to

! mount -l | grep -q "${hdd}" &&
    { ! command -v zpool >/dev/null 2>&1 ||
      ! ZPOOL_SCRIPTS_AS_ROOT=1 zpool status -c upath | grep "${hdd}" | grep -q -e "ONLINE" -e "DEGRADED"; } && {
        _log "DEBUG: Skipping as no mount point"
        continue; }

Line 1: Test whether ${hdd} is mounted normally (same as before). If NOT, go to line 2.
Line 2: Test whether the zpool command is available. If NOT, there is no ZFS pool to look in, so the disk is skipped as before (go to line 4). If YES, go to line 3.
Line 3: Test whether ${hdd} appears as a disk with status "ONLINE" or "DEGRADED" in the output of zpool status -c upath. If NOT, go to line 4. The ZPOOL_SCRIPTS_AS_ROOT=1 variable is needed if autoshutdown runs with root privileges, because zpool only lets root run -c scripts when it is set (I don't know whether that is the case). Otherwise, it can be omitted.
Line 4: Log that the disk is not mounted (same as before).
Line 5: Continue the loop (same as before).
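As a possible variation on the check proposed above, here is a minimal sketch that looks the disk up in zpool status -L -P output instead of grepping the -c upath column. It is only an illustration under assumptions: the helper name hdd_in_zpool and the sample pool text are made up, and it assumes that the resolved vdev paths start with /dev/<disk> (for example /dev/sda1 for a whole-disk vdev on sda). Matching on the first field only, rather than anywhere in the line, also keeps sda from matching a disk such as sdaa.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if disk $1 (a name as reported by iostat,
# e.g. "sda") backs a vdev whose state is ONLINE or DEGRADED. It reads
# `zpool status -L -P` output on stdin: -P prints full vdev paths and -L
# resolves /dev/disk/by-id/... symlinks to the real /dev node.
hdd_in_zpool() {
    awk -v dev="$1" '
        # Accept /dev/<dev> plus an optional partition suffix (sda1,
        # nvme0n1p1), but not a longer disk name such as sdaa.
        $1 ~ ("^/dev/" dev "p?[0-9]*$") &&
            ($2 == "ONLINE" || $2 == "DEGRADED") { found = 1 }
        END { exit !found }
    '
}

# Made-up sample output for a degraded raidz1 pool named "tank".
sample_status='  pool: tank
 state: DEGRADED
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz1-0     DEGRADED     0     0     0
            /dev/sda1  ONLINE       0     0     0
            /dev/sdb1  FAULTED      0     0     0
            /dev/sdc1  ONLINE       0     0     0

errors: No known data errors'

# On a real system the input would be: zpool status -L -P | hdd_in_zpool "${hdd}"
for hdd in sda sdb sdaa; do
    if printf '%s\n' "$sample_status" | hdd_in_zpool "$hdd"; then
        echo "$hdd: member of an ONLINE/DEGRADED vdev, check its IO"
    else
        echo "$hdd: not found in an ONLINE/DEGRADED vdev"
    fi
done
```

Because -L and -P only change how vdev paths are printed and do not run any zpool status -c scripts, this variant should not need ZPOOL_SCRIPTS_AS_ROOT when autoshutdown runs as root.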

Thoughts?