NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

Hung hard NFS mounts break the listener #997

Closed jd-daniels closed 9 months ago

jd-daniels commented 9 months ago

ncpa-2.4.1-1

I saw a couple of references to this in closed issues, but the provided fix doesn't seem to work, at least for me. When an NFS mount that is mounted 'hard' hangs, NCPA stops replying to all checks. I can reproduce this simply by hard-mounting an NFS share and then blocking the IP of the NFS server. After that, every check fails with a timeout.

I have already excluded nfs and nfs4 in my ncpa.cfg. nfs4 is what this particular mount shows up as.

exclude_fs_types = "aufs, autofs, binfmt_misc, cifs, cgroup, configfs, debugfs, devpts, devtmpfs, encryptfs efivarfs, fuse, hugetlbfs, mqueue, nfs, overlayfs, proc, pstore, rpc_pipefs, securityfs, selinuxfs, smb, sysfs, tmpfs, tracefs, nfs4"

I tried the above with and without quotes, and also without spaces after the commas.
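For readers following along, here is a minimal sketch of what an fs-type exclusion is generally expected to do: drop excluded mounts before any call that touches the filesystem. This is not NCPA's actual code. The helper names `parse_exclude_list` and `mounts_to_check` are hypothetical, and the input is assumed to be text in the /proc/mounts format.

```python
def parse_exclude_list(raw):
    """Split a comma-separated option value into a set of fs type names.

    Surrounding double quotes and whitespace around each entry are ignored.
    """
    cleaned = raw.strip().strip('"')
    return {item.strip() for item in cleaned.split(",") if item.strip()}


def mounts_to_check(mounts_text, excluded):
    """Yield (mountpoint, fstype) pairs whose fstype is not excluded.

    mounts_text is expected to look like /proc/mounts:
    device, mount point, fs type, options, dump, pass.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        if fstype not in excluded:
            yield mountpoint, fstype
```

With a filter like this applied first, a mount reported as nfs4 would never reach a statfs call, which is the behavior the exclusion list above is meant to give.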

In fact, when I try to restart the listener in this state, I have to send a SIGKILL because it's stuck in an uninterruptible sleep state. systemctl won't restart it.

nagios 4903 0.0 0.6 215348 49656 ? Dsl 09:33 0:00 /usr/local/ncpa/ncpa_listener -n

With strace attached to ncpa_listener, I can see it trying to stat the hung mount point: [pid 5347] statfs("/mnt/tmp",
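One way to check whether a mount point is hung without blocking the calling process is to run the filesystem call in a child process and give up after a timeout. The sketch below is only an illustration and not part of NCPA. The function name `mount_responds` and the 5-second default are assumptions.

```python
import subprocess
import sys


def mount_responds(path, timeout=5.0):
    """Return True if os.statvfs(path) finishes in a child process in time.

    The child may block in uninterruptible sleep on a hung hard mount,
    but the caller only waits up to `timeout` seconds.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means statvfs returned without raising.
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process in D state does not exit until the
        # blocked I/O returns, so do not wait on it again here.
        child.kill()
        return False
```

Popen with `wait(timeout=...)` is used here rather than `subprocess.run(..., timeout=...)` because `run` waits for the child to exit after killing it, which could itself block if the child is stuck on the hung mount.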

MrPippin66 commented 9 months ago

Do you have the "all_partitions" option set to 0?