NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

ncpa hangs if a mounted NFS partition has a problem #757

Open skhvorov opened 3 years ago

skhvorov commented 3 years ago

The problem was detected in versions 2.3.0 and 2.3.1. Tested on RHEL 7.6.

Important configuration part:

~# cat /usr/local/ncpa/etc/ncpa.cfg
...
all_partitions = 0
exclude_fs_types = aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,encryptfs,efivarfs,fuse,fusectl,hugetlbfs,mqueue,nfs,nfs4,overlayfs,proc,pstore,rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs
...
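For readers who want to check how these two directives would be interpreted, here is a minimal sketch using Python's standard configparser. It is not NCPA's own loading code: the [general] section name, the helper name read_partition_settings, and the shortened exclude list are assumptions made for illustration.

```python
import configparser

# Shortened, hypothetical excerpt; the [general] section name is an assumption.
SAMPLE_CFG = """
[general]
all_partitions = 0
exclude_fs_types = aufs,autofs,cifs,nfs,nfs4,tmpfs
"""


def read_partition_settings(text, section="general"):
    """Return (all_partitions, excluded_types) parsed from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # getboolean accepts "0"/"1"; fall back to False when the option is missing.
    all_partitions = parser.getboolean(section, "all_partitions", fallback=False)
    raw = parser.get(section, "exclude_fs_types", fallback="")
    # Split the comma-separated list, dropping empty entries and stray whitespace.
    excluded = {item.strip() for item in raw.split(",") if item.strip()}
    return all_partitions, excluded


if __name__ == "__main__":
    print(read_partition_settings(SAMPLE_CFG))
```

With all_partitions = 0 and nfs/nfs4 in the exclude set, one would expect the agent not to touch NFS mounts at all, which is what the 2.2.2 trace below shows.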

Strace output for ncpa-2.2.2-1.el7.x86_64:

[pid 382304] open("/proc/382304/mounts", O_RDONLY|O_CLOEXEC) = 10
[pid 382304] fstat(10, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid 382304] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2b3522a000
[pid 382304] read(10, "rootfs / rootfs rw 0 0\nsysfs /sy"..., 1024) = 1024
[pid 382304] read(10, "odev,noexec,relatime,memory 0 0\n"..., 1024) = 1024
[pid 382304] read(10, ".cde.abc.com/public /mnt/publi"..., 1024) = 1024
[pid 382304] read(10, ".25.23 0 0\nauto.nfs.cde /nfs/cde"..., 1024) = 1024
[pid 382304] read(10, "65536,wsize=65536,namlen=255,har"..., 1024) = 1024
[pid 382304] read(10, "e,vers=3,rsize=65536,wsize=65536"..., 1024) = 1024
[pid 382304] read(10, "imeo=600,retrans=2,sec=sys,mount"..., 1024) = 135
[pid 382304] read(10, "", 1024) = 0
[pid 382304] close(10) = 0
[pid 382304] munmap(0x7f2b3522a000, 4096) = 0
[pid 382304] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 382304] statfs("/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=25770312, f_bfree=22581085, f_bavail=21270365, f_files=6553600, f_ffree=6208617, f_fsid={3603953923, 2787610972}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 382304] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 382304] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 382304] statfs("/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=25770312, f_bfree=22581085, f_bavail=21270365, f_files=6553600, f_ffree=6208617, f_fsid={3603953923, 2787610972}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 382304] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 382304] stat("/localdisk", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 382304] statfs("/localdisk", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=30819583, f_bfree=30804304, f_bavail=29237085, f_files=7839744, f_ffree=7839733, f_fsid={573879092, 883839171}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 382304] stat("/localdisk", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0

Here we can see /proc/382304/mounts being read to obtain the mount list, followed by stat and statfs calls for the physical disk mounts only.
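To compare traces like this without reading them by eye, a small script can list which paths were passed to stat or statfs. This is only a sketch: the helper name probed_paths is hypothetical, and the sample is an abbreviated copy of a few lines from the trace above.

```python
import re

# Matches calls such as: [pid 382304] statfs("/localdisk", {...}) = 0
# fstat(10, ...) is not matched because its first argument is not a quoted path.
CALL_RE = re.compile(r'\b(stat|statfs)\("([^"]+)"')


def probed_paths(strace_text):
    """Return the distinct paths passed to stat()/statfs(), in first-seen order."""
    seen = []
    for _call, path in CALL_RE.findall(strace_text):
        if path not in seen:
            seen.append(path)
    return seen


# Abbreviated lines from the 2.2.2 trace (struct contents shortened).
SAMPLE = (
    '[pid 382304] fstat(10, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0\n'
    '[pid 382304] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0\n'
    '[pid 382304] statfs("/", {f_type=EXT2_SUPER_MAGIC, ...}) = 0\n'
    '[pid 382304] stat("/localdisk", {st_mode=S_IFDIR|0755, ...}) = 0\n'
    '[pid 382304] statfs("/localdisk", {f_type=EXT2_SUPER_MAGIC, ...}) = 0\n'
)

if __name__ == "__main__":
    print(probed_paths(SAMPLE))
```

Running the same function over a trace from a newer version and comparing the two lists makes any additional mount points stand out immediately.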

After upgrading to 2.3.0:

[pid 384059] open("/proc/384059/mounts", O_RDONLY|O_CLOEXEC) = 10
[pid 384059] futex(0x7ff277391570, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 384059] fstat(10, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid 384059] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff277ed7000
[pid 384059] read(10, "rootfs / rootfs rw 0 0\nsysfs /sy"..., 1024) = 1024
[pid 384059] read(10, "odev,noexec,relatime,memory 0 0\n"..., 1024) = 1024
[pid 384059] read(10, ".cde.abc.com/public /mnt/publi"..., 1024) = 1024
[pid 384059] read(10, ".25.23 0 0\nauto.nfs.cde /nfs/cde"..., 1024) = 1024
[pid 384059] read(10, "65536,wsize=65536,namlen=255,har"..., 1024) = 1024
[pid 384059] read(10, "e,vers=3,rsize=65536,wsize=65536"..., 1024) = 1024
[pid 384059] read(10, "imeo=600,retrans=2,sec=sys,mount"..., 1024) = 135
[pid 384059] read(10, "", 1024) = 0
[pid 384059] close(10) = 0
[pid 384059] munmap(0x7ff277ed7000, 4096) = 0
[pid 384059] statfs("/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=25770312, f_bfree=22580995, f_bavail=21270275, f_files=6553600, f_ffree=6208607, f_fsid={3603953923, 2787610972}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/sys", {f_type=SYSFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/proc", {f_type=PROC_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/dev", {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=24558483, f_bfree=24558483, f_bavail=24558483, f_files=24558483, f_ffree=24557731, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID}) = 0
[pid 384059] statfs("/sys/kernel/security", {f_type=SECURITYFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/dev/shm", {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=24563156, f_bfree=24563156, f_bavail=24563156, f_files=24563156, f_ffree=24563155, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV}) = 0
[pid 384059] statfs("/dev/pts", {f_type=DEVPTS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/run", {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=24563156, f_bfree=24558275, f_bavail=24558275, f_files=24563156, f_ffree=24561859, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV}) = 0
[pid 384059] statfs("/sys/fs/cgroup", {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=24563156, f_bfree=24563156, f_bavail=24563156, f_files=24563156, f_ffree=24563140, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RDONLY|ST_NOSUID|ST_NODEV|ST_NOEXEC}) = 0
[pid 384059] statfs("/sys/fs/cgroup/systemd", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/pstore", {f_type=PSTOREFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/perf_event", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/devices", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/freezer", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/memory", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/blkio", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/net_cls,net_prio", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/pids", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/cpuset", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/hugetlb", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/fs/cgroup/cpu,cpuacct", {f_type=CGROUP_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/kernel/config", {f_type=0x62656570, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=25770312, f_bfree=22580995, f_bavail=21270275, f_files=6553600, f_ffree=6208607, f_fsid={3603953923, 2787610972}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/proc/sys/fs/binfmt_misc", {f_type=BINFMTFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/dev/mqueue", {f_type=0x19800202, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/sys/kernel/debug", {f_type=DEBUGFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/dev/hugepages", {f_type=HUGETLBFS_MAGIC, f_bsize=2097152, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=2097152, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/localdisk", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=30819583, f_bfree=30804304, f_bavail=29237085, f_files=7839744, f_ffree=7839733, f_fsid={573879092, 883839171}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/var/lib/nfs/rpc_pipefs", {f_type=0x67596969, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/mnt/public", {f_type=0xfe534d42, f_bsize=1024, f_blocks=95990540, f_bfree=17129428, f_bavail=17129428, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=4096, f_frsize=1024, f_flags=ST_VALID|ST_RDONLY|ST_RELATIME}) = 0
[pid 384059] statfs("/misc", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/net", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/proc/sys/fs/binfmt_misc", {f_type=BINFMTFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/site", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/site/home", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=12582924, f_bfree=5233788, f_bavail=5233788, f_files=24999997, f_ffree=13904851, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/disks", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/disks/home_p", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=163853, f_bfree=151395, f_bavail=151395, f_files=384317, f_ffree=351677, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/site/itools", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=10829830, f_bfree=4033425, f_bavail=4033425, f_files=21251126, f_ffree=8454876, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RDONLY|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/site/gen", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/site/gen/adm", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=16388, f_bfree=16307, f_bavail=16307, f_files=36629, f_ffree=36478, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/disks/gen_adm_sudo", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=3210, f_bfree=3169, f_bavail=3169, f_files=7169, f_ffree=7071, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/proj", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/dfg", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/dfg/pdsd", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=6553600, f_bfree=3322739, f_bavail=3322739, f_files=12451833, f_ffree=8955768, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/dfg/users", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=25165835, f_bfree=4337460, f_bavail=4337460, f_files=27022807, f_ffree=7624604, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/cde/disks/nn-qwe", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=100663305, f_bfree=14931185, f_bavail=14931185, f_files=76999991, f_ffree=37114584, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/site/gen/platforms", {f_type=AUTOFS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/nfs/site/gen/platforms/qwe", {f_type=NFS_SUPER_MAGIC, f_bsize=65536, f_blocks=24576003, f_bfree=519306, f_bavail=519306, f_files=31876696, f_ffree=29732750, f_fsid={0, 0}, f_namelen=255, f_frsize=65536, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] statfs("/mnt/local-nfs", {f_type=NFS_SUPER_MAGIC, f_bsize=1048576, f_blocks=2533920, f_bfree=2432885, f_bavail=2304117, f_files=164823040, f_ffree=164314884, f_fsid={0, 0}, f_namelen=255, f_frsize=1048576, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 384059] statfs("/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=25770312, f_bfree=22580995, f_bavail=21270275, f_files=6553600, f_ffree=6208607, f_fsid={3603953923, 2787610972}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 384059] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 384059] statfs("/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=25770312, f_bfree=22580995, f_bavail=21270275, f_files=6553600, f_ffree=6208607, f_fsid={3603953923, 2787610972}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
[pid 384059] stat("/localdisk", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 384059] statfs("/localdisk", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=30819583, f_bfree=30804304, f_bavail=29237085, f_files=7839744, f_ffree=7839733, f_fsid={573879092, 883839171}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
[pid 384059] stat("/localdisk", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0

After the upgrade, statfs calls for ALL mounts appear in the output, between the reading of the mount list and the normal processing of the physical disk mounts.

If at least one NFS mount has a problem, ncpa_listener will hang.
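The two traces suggest that 2.2.2 filtered mounts by filesystem type before touching them, whereas 2.3.0 calls statfs on every mount first. A minimal sketch of the filter-first order follows. It is not NCPA's actual code: the function name mounts_to_check and the sample mount table (device names, server:/export) are hypothetical.

```python
def mounts_to_check(proc_mounts_text, excluded_types):
    """Yield (mountpoint, fstype) for mounts whose fstype is not excluded.

    Only these survivors should ever be passed to statfs/statvfs, so an
    unreachable NFS server cannot block the caller when nfs is excluded.
    """
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype = fields[:3]
        if fstype in excluded_types:
            continue
        yield mountpoint, fstype


# Hypothetical /proc/mounts-style content, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda1 /localdisk ext4 rw,relatime 0 0
server:/export /mnt/local-nfs nfs rw,relatime,vers=3,hard 0 0
"""

if __name__ == "__main__":
    excluded = {"sysfs", "nfs", "nfs4"}
    for mountpoint, fstype in mounts_to_check(SAMPLE_MOUNTS, excluded):
        print(mountpoint, fstype)
```

The point of the ordering is that the type check uses only text already read from the mount table, so it never blocks; the potentially blocking call happens afterwards and only for mounts that passed the filter.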

ccztux commented 3 years ago

Is there any information regarding this issue in ncpa_listener.log? Do you know a way to reproduce the NFS share issue so that it can be analyzed?

ccztux commented 3 years ago

Is there really a newline after the equal sign?:

exclude_fs_types = 

aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,encryptfs,efivarfs,fuse,fusectl,hugetlbfs,mqueue,nfs,nfs4,overlayfs,proc,pstore,rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs

ccztux commented 3 years ago

I can't reproduce this issue with a "Stale file handle" error on a mounted NFS share:

[root@centos7_ncpa_test src]# mount -l | grep test
10.0.0.1:/tmp on /tmp/test type nfs4 (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.85,local_lock=none,addr=10.0.0.1)

[root@centos7_ncpa_test src]# ls -alh /tmp/test/
ls: cannot open directory /tmp/test/: Stale file handle

skhvorov commented 3 years ago

> Is there really a newline after the equal sign?:
>
> exclude_fs_types =
>
> aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,encryptfs,efivarfs,fuse,fusectl,hugetlbfs,mqueue,nfs,nfs4,overlayfs,proc,pstore,rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs

No.

skhvorov commented 3 years ago

> Is there any information regarding this issue in ncpa_listener.log?

There is nothing interesting in the log. Log writing just stops.

> Do you know a way to reproduce the NFS share issue so that it can be analyzed?

This works for me: iptables -A OUTPUT -d NFS_SERVER_IP -j DROP

Be careful: if your remote connection to centos7_ncpa_test goes through the NFS server, you will lose that connection.

The blocking rule can be removed with: iptables -D OUTPUT -d NFS_SERVER_IP -j DROP

ccztux commented 3 years ago

> > Is there really a newline after the equal sign?:
> >
> > exclude_fs_types =
> >
> > aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,encryptfs,efivarfs,fuse,fusectl,hugetlbfs,mqueue,nfs,nfs4,overlayfs,proc,pstore,rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs
>
> No.

Thanks for providing this info.

ccztux commented 3 years ago

> > Is there any information regarding this issue in ncpa_listener.log?
>
> There is nothing interesting in the log. Log writing just stops.
>
> > Do you know a way to reproduce the NFS share issue so that it can be analyzed?
>
> This works for me: iptables -A OUTPUT -d NFS_SERVER_IP -j DROP
>
> Be careful: if your remote connection to centos7_ncpa_test goes through the NFS server, you will lose that connection.
>
> The blocking rule can be removed with: iptables -D OUTPUT -d NFS_SERVER_IP -j DROP

Thanks for providing this info; now I can reproduce this issue. :) On my test system this issue also occurs with NCPA 2.2.2. I will take a look at it.

skhvorov commented 3 years ago

> On my test system this issue also occurs with NCPA 2.2.2.

Did you test with these parameters?

all_partitions = 0

exclude_fs_types = aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,encryptfs,efivarfs,fuse,fusectl,hugetlbfs,mqueue,nfs,nfs4,overlayfs,proc,pstore,rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs

I have checked again for NCPA 2.2.2 configured as above and I can't see the problem. Moreover, strace doesn't contain any stat or statfs calls to NFS mounts.

ccztux commented 3 years ago

> > On my test system this issue also occurs with NCPA 2.2.2.
>
> Did you test with these parameters?
>
> all_partitions = 0
>
> exclude_fs_types = aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,encryptfs,efivarfs,fuse,fusectl,hugetlbfs,mqueue,nfs,nfs4,overlayfs,proc,pstore,rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs
>
> I have checked again for NCPA 2.2.2 configured as above and I can't see the problem. Moreover, strace doesn't contain any stat or statfs calls to NFS mounts.

Yes, you are right. Using your configuration solves the issue for NCPA 2.2.2. Thank you.

ccztux commented 3 years ago

Good news: I have found and fixed the issue, but I don't understand why this works with NCPA 2.2.2 and not with NCPA 2.3.x. I will investigate further and provide the fix via a pull request.

ccztux commented 3 years ago

The configuration directive all_partitions did not work; this is now fixed. What is not fixed is that ncpa hangs and stops working if there is a problem with, for example, an NFS share.

This seems to be an issue with psutil:

[root@centos7-02 psutil]# source bin/activate
(psutil) [root@centos7-02 psutil]# python
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psutil
>>> psutil.disk_partitions(all=True)            # the mountpoint /tmp/mountpoint has an issue

[root@centos7-02 psutil]# source bin/activate
(psutil) [root@centos7-02 psutil]# python
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psutil
>>> psutil.disk_usage('/tmp/mountpoint')            # this mountpoint has an issue

In both cases I have to hard-kill the Python process, because the calls never return.
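Because a blocked filesystem call cannot be interrupted from inside the same process, one common workaround is to probe the mountpoint in a child process and give up after a timeout. The sketch below uses only the standard library; it is neither the NCPA fix nor a psutil feature, and the helper name mount_responds and the 5-second default are assumptions.

```python
import subprocess
import sys


def mount_responds(path, timeout=5.0):
    """Return True if os.statvfs(path) completes in a child within `timeout` seconds.

    The probe runs in a separate Python process, so a hung mount blocks only
    the child and never the caller.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit status 0 means statvfs returned; non-zero means it raised.
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort: the child may be stuck in the kernel, so send the
        # signal but do not wait for it to exit.
        child.kill()
        return False


if __name__ == "__main__":
    print(mount_responds("/"))
```

A caller could run this check for each mountpoint and only query disk usage for those that answered, reporting the others as unresponsive instead of hanging.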

ccztux commented 3 years ago

I have opened an issue in the psutil project. Maybe they can help us with this issue.

vappukuttan commented 2 years ago

Hello,

I have been hitting this issue periodically. Is there an ETA for a fix?

We currently use Nagios XI 5.8.6 with ncpa-listener (ncpa 3.0.8) for monitoring.

If there is an issue with an NFS mountpoint, a lot of service checks end up with "UNKNOWN: Execution exceeded timeout threshold of 60s".

Thank you, Vinod

ccztux commented 2 years ago

I don't know the release date of 2.4.0, but I found this:

791 (comment)

It would be nice to know the ETA for 2.4.0