NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

ZeroDivisionError: float division by zero #769

Open tgriep opened 3 years ago

tgriep commented 3 years ago

When trying to monitor the /boot partition of a Linux system, this error is generated in the ncpa_listener.log file:

2021-05-12 19:40:21,714 521180 ERROR float division by zero
Traceback (most recent call last):
  File "/root/ncpa/agent/listener/psapi.py", line 257, in get_root_node
  File "/root/ncpa/agent/listener/psapi.py", line 211, in get_disk_node
  File "/root/ncpa/agent/listener/psapi.py", line 64, in make_mountpoint_nodes
ZeroDivisionError: float division by zero

And the API disk check for that partition displays this error:

{
    "error": {
        "node": "disk",
        "path": "/api/disk",
        "message": "The node requested does not exist.",
        "code": 100
    }
}

In the NCPA agent, the following line is what generates the error: https://github.com/NagiosEnterprises/ncpa/blob/master/agent/listener/psapi.py#L64

iu = st.f_files - st.f_ffree

This line should make iu zero or negative if st.f_files is 0, and in that case the following lines should not run:

if iu > 0:
    iup = math.ceil(100 * float(iu) / float(st.f_files))

but somehow it is running anyway and is causing the errors.
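To illustrate the kind of guard being discussed, here is a minimal sketch in Python 3 that checks the divisor itself (the total inode count) rather than relying on the used-inode count being zero. The function names inode_usage and inode_usage_for_path are hypothetical and are not part of NCPA; this is not the actual code in psapi.py, nor necessarily what the fix in the project looks like.

```python
import math
import os


def inode_usage(f_files, f_ffree):
    """Return (used_inodes, used_percent) without ever dividing by zero.

    f_files and f_ffree correspond to the fields of the same name
    returned by os.statvfs().
    """
    if f_files <= 0:
        # Pseudo filesystems (proc, sysfs, nfsd, ...) report no inodes at all.
        return 0, 0
    # Clamp at zero in case a filesystem reports more free than total inodes.
    used = max(f_files - f_ffree, 0)
    return used, math.ceil(100 * used / f_files)


def inode_usage_for_path(path):
    """Convenience wrapper (hypothetical) that reads the counters via statvfs."""
    st = os.statvfs(path)
    return inode_usage(st.f_files, st.f_ffree)
```

Testing the divisor directly means the percentage is only computed when st.f_files is known to be non-zero, whatever value the subtraction yields.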

See this ticket. https://support.nagios.com/tickets/scp/tickets.php?id=14242

ccztux commented 3 years ago

Can you provide details about the affected partition? For example the output of the following df commands:

[root@centos7-01 src]# df -ih /boot/
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sda1        512K   355  512K    1% /boot

[root@centos7-01 src]# df -ahHT /boot/
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda1      xfs   1.1G  322M  742M  31% /boot

tgriep commented 3 years ago

I asked the customer to run those commands. As soon as I get the data, I'll update it here.

tgriep commented 3 years ago

Here is the data.

df -ih /boot/
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sda1        398K   349  398K    1% /boot

df -ahHT /boot/
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda1      xfs   411M  273M  139M  67% /boot

ccztux commented 3 years ago

I can reproduce this issue on my test server. On my server, the /proc/fs/nfsd filesystem is causing this issue.

This filesystem has no inodes:

[root@cl01 build]# df -i /proc/fs/nfsd
Filesystem     Inodes IUsed IFree IUse% Mounted on
nfsd                0     0     0     - /proc/fs/nfsd

As a workaround, I have added the nfsd filesystem to exclude_fs_types in ncpa.cfg and restarted the ncpa_listener service.

Can you please check whether you have a filesystem that has no inodes and is not listed in the exclude_fs_types directive of ncpa.cfg?

[root@cl01 build]# df -ia
Filesystem                           Inodes  IUsed   IFree IUse% Mounted on
sysfs                                     0      0       0     - /sys
proc                                      0      0       0     - /proc
devtmpfs                             232288    378  231910    1% /dev
securityfs                                0      0       0     - /sys/kernel/security
tmpfs                                235239      1  235238    1% /dev/shm
devpts                                    0      0       0     - /dev/pts
tmpfs                                235239    515  234724    1% /run
tmpfs                                235239     16  235223    1% /sys/fs/cgroup
cgroup                                    0      0       0     - /sys/fs/cgroup/systemd
pstore                                    0      0       0     - /sys/fs/pstore
cgroup                                    0      0       0     - /sys/fs/cgroup/perf_event
cgroup                                    0      0       0     - /sys/fs/cgroup/cpu,cpuacct
cgroup                                    0      0       0     - /sys/fs/cgroup/devices
cgroup                                    0      0       0     - /sys/fs/cgroup/net_cls,net_prio
cgroup                                    0      0       0     - /sys/fs/cgroup/memory
cgroup                                    0      0       0     - /sys/fs/cgroup/freezer
cgroup                                    0      0       0     - /sys/fs/cgroup/blkio
cgroup                                    0      0       0     - /sys/fs/cgroup/hugetlb
cgroup                                    0      0       0     - /sys/fs/cgroup/pids
cgroup                                    0      0       0     - /sys/fs/cgroup/cpuset
configfs                                  0      0       0     - /sys/kernel/config
/dev/mapper/centos_centos7--01-root 8910848 198245 8712603    3% /
selinuxfs                                 0      0       0     - /sys/fs/selinux
systemd-1                                 -      -       -     - /proc/sys/fs/binfmt_misc
mqueue                                    0      0       0     - /dev/mqueue
hugetlbfs                                 0      0       0     - /dev/hugepages
debugfs                                   0      0       0     - /sys/kernel/debug
nfsd                                      0      0       0     - /proc/fs/nfsd
/dev/sda1                            524288    356  523932    1% /boot
sunrpc                                    0      0       0     - /var/lib/nfs/rpc_pipefs
binfmt_misc                               0      0       0     - /proc/sys/fs/binfmt_misc
tmpfs                                235239      1  235238    1% /run/user/0
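The same check can be scripted instead of reading the df -ia output by eye. The following is a rough Python 3 sketch, assuming a Linux host where /proc/mounts is available; the helper names parse_mounts and zero_inode_mounts are hypothetical and are not part of NCPA.

```python
import os


def parse_mounts(text):
    """Yield (mountpoint, fstype) pairs from /proc/mounts-style text.

    Note: /proc/mounts escapes spaces in mount points as \\040; this
    sketch does not decode those escapes.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            yield fields[1], fields[2]


def zero_inode_mounts(mounts, statvfs=os.statvfs):
    """Return the (mountpoint, fstype) pairs whose filesystem reports 0 inodes."""
    found = []
    for mountpoint, fstype in mounts:
        try:
            st = statvfs(mountpoint)
        except OSError:
            # Mount point not accessible to this user; skip it.
            continue
        if st.f_files == 0:
            found.append((mountpoint, fstype))
    return found
```

On a Linux host, passing the contents of /proc/mounts through parse_mounts and then zero_inode_mounts would give the filesystem types to compare against the exclude_fs_types directive.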
tgriep commented 3 years ago

The customer has a partition whose inodes are full, so that is the issue.

Filesystem       Inodes        IUsed     IFree IUse% Mounted on
nfs:/unix  262615775312 262506836208 108939104  100% /data

They cannot exclude that filesystem type because they are checking other partitions of the same type.

ccztux commented 3 years ago

Thank you for providing this helpful information. I will investigate further and get back to you.

ccztux commented 3 years ago

I have no idea what's going wrong here. How often does the exception occur? Does any check on the disk node work when the exception occurs?

I can only tell you that the fix in #775 definitely fixes the issue on my test server. Which operating system and version is in use? I could provide you with an unofficial build that includes the fix, if you like.

tgriep commented 3 years ago

The operating system they are running is Red Hat Enterprise Linux Server release 7.6 (Maipo).

The "ERROR float division by zero" message appears in the ncpa_listener.log file on every disk call on the system, and the error for the /boot partition is constant.

As far as I know, all of the disk checks are failing.

If you can provide an unofficial build, I can send that to the user and have them test it.

ccztux commented 3 years ago

RHEL 7 sounds great, because I work on such a system and can provide you with the unofficial build. You can find the build here. I hope that the test build fixes the issue. Please let us know whether it does.

ccztux commented 3 years ago

Please let me know when you have downloaded the test build, because I want to remove it as soon as possible. Thank you.

tgriep commented 3 years ago

I do not have access to the user's system to install the updated package, but I will let them know about the updated RPM. I'll update this issue with the results of the new package.