NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

NCPA 3.0.1 disk checks on Red Hat Linux 8 (not on 9, not tested on RHEL7) #1081

Open fabiuseur opened 6 months ago

fabiuseur commented 6 months ago

Hi,

We have updated the Nagios NCPA agent for Linux to the latest version, 3.0.1. Since then, the disk checks no longer work. If I roll back to the previous version, 3.0.0, they work properly.

root@nagiosserver~ # /usr/local/nagios/libexec/check_ncpa.py --version
check_ncpa.py, Version 1.2.4

root@nagiosserver ~ # /usr/local/nagios/libexec/check_ncpa.py -H <linux_host> -t <very_secure_token> -P 5693 -M 'disk|/logical/|data' -w 85 -c 90
UNKNOWN: The node (disk/) requested does not exist.

root@linux_client /usr/local/ncpa # rpm -qa | grep -i ncpa
ncpa-3.0.1-latest.x86_64

root@linux-server /usr/local/ncpa # dnf downgrade ncpa
Updating Subscription Management repositories.
Last metadata expiration check: 1:52:45 ago on di 19 dec 2023 08:12:02 CET.
Dependencies resolved.
============================================================================================================================================================================================================
 Package                                      Architecture                                   Version                                              Repository                                           Size
============================================================================================================================================================================================================
Downgrading:
 ncpa                                         x86_64                                         3.0.0-latest                                         nagios_base                                          26 M

Transaction Summary
============================================================================================================================================================================================================
Downgrade  1 Package

Total download size: 26 M
Is this ok [y/N]: y
Downloading Packages:
ncpa-3.0.0-latest.x86_64.rpm                                                                                                                                                135 MB/s |  26 MB     00:00    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                       135 MB/s |  26 MB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                    1/1 
  Running scriptlet: ncpa-3.0.0-latest.x86_64                                                                                                                                                           1/1 
  Running scriptlet: ncpa-3.0.0-latest.x86_64                                                                                                                                                           1/2 
Try to stop services with chkconfig
error reading information on service ncpa_listener: No such file or directory
error reading information on service ncpa_passive: No such file or directory
Try to stop services with systemctl
Try to stop services with service

  Downgrading      : ncpa-3.0.0-latest.x86_64                                                                                                                                                           1/2 
  Running scriptlet: ncpa-3.0.0-latest.x86_64                                                                                                                                                           1/2 
  Running scriptlet: ncpa-3.0.1-latest.x86_64                                                                                                                                                           2/2 
  Cleanup          : ncpa-3.0.1-latest.x86_64                                                                                                                                                           2/2 
  Running scriptlet: ncpa-3.0.1-latest.x86_64                                                                                                                                                           2/2 
  Running scriptlet: ncpa-3.0.0-latest.x86_64                                                                                                                                                           2/2 
  Running scriptlet: ncpa-3.0.1-latest.x86_64                                                                                                                                                           2/2 
  Verifying        : ncpa-3.0.0-latest.x86_64                                                                                                                                                           1/2 
  Verifying        : ncpa-3.0.1-latest.x86_64                                                                                                                                                           2/2 
Installed products updated.
Last metadata expiration check: 1:52:52 ago on di 19 dec 2023 08:12:02 CET.

Downgraded:
  ncpa-3.0.0-latest.x86_64                                                                                                                                                                                  

Complete!
root@linuxserver /usr/local/ncpa # systemctl restart ncpa

root@nagiosserver ~ # /usr/local/nagios/libexec/check_ncpa.py -H <linux_client> -t '<very_secure_token>' -P 5693 -M 'disk/logical/|data' -w 85 -c 90
OK: Used disk space was 50.10 % (Total: 199.99 GiB, Used: 100.17 GiB, Free: 99.82 GiB) | 'total'=199.99GiB;;; 'used'=100.17GiB;;; 'free'=99.82GiB;;;

#Error in the ncpa_listener.log:
2023-12-19 10:09:06,733 root ERROR cannot access local variable 'node_children' where it is not associated with a value
Traceback (most recent call last):
  File "listener/psapi.py", line 399, in get_root_node
  File "listener/psapi.py", line 343, in get_disk_node
  File "listener/psapi.py", line 139, in make_mountpoint_nodes
UnboundLocalError: cannot access local variable 'node_children' where it is not associated with a value
2023-12-19 10:09:06,734 geventwebsocket.handler INFO ::ffff:10.210.11.9 - - [2023-12-19 10:09:06] "GET /api/disk/logical/%7Cdata/?token=<very_secure_token>&warning=85&critical=90&check=1 HTTP/1.1" 200 410 0.022027

On RHEL9 I don't have this issue, and the other service checks on the RHEL8 host, such as CPU usage and memory usage, are working fine.

fabiuseur commented 6 months ago

Update: I sometimes see the same issue on RHEL9 as on RHEL8.

2023-12-19 14:18:12,779 listener INFO before_request() - request.url: https://<linux_client>:5693/api/disk/logical/|data/?token=<very_secret_token>&warning=85&critical=90&check=1
Traceback (most recent call last):
 File "listener/psapi.py", line 90, in make_mountpoint_nodes
2023-12-19 14:18:13,032 root ERROR cannot access local variable 'node_children' where it is not associated with a value
Traceback (most recent call last):
  File "listener/psapi.py", line 399, in get_root_node
  File "listener/psapi.py", line 343, in get_disk_node
  File "listener/psapi.py", line 139, in make_mountpoint_nodes
UnboundLocalError: cannot access local variable 'node_children' where it is not associated with a value
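
For readers unfamiliar with this error: the two tracebacks above (one at line 90, then the UnboundLocalError at line 139 of make_mountpoint_nodes) are consistent with a variable that is only assigned inside a try block whose exception is caught and logged. The following is a minimal sketch of that general pattern, not the actual psapi.py code; the function names make_node_broken and make_node_safe and the get_usage callback are hypothetical and exist only for illustration.

```python
def make_node_broken(mountpoint, get_usage):
    """Hypothetical stand-in, NOT the real make_mountpoint_nodes."""
    try:
        usage = get_usage(mountpoint)  # may raise, e.g. PermissionError
        node_children = [("total", usage["total"]), ("used", usage["used"])]
    except Exception:
        pass  # error swallowed; node_children was never assigned
    # If the try block failed, this line raises UnboundLocalError.
    return (mountpoint, node_children)


def make_node_safe(mountpoint, get_usage):
    """Same logic, but the variable has a default before the try block."""
    node_children = []
    try:
        usage = get_usage(mountpoint)
        node_children = [("total", usage["total"]), ("used", usage["used"])]
    except Exception:
        pass  # node_children keeps its empty default
    return (mountpoint, node_children)
```

Under this assumption, any mount point whose lookup fails would break the whole disk node in the first variant, while the second variant returns an empty child list for that mount point.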

ne-bbahn commented 6 months ago

In your examples, you were accessing two different API endpoints: 'disk|/logical/|data' in your 3.0.1 example (it should be disk/logical/|data) and disk/logical/|data in your 3.0.0 example.
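
To make the difference between the two metric strings concrete, here is a rough sketch of how a metric could be turned into the request path seen in ncpa_listener.log (/api/disk/logical/%7Cdata/). The helper api_path is hypothetical and is only assumed to resemble what check_ncpa.py does; it relies on the standard library's urllib.parse.quote, which percent-encodes '|' as %7C and leaves '/' alone.

```python
from urllib.parse import quote


def api_path(metric: str) -> str:
    """Hypothetical helper: build an /api/ path from a -M metric string."""
    # quote() keeps '/' as a path separator and encodes '|' as %7C.
    return "/api/" + quote(metric.strip("/")) + "/"


# Intended endpoint: the first path segment is "disk".
print(api_path("disk/logical/|data"))
# Mistyped endpoint: the first path segment becomes "disk|".
print(api_path("disk|/logical/|data"))
```

With the stray '|' after disk, the first path segment is no longer disk, so the two commands do not address the same node.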

First, verify that you have a /disk/logical/|data endpoint. You can check this in the NCPA UI under API. These endpoints are generated automatically from the disk nodes found with the psutil library. I have tried these checks on RHEL8/9 with NCPA 3.0.1, and they work fine for me.

If you're certain that there is a |data partition and NCPA can't detect it, then it may be a bug with the psutil library or our handling of the returned content from that library.
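
One way to check what psutil reports on the affected host is a short script like the one below. It is a sketch under assumptions: node_name and local_mountpoints are hypothetical helpers, and the rule that '/' in a mount point becomes '|' in the node name is inferred from the |data endpoint in this thread rather than taken from the NCPA source. psutil.disk_partitions() is the real psutil call; the script returns an empty list if psutil is not importable.

```python
def node_name(mountpoint: str) -> str:
    """Assumed naming rule: path separators become '|' ("/data" -> "|data")."""
    return mountpoint.replace("/", "|")


def local_mountpoints() -> list[str]:
    """Return mount points reported by psutil, or [] if psutil is missing."""
    try:
        import psutil
    except ImportError:
        return []
    # all=False limits the result to physical devices, skipping pseudo filesystems.
    return [part.mountpoint for part in psutil.disk_partitions(all=False)]


if __name__ == "__main__":
    for mp in local_mountpoints():
        print(f"{mp} -> disk/logical/{node_name(mp)}")
```

If /data does not appear in the output, the partition is not visible to psutil as queried here, which would point at the mount itself rather than at the check command.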

ne-bbahn commented 2 months ago

Could you give more information on how your disk is mounted?