NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

NCPA agent suddenly stopped on Solaris 11.4 SRU 59 #996

Open xera13 opened 11 months ago

xera13 commented 11 months ago

Dear expert,

I need some help. We upgraded our Solaris systems from 11.4 SRU 55 to SRU 59. After the upgrade, the NCPA agent became unstable: ncpa_listener starts, but after some time the agent stops working. We tried downgrading to version 2.3.1, but that agent would not start. We then reinstalled agent 2.4.0, but the services go down again after some time.

Thank you.

MrPippin66 commented 11 months ago

Do you have any system logs indicating agent stopping? What do NCPA logs show?

amroll2627 commented 5 months ago

Hi, I want to report the same issue. The NCPA agent goes down after some time. Downgrading to 2.3.1 does not work, I think because it uses Python 2.7, which is no longer supported on Solaris: https://blogs.oracle.com/solaris/post/sunsetting-python-2-on-oracle-solaris

The process seems to be killed or to exit by itself, and nothing in the logs indicates why. Let me know what other info I can gather.

root@serverhostname # cat /etc/release
Oracle Solaris 11.4 SPARC
Copyright (c) 1983, 2024, Oracle and/or its affiliates.
Assembled 14 February 2024
root@serverhostname # pkg info entire
Name: entire
Summary: entire incorporation including Support Repository Update (Oracle Solaris 11.4.66.164.1).
Description: This package constrains system package versions to the same build. WARNING: Proper system update and correct package selection depend on the presence of this incorporation. Removing this package will result in an unsupported system. For more information see: https://support.oracle.com/rs?type=doc&id=2433412.1
Category: Meta Packages/Incorporations
State: Installed
Publisher: solaris
Version: 11.4 (Oracle Solaris 11.4.66.164.1)
Branch: 11.4.66.0.1.164.1
Packaging Date: February 15, 2024 at 4:00:20 AM
Last Install Time: June 5, 2020 at 7:53:52 PM
Last Update Time: April 19, 2024 at 5:31:48 PM
Size: 2.52 kB
FMRI: pkg://solaris/entire@11.4-11.4.66.0.1.164.1:20240215T040020Z
root@serverhostname #
root@serverhostname # svcs -a | grep ncpa
online 2024-04-24T17:48:48 svc:/site/ncpa_passive:default
online 2024-04-24T17:50:49 svc:/site/ncpa_listener:default
root@serverhostname # ps -ef | grep ncpa
nagios 12964 1 0 Apr 24 ? 2:25 /usr/local/ncpa/ncpa_passive
root 5594 5572 0 15:55:30 pts/8 0:00 grep ncpa
root@serverhostname # svcs -xv /site/ncpa_listener
svc:/site/ncpa_listener:default (site/ncpa_listener)
State: online since 2024-04-24T17:50:49
See: /var/svc/log/site-ncpa_listener:default.log
Impact: None.
root@serverhostname # tail -30 /var/svc/log/site-ncpa_listener:default.log
[ 2024 Apr 24 17:45:59 Method "start" exited with status 0. ]
[ 2024 Apr 24 17:47:14 Rereading configuration. ]
[ 2024 Apr 24 17:47:14 Executing refresh method (:true). ]
[ 2024 Apr 24 17:47:23 Stopping because service disabled. ]
[ 2024 Apr 24 17:47:23 Executing stop method ("kill $(cat /usr/local/ncpa/var/run/ncpa_listener.pid)"). ]
[ 2024 Apr 24 17:47:23 Method "stop" exited with status 0. ]
[ 2024 Apr 24 17:47:30 Enabled. ]
[ 2024 Apr 24 17:47:30 Executing start method ("/usr/local/ncpa/ncpa_listener"). ]
[ 2024 Apr 24 17:47:31 Method "start" exited with status 0. ]
[ 2024 Apr 24 17:48:45 Stopping because service disabled. ]
[ 2024 Apr 24 17:48:45 Executing stop method ("kill $(cat /usr/local/ncpa/var/run/ncpa_listener.pid)"). ]
[ 2024 Apr 24 17:48:45 Method "stop" exited with status 0. ]
[ 2024 Apr 24 17:48:48 Enabled. ]
[ 2024 Apr 24 17:48:48 Executing start method ("/usr/local/ncpa/ncpa_listener"). ]
[ 2024 Apr 24 17:48:49 Method "start" exited with status 0. ]
[ 2024 Apr 24 17:50:21 Stopping because service disabled. ]
[ 2024 Apr 24 17:50:21 Executing stop method ("kill $(cat /usr/local/ncpa/var/run/ncpa_listener.pid)"). ]
kill: 12966: no such process
[ 2024 Apr 24 17:50:21 Method "stop" exited with status 1. ]
[ 2024 Apr 24 17:50:21 Executing stop method ("kill $(cat /usr/local/ncpa/var/run/ncpa_listener.pid)"). ]
kill: 12966: no such process
[ 2024 Apr 24 17:50:21 Method "stop" exited with status 1. ]
[ 2024 Apr 24 17:50:21 Executing stop method ("kill $(cat /usr/local/ncpa/var/run/ncpa_listener.pid)"). ]
kill: 12966: no such process
[ 2024 Apr 24 17:50:21 Method "stop" exited with status 1. ]
[ 2024 Apr 24 17:50:36 Leaving maintenance because clear requested. ]
[ 2024 Apr 24 17:50:36 Disabled. ]
[ 2024 Apr 24 17:50:48 Enabled. ]
[ 2024 Apr 24 17:50:48 Executing start method ("/usr/local/ncpa/ncpa_listener"). ]
[ 2024 Apr 24 17:50:49 Method "start" exited with status 0. ]
You have new mail in /var/mail/root
root@serverhostname #
root@serverhostname # /usr/local/ncpa/ncpa_listener --version
ncpa_listener, version: 2.4.0
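
The repeated "kill: 12966: no such process" lines in the log suggest the PID file outlived the listener process, while SMF still reports the service as online. For anyone wanting to confirm that on their own host, here is a minimal Python sketch that checks whether the PID recorded in the file still belongs to a live process. The PID file path is taken from the stop method in the log; the function names are hypothetical and not part of NCPA.

```python
import os

# Path copied from the SMF stop method shown in the log; adjust if yours differs.
PID_FILE = "/usr/local/ncpa/var/run/ncpa_listener.pid"


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        # PIDs <= 0 address process groups in kill(2), not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def pidfile_is_stale(path: str = PID_FILE) -> bool:
    """Return True if the PID file is missing, unreadable, or names a dead process."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return True
    return not pid_is_alive(pid)


if __name__ == "__main__":
    print("stale PID file" if pidfile_is_stale() else "listener PID is alive")
```

If this reports a stale PID file while svcs still shows the service online, that would match what the ps output and the SMF log show here.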

bish0polis commented 3 months ago

We went through the process of rebuilding, with the python27 installation, and that got us around that issue.

But then SRU69 comes and libffi is rug-pulled, so be wary.

amroll2627 commented 3 months ago

The workaround for us is to create a script that monitors the process by grepping for it in the ps output and restarts the service once the process is missing. A lazy solution, but as long as it works.
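
For reference, a watchdog along those lines could look roughly like the Python sketch below. It is not the actual script used here, only an illustration under some assumptions: the listener shows up in `ps -ef` with the command path `/usr/local/ncpa/ncpa_listener` (the start method from the SMF log), the service FMRI is the one shown by svcs earlier in this thread, and the function names are made up. Note that the SMF log above shows the stop method failing when the PID is already gone, so a plain `svcadm restart` may leave the service in maintenance and need a `svcadm clear` afterwards.

```python
import subprocess

# Both values are taken from output posted in this thread; adjust for your host.
SERVICE = "svc:/site/ncpa_listener:default"
PROCESS = "/usr/local/ncpa/ncpa_listener"


def listener_running(ps_output: str, process: str = PROCESS) -> bool:
    """Return True if any `ps -ef` line contains the listener path as a whole field."""
    for line in ps_output.splitlines():
        if process in line.split():
            return True
    return False


def check_and_restart(dry_run: bool = False) -> bool:
    """Restart the SMF service if the listener is missing. Return True if it was missing."""
    ps = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    if listener_running(ps.stdout):
        return False
    if not dry_run:
        subprocess.run(["svcadm", "restart", SERVICE], check=True)
    return True


if __name__ == "__main__":
    if check_and_restart():
        print("ncpa_listener was missing; restart requested")
```

Run from cron every few minutes as a user allowed to manage the service. Matching on the whole field rather than a substring avoids counting ncpa_passive or a stray `grep ncpa` as the listener.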