ramseydave closed this 3 months ago
Same failure mode for me after some automated reboots in our environment, on CentOS 7 and CentOS Stream 8 hosts.
parent INFO Daemon - start() - Initialize and run the daemon
ncpa[1053]: 2023-12-07 04:01:46,463 parent WARNING Daemon - check_pid() - Another instance is already running (pid 1047)
ncpa[1053]: Daemon - check_pid() - Another instance is already running (pid 1047)
ncpa[1053]: ***** Starting NCPA version: 3.0.0
systemd[1]: ncpa.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: ncpa.service: Failed with result 'exit-code'.
Did any of you modify your ncpa.service to use --start instead of the default -n?
Facing the same issue, but on SLES 15.
The unit was not changed from the default -n argument.
ncpa.service - NCPA
Loaded: loaded (/usr/lib/systemd/system/ncpa.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2024-01-19 08:55:44 UTC; 6h ago
Docs: https://www.nagios.org/ncpa
Process: 1493 ExecStart=/usr/local/ncpa/ncpa -n (code=exited, status=1/FAILURE)
Main PID: 1493 (code=exited, status=1/FAILURE)
Jan 19 08:55:44 host ncpa[1493]: 2024-01-19 08:55:44,649 root INFO main - Python version: 3.11.6 (main, Nov 20 2023, 07:42:02) [GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Jan 19 08:55:44 host ncpa[1493]: 2024-01-19 08:55:44,651 root INFO main - SSL version: OpenSSL 3.0.8 7 Feb 2023
Jan 19 08:55:44 host ncpa[1493]: 2024-01-19 08:55:44,652 root INFO main - ZLIB version: 1.3
Jan 19 08:55:44 host ncpa[1493]: 2024-01-19 08:55:44,652 parent INFO Daemon - start() - Initialize and run the daemon
Jan 19 08:55:44 host ncpa[1493]: 2024-01-19 08:55:44,657 parent WARNING Daemon - check_pid() - Another instance is already running (pid 1493)
But I've set an override to specify the PID file created by NCPA.
host:~ # cat /usr/lib/systemd/system/ncpa.service
[Unit]
Description=NCPA
Documentation=https://www.nagios.org/ncpa
After=network.target local-fs.target
[Service]
ExecStart=/usr/local/ncpa/ncpa -n
[Install]
WantedBy=multi-user.target
host:~ # cat /etc/systemd/system/ncpa.service.d/override.conf
[Service]
PIDFile=/usr/local/ncpa/var/run/ncpa.pid
Haven't yet rebooted the server to confirm if it starts OK after this or not.
Thanks Soxfor. I just tested this on my Fedora 39 box. It fixed the ncpa starting problem for me: I can now reboot and ncpa starts up correctly. Thanks for that workaround :)
Edit: after thinking about this a while, I can't confirm this is a fix actually. This problem seems to be infrequent. Most of the time everything works fine. Maybe some clean up isn't happening every time at shutdown or something?
@GldRush98 no problem. As a workaround it works, server booted and no error on NCPA service. Confirmed on my side as well.
This could be an error in the NCPA stop/start steps, so although this isn't a fix per se, it does provide a way of getting a successful service start in the meantime.
My understanding is that NCPA fails to start when the PID file is present and the PID in it corresponds to an existing process. If the file exists but the PID in the file is not in use, the process starts fine. This explains why most of the time the service starts up fine, but sometimes it doesn't.
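To illustrate the behaviour described above, here is a minimal Python sketch of a PID-file check that only asks "is any process alive with this PID?". This is not NCPA's actual check_pid() code; the function names and the logic are assumptions made for illustration. It shows why a stale PID file is harmless most of the time but blocks startup when the old PID has been reused by an unrelated process after a reboot.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if *any* process with this PID exists (not necessarily NCPA)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def would_refuse_to_start(pid_file: str) -> bool:
    """Hypothetical naive check: refuse to start if the PID file names a live process."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or unreadable contents: nothing blocks startup
    return pid_is_running(pid)
```

With a check like this, a leftover file containing 1299 would block startup whenever something else (systemd-logind, for example) happens to have been assigned PID 1299 during boot.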
Adding this to /usr/lib/systemd/system/ncpa.service seems to fix the issue:
ExecStop=/usr/local/ncpa/ncpa --stop
In some cases, after a reboot of a RHEL server, the NCPA service fails with the error below:
systemctl status ncpa
● ncpa.service - NCPA
Loaded: loaded (/usr/lib/systemd/system/ncpa.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2023-12-05 08:24:44 GMT; 1s ago
...
ncpa[1321]: 2023-12-05 08:17:42,515 parent WARNING Daemon - check_pid() - Another instance is already running (pid 1299)
...
# As per below, another process is using the PID. So it seems that the PID file for NCPA is holding on to the PID and not refreshing on reboot.
ps -ef | grep 1299
root 1299 1 0 11:32 ? 00:00:00 /usr/lib/systemd/systemd-logind
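The manual ps check above can be scripted. The following Python sketch reads the PID from the PID file and looks up that process's command line under /proc (Linux only). The helper names and the "ncpa appears in the command line" heuristic are assumptions for illustration, not part of NCPA.

```python
from pathlib import Path
from typing import Optional


def describe_pid(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return the command line of the given PID, or None if no such process."""
    try:
        raw = Path(proc_root, str(pid), "cmdline").read_bytes()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # /proc/<pid>/cmdline separates arguments with NUL bytes
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def pid_file_points_at_ncpa(pid_file: str, proc_root: str = "/proc") -> bool:
    """Heuristic (assumed): the PID file is valid only if its PID is an ncpa process."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    cmdline = describe_pid(pid, proc_root)
    return cmdline is not None and "ncpa" in cmdline
```

For the case shown above, pid_file_points_at_ncpa("/usr/local/ncpa/var/run/ncpa.pid") would be expected to return False, since PID 1299 belongs to systemd-logind, which marks the PID file as stale rather than as evidence of a running NCPA instance.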