NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

NCPA systemctl unit test is restricted to units of type 'service' #1079

Open pittagurneyi opened 6 months ago

pittagurneyi commented 6 months ago

Hi,

I just ran into a problem, and it seems that NCPA, rather than my Nagios service definition, is the cause:

Recently, libvirtd moved from a monolithic libvirtd.service to a modular design, introducing many new socket and service sub-units.

The problem is that the new modular virtqemud.service has a 120-second timeout built in: it stops itself if no virtual machines are currently running, but it can be triggered again by

, i.e. something or someone accessing it.

So instead of monitoring the .service unit, I need to monitor one or more of the sockets. When active, they are in state=listening. Otherwise, I could only monitor virtqemud.service on machines that always have virtual machines running, which is not guaranteed. I want to be certain that virtual machines can be started when necessary and that the infrastructure for that is active and working.
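As an illustration of what a socket check would need to read, here is a minimal sketch that parses `systemctl list-units --type=socket` output into each unit's ACTIVE and SUB state, assuming the `--plain --no-legend` column layout (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION). The function names and the sample text are hypothetical and not taken from NCPA; an active socket unit reports SUB `listening`, while an inactive one listed via `--all` reports `inactive`/`dead`.

```python
import subprocess
from typing import Dict, Tuple


def parse_list_units(output: str) -> Dict[str, Tuple[str, str]]:
    """Map unit name -> (ACTIVE, SUB) from `systemctl list-units --plain --no-legend` output.

    Hypothetical helper, not part of NCPA.
    """
    units: Dict[str, Tuple[str, str]] = {}
    for line in output.splitlines():
        fields = line.split()
        # Defensive: drop a leading failed-unit bullet if systemctl printed one.
        if fields and fields[0] == "\u25cf":
            fields = fields[1:]
        if len(fields) < 4:
            continue  # blank or malformed line
        name, _load, active, sub = fields[:4]
        units[name] = (active, sub)
    return units


def socket_states() -> Dict[str, Tuple[str, str]]:
    """Run systemctl on the local host and return the state of every socket unit."""
    cmd = ["systemctl", "list-units", "--type=socket", "--all",
           "--no-pager", "--no-legend", "--plain"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_list_units(result.stdout)


# Illustrative sample output (assumed, not captured from a real host).
SAMPLE = """virtqemud-admin.socket loaded active listening libvirt QEMU daemon admin socket
virtqemud-ro.socket    loaded active listening libvirt QEMU daemon read-only socket
virtqemud.socket       loaded active listening libvirt QEMU daemon socket
"""

if __name__ == "__main__":
    states = parse_list_units(SAMPLE)
    # Nagios-style verdict: every virtqemud socket must be listening.
    ok = all(sub == "listening" for _active, sub in states.values())
    print("OK" if ok else "CRITICAL")
```

Checking the socket rather than the service means the check stays green while virtqemud.service is idle-stopped, as long as socket activation can still start it on demand.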

Problematic code:

https://github.com/NagiosEnterprises/ncpa/blob/master/agent/listener/services.py#L140C1-L140C1

If someone could please look into it and tell me how to resolve my issue, that would be great!

ne-bbahn commented 6 months ago

Have you tried changing --type=service to --type=socket?

I'd imagine you could very nearly just clone the services.py file and the services API endpoint, changing a few lines (maybe just that one; I haven't looked that closely yet) to target sockets instead. It may make more sense to just add a conditional that determines which unit type it should look at. I'll take a look soon.

I will put this on the roadmap for NCPA 3.0.2/3.1.0
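The "add a conditional" option could look roughly like the sketch below: the unit type becomes a parameter that defaults to `service` (preserving the current behaviour) and is checked against a whitelist before it reaches the command line. The function name, the whitelist contents and the extra flags are assumptions for illustration; the actual command at the linked line of services.py is not reproduced here.

```python
from typing import List

# Hypothetical whitelist; systemd knows more types (`systemctl --type=help` lists them).
ALLOWED_UNIT_TYPES = {"service", "socket", "timer", "mount", "path", "target"}


def build_list_units_command(unit_type: str = "service") -> List[str]:
    """Build a systemctl list-units command for a single unit type.

    Defaults to "service" so existing checks keep working unchanged.
    """
    if unit_type not in ALLOWED_UNIT_TYPES:
        # Reject anything unexpected instead of passing user input to systemctl.
        raise ValueError(f"unsupported unit type: {unit_type!r}")
    return ["systemctl", "list-units", f"--type={unit_type}", "--all",
            "--no-pager", "--no-legend", "--plain"]


if __name__ == "__main__":
    print(build_list_units_command())          # current behaviour
    print(build_list_units_command("socket"))  # what the virtqemud sockets need
```

Returning an argument list (for `subprocess.run` without `shell=True`) plus the whitelist keeps a user-supplied type from being interpreted by a shell.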

pittagurneyi commented 6 months ago

Hi, thanks for looking into this.

No, I haven't tried modifying the source code. I've just finished installing NCPA 3 on most of my machines and adding the necessary Nagios checks. A few are still missing and I'll get to them shortly, but it's been a lot of work and I don't have any more time to invest in this task at the moment.

For the moment that is good enough for me; I'll wait until this lands in the next minor/major NCPA release and fix it then.