NagiosEnterprises / nrpe

NRPE Agent
GNU General Public License v2.0

DNS issue should return UNKNOWN or CRITICAL #268

Open tatref opened 1 year ago

tatref commented 1 year ago

Hi,

https://github.com/NagiosEnterprises/nrpe/blob/b226fe4175dc79c9a6d7994614e570b26ad0f0dc/src/utils.c#L156-L157

At the moment, a DNS issue returns a WARNING.

This should probably be either UNKNOWN or CRITICAL.

Also, when check_nrpe is used as a host check in Nagios, the host is reported as OK instead of WARNING, which can be problematic.

I know that this project is no longer developed. Can this still be fixed? I can make the PR.

Thanks

ericloyd commented 1 year ago

I argue that it should not be considered CRITICAL, and UNKNOWN is not really the case - it is known that it is not resolvable, so it is not unknown. In essence, I believe that WARNING remains the proper response.

If you want to check for proper DNS resolution, you should be using the check_dns plugin outside of NRPE. To be super pedantic, you could make a dependency that requires the check_dns result to be in an OK state before using an FQDN in an NRPE-based check, so that the NRPE check doesn't execute unless DNS is responding properly.

In short - my vote would be to leave it as is.

tatref commented 1 year ago

Thanks for your feedback!

Well, the goal of the command is to check something on the remote host. The result could be UNKNOWN because the command never even got to execute, so in essence the result is not known.

I know I can add a check_dns, but adding this to every host is going to be cumbersome. Moreover, maybe I have an /etc/hosts entry for this host, so check_dns is not necessarily the way to go.

ericloyd commented 1 year ago

I still believe that it is not a CRITICAL condition for whatever is being checked. It is a failure in NRPE's ability to connect, and there are ways to ensure that it can connect before executing the check. I named one.

And if you're using /etc/hosts, then DNS failure isn't an actual option here, is it? It's basic connectivity issues, which should then issue an UNKNOWN. But not a CRITICAL.

ericloyd commented 1 year ago

By the way, service dependencies are "smart" in that, if configured properly, you don't need to specify them for all hosts. You leave that part blank, make the master service your check_dns and your dependent service your check_nrpe (with no commands, just connectivity checking). That way, DNS must be working for check_nrpe to work. Then you make all your NRPE-based checks dependent upon check_nrpe (with no commands) so that they only run if NRPE is working. It's two dependencies.
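
For illustration, the two dependencies described above might look roughly like the following in Nagios object configuration. This is only a sketch: the hostgroup `nrpe-hosts` and the service descriptions `DNS`, `NRPE`, and `Disk Usage` are made-up names, and it assumes the "same host" behaviour where the dependent host directives are left out so each dependency applies per host.

```
# Hypothetical names: the hostgroup "nrpe-hosts" and the service
# descriptions "DNS", "NRPE", and "Disk Usage" are illustrative only.

# 1. The NRPE connectivity check depends on the DNS check.
define servicedependency {
    hostgroup_name                  nrpe-hosts
    service_description             DNS
    dependent_service_description   NRPE
    execution_failure_criteria      w,u,c
    notification_failure_criteria   w,u,c
}

# 2. NRPE-based checks depend on the NRPE connectivity check.
define servicedependency {
    hostgroup_name                  nrpe-hosts
    service_description             NRPE
    dependent_service_description   Disk Usage
    execution_failure_criteria      w,u,c
    notification_failure_criteria   w,u,c
}
```

With `execution_failure_criteria w,u,c`, the dependent service is not actively checked while the master service is in a WARNING, UNKNOWN, or CRITICAL state.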

ericloyd commented 1 year ago

So if we've narrowed the code snippet in question down to being a DNS issue, as opposed to a general connectivity issue, then it definitely shouldn't be returning CRITICAL, and in this case, I'll agree that UNKNOWN would be more appropriate than WARNING.
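
To make the proposal concrete, here is a minimal sketch of what returning UNKNOWN on a resolution failure could look like. It is not the actual NRPE code: `resolve_state` and its `extra_flags` parameter are hypothetical names invented for this example. Only `getaddrinfo`, `gai_strerror`, and the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are real.

```c
#define _POSIX_C_SOURCE 200112L

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Standard Nagios plugin return codes. */
enum {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/*
 * Hypothetical helper (not part of NRPE): try to resolve host/port and
 * return the plugin state a caller could exit with.
 * extra_flags is copied into hints.ai_flags, e.g. AI_NUMERICHOST to
 * skip DNS lookups entirely.
 */
int resolve_state(const char *host, const char *port, int extra_flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    hints.ai_flags = extra_flags;

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        /* The remote check never ran, so report UNKNOWN, not WARNING. */
        fprintf(stderr, "Could not resolve hostname %.100s: %s\n",
                host, gai_strerror(rc));
        return STATE_UNKNOWN;
    }

    freeaddrinfo(res);
    return STATE_OK;
}
```

A caller could do something like `int s = resolve_state(host, "5666", 0); if (s != STATE_OK) exit(s);` before attempting to connect, so that a name-resolution failure surfaces as UNKNOWN.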

tatref commented 5 months ago

Also note that failing to connect to port 5666 results in CRITICAL:

Critical : (No output on stdout) stderr: connect to address 1.2.3.4 port 5666: Connection refused