Return UNKNOWN on SSL-layer connection problems

rpv-tomsk commented 5 years ago

All SSL-layer problems should be reported as UNKNOWN instead of CRITICAL.

Plugin should behave consistently and report CRITICAL only when explicitly got from remote side or remote side is completely down (then -u, --unknown-timeout option comes to play).

All other cases, when remote side is up, but we unable to get correct response - should return UNKNOWN, like it was implemented for non-SSL connections.

For example, when SSL certificate expires and contact should receive only CRITICAL notifications, not UNKNOWN - such feature would help.

rpv-tomsk commented 5 years ago

STATE_UNKNOWN needs to be reported unconditionally, like in other places/cases:

send_request() {
    ....
    if (rc == -1) {
        printf("CHECK_NRPE: Error sending query to host.\n");
        close(sd);
        return STATE_UNKNOWN;
    }
    ...
}

read_response() {
   ...
    /* recv() error */
    if (rc < 0) {
        ...
        return STATE_UNKNOWN;
    } else if (rc == 0) {
        /* server disconnected */
        printf("CHECK_NRPE: Received 0 bytes from daemon.  Check the remote server logs for error messages.\n");
        ...
        return STATE_UNKNOWN;
    }
    ...
    if (packet_crc32 != calculated_crc32) {
        printf("CHECK_NRPE: Response packet had invalid CRC32.\n");
        ...
        return STATE_UNKNOWN;
    }
    ...
}

ericloyd commented 5 years ago

I disagree with your opening assumption that "all SSL-layer problems should be reported as UNKNOWN instead of CRITICAL."

If I've specified that I want SSL connectivity, then I want lack of SSL connectivity to be reported as a CRITICAL, not an UNKNOWN. UNKNOWN says that the status of the check is not known. CRITICAL means that the check is returning a status that warrants immediate attention.

Obviously, people may disagree on what an invalid SSL connection should return, so I thereby propose that instead of hard coding it, it should be an option as to what it returns and a default value of CRITICAL, to maintain functionality with the installed base thus far.

I'm not saying that you're wrong, but I am saying that there needs to be a broader consideration of what CRITICAL vs UNKNOWN means, and give people the option of using whatever works best for their use scenarios.

rpv-tomsk commented 5 years ago

then I want lack of SSL connectivity to be reported as a CRITICAL, not an UNKNOWN.

Then, highly possible, you want following to be reported as CRITICAL too:

CHECK_NRPE: Error sending query to host.
recv() error
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
CHECK_NRPE: Response packet had invalid CRC32.

I've specified, I want connectivity, then I want lack of connectivity to be reported as CRITCIAL, not an UNKNOWN. UNKNOWN says that the status of the check is not known. CRITICAL means that the check is returning a status that warrants immediate attention.

Why not?

rpv-tomsk commented 5 years ago

Obviously, people may disagree on what an invalid SSL connection should return, so I thereby propose that instead of hard coding it,

SSL is the only layer over transport layer. I don't understand why one layer should behave different from another.

Ok, are you the person who decides to merge such PR? If so, when let's decide about option name.

ericloyd commented 5 years ago

I don't decided anything. :-) I'm just offering an alternative viewpoint. My point is this - don't hardcode what SSL failure means, make it an option. But you have to leave the current behavior as it is or else anyone that is currently relying on SSL failure to generate a CRITICAL will be very surprised when their notifications or event handlers don't work because it's been changed to UNKNOWN.

rpv-tomsk commented 5 years ago

I just want to point out that your viewpoint is inconsistent.

If for some reasons, you catch one of these:

CHECK_NRPE: Error sending query to host.
recv() error
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
CHECK_NRPE: Response packet had invalid CRC32.

plugin will return STATE_UNKNOWN, while, based on your viewpoint, you expect to get STATE_CRITICAL on any error. And this is not configurable and hardcoded value too.

I think, plugin should behave consistently on all errors and between layers. Now (w/o this patch) it is not consistent.

ericloyd commented 5 years ago

I'm not intentionally being inconsistent, but I am intentionally being difficult. Playing the "Devil's Advocate."

I agree that plugins SHOULD be consistent. But they're not. Anything which changes the current behavior of plugins need to be well thought out. Here is my bottom line: If, today, SSL failure generates a CRITICAL, then it should continue to do so unless a new option which allows the user to specify what the value should be is added to the code. You can add other connection level events into that category if you wish, since we're talking about all SSL layer connection events.

rpv-tomsk commented 5 years ago

Ok, If any person, who can be responsible on merge, confirms his interest in this change, I can continue to work on it. Just decide and write, which solution we need here, which option name would be good, etc. Or simply merge this as-is, that is also acceptable (by me ; -) ).

I'm using own repo for my installations, so just wanted to inform community about issue exists.

sawolf commented 4 years ago

Hi @rpv-tomsk, thanks for the contribution. For what it's worth, I agree with you that SSL-layer problems should report UNKNOWN.

However, this project is very much in a maintenance-only state, and we're not looking to make changes/add features unless the software is causing considerable issues for existing users.

rpv-tomsk commented 4 years ago

Maybe this can be applied and go into 4.x branch??

sawolf commented 4 years ago

My apologies, I forgot about this PR when I did the 4.0 release.

I'm going to re-open the PR. If you change the code so that SSL-layer problems can optionally return UNKNOWN (only if -u is specified or similar), then I'll merge it in and we can release it the next time we do security fixes for NRPE.

rpv-tomsk commented 4 years ago

Hi, I'm interested and will try to implement that. Please keep me in the loop, in case I'm delay .)

rpv-tomsk commented 4 years ago

If you change the code so that SSL-layer problems can optionally return UNKNOWN (only if -u is specified or similar), then I'll merge it in and we can release it the next time we do security fixes for NRPE.

Hi,

As requested, changed code to return UNKNOWN only if -u / --unknown-timeout is specified.

Is this now good enough?

Thanks.

sawolf commented 2 years ago

Yep, I'm happy with this. Thanks!

NagiosEnterprises / nrpe

Return UNKNOWN on SSL-layer connection problems #212