Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

Log HTTP/RPC message processing stats #10141

Open yhabteab opened 2 months ago

yhabteab commented 2 months ago

This PR provides an alternative implementation of #10083, but without having to introduce a new log class, and with much more contextual log details.

closes #10083

yhabteab commented 2 months ago

Didn't change anything, just rebased!

julianbrost commented 2 months ago

For completeness, as there is no actual cross-reference between these two PRs yet: #10140 is a soft dependency of this PR. The purpose of this PR is to provide logging for all places where CpuBoundWork could introduce unexpected delays but doesn't add logging for all instances currently in the code where the remaining ones are to be removed in #10140.

julianbrost commented 1 month ago

Are you planning (@yhabteab) / expecting (@Al2Klimov) further changes to this PR regarding the unresolved conversations? I saw that there was a bit of interaction in #10167 but does that mean you want to use it here?

yhabteab commented 1 month ago

I saw that there was a bit of interaction in #10167 but does that mean you want to use it here?

Honestly, I have concerns about this approach; it appears rather random. Once #9990 is merged, it could become even more chaotic. If you, @julianbrost, believe we should use #10167, we can proceed with it, but I think it's unnecessary since we don't require that level of precision to determine whether it has blocked for a long time.

Are you planning (@yhabteab)

I've just rephrased the log message with the diagnostic error messages.

julianbrost commented 1 month ago

There were a few things I didn't like regarding the changes in JsonRpcConnection::HandleIncomingMessages() which made me play around with the code a bit.

The current state of my playing around can be seen in commit 5de11a4c23b000b55ebf14f578223d6d815d48e9. The main change is that there are now three distinct places where log messages can be generated. While this duplicates some code[^2], it doesn't even make the code longer overall. Apart from that, this commit also contains a few other minor tweak. Please have a look at this and let me know what you think.

[^1]: However, it's based on the assumption that DiagnosticInformation(ex) must never return an empty string. So if that assumption isn't needed, avoiding it doesn't sound too bad to me. [^2]: I actually have an intermediate version where I tried to keep the common logging code with case distinctions for error/slow/other logging and place it in a Defer so that the break; can be kept inside catch. But that turned out to be quite messy.

yhabteab commented 1 month ago
  • With the new wording of that message, you have to read farther into the message until the reason why this is now logged as a warning becomes clear.

How does it make any difference when the Error keyword appears in the log entry? you have to read further into the message ... sounds to me as if you would otherwise not read the rest of the log messages.

  • I've tried to improve this by having separate and more clear wording in the beginning of the log message stating the reason for this message.

And why exactly do we need this separation? Especially, why do you need to produce a different log entry based on the used log severity?

julianbrost commented 1 month ago

Example of a log message from the existing code:

warning/JsonRpcConnection: Error while processing JSON-RPC message for identity 'some-endpoint': Something, something, problem

Example of a log message that your PR would generate:

warning/JsonRpcConnection: Processing JSON-RPC 'event::CheckResult' message for identity 'some-endpoint' took total 42ms: Error: Something, something, problem

Did you change the log message because you think it's an improvement to the log message or because it allowed you to combine the different log messages into basically the same with just minor differences for the different situations?

  • With the new wording of that message, you have to read farther into the message until the reason why this is now logged as a warning becomes clear.

How does it make any difference when the Error keyword appears in the log entry? you have to read further into the message ... sounds to me as if you would otherwise not read the rest of the log messages.

I think it's just nicer if the relevant part gets mentioned first. And in case of an error happening, that's that, not how long the processing took.

  • I've tried to improve this by having separate and more clear wording in the beginning of the log message stating the reason for this message.

And why exactly do we need this separation? Especially, why do you need to produce a different log entry based on the used log severity?

They are logged in the same context, but the situation that's warned about is different, so different messages for "an error happened" and "this was surprisingly slow" don't sound like a bad idea to me. We don't need this but we might want better log messages and I tried to make a suggestion in that direction.

yhabteab commented 1 month ago

They are logged in the same context, but the situation that's warned about is different, so different messages for "an error happened" and "this was surprisingly slow" don't sound like a bad idea to me.

You are going to see either one of those at a time, the debug or the warning message, but never both entries at the same time, so separating them makes little sense to me. Apart from that, it shouldn't make any difference in the end result whether you prefix the message with Slow or not.

yhabteab commented 1 month ago

Did you change the log message because you think it's an improvement to the log message or because it allowed you to combine the different log messages into basically the same with just minor differences for the different situations?

No, not as an improvement, but I just wanted to perform one such thing rather than x times that are actually not different from one another.