Log HTTP/RPC message processing stats

yhabteab commented 2 months ago

This PR provides an alternative implementation of #10083, but without having to introduce a new log class, and with much more contextual log details.

3749756 - In addition to the existing HTTP request status logs, this commit adds the elapsed time of waiting for a CPU semaphore.
ec79ae6 - This commit basically does the same thing as the commit above, but for JSON-RPC connections.

closes #10083

yhabteab commented 2 months ago

Didn't change anything, just rebased!

julianbrost commented 2 months ago

For completeness, as there is no actual cross-reference between these two PRs yet: #10140 is a soft dependency of this PR. The purpose of this PR is to provide logging for all places where CpuBoundWork could introduce unexpected delays but doesn't add logging for all instances currently in the code where the remaining ones are to be removed in #10140.

julianbrost commented 1 month ago

Are you planning (@yhabteab) / expecting (@Al2Klimov) further changes to this PR regarding the unresolved conversations? I saw that there was a bit of interaction in #10167 but does that mean you want to use it here?

yhabteab commented 1 month ago

I saw that there was a bit of interaction in #10167 but does that mean you want to use it here?

Honestly, I have concerns about this approach; it appears rather random. Once #9990 is merged, it could become even more chaotic. If you, @julianbrost, believe we should use #10167, we can proceed with it, but I think it's unnecessary since we don't require that level of precision to determine whether it has blocked for a long time.

Are you planning (@yhabteab)

I've just rephrased the log message with the diagnostic error messages.

julianbrost commented 1 month ago

There were a few things I didn't like regarding the changes in JsonRpcConnection::HandleIncomingMessages() which made me play around with the code a bit.

Control flow: in the error case (catch (const std::exception& ex)) the overall control flow is a bit convoluted in that it leaves the exception handler, does common logging code, and then resumes the error handling based on the value of a string variable. That probably works fine[^1], but I'd say it's easier to follow if all the error happens in the exception handler. Therefor, I've tried to keep the control flow as is with the logging now happening inside the exception handler.
Log messages: while avoiding duplication of the logging code sounds like a good idea, this seems to have resulted in a rewording of the existing "Error while processing [...]" message. With the new wording of that message, you have to read farther into the message until the reason why this is now logged as a warning becomes clear. I've tried to improve this by having separate and more clear wording in the beginning of the log message stating the reason for this message.

The current state of my playing around can be seen in commit 5de11a4c23b000b55ebf14f578223d6d815d48e9. The main change is that there are now three distinct places where log messages can be generated. While this duplicates some code[^2], it doesn't even make the code longer overall. Apart from that, this commit also contains a few other minor tweak. Please have a look at this and let me know what you think.

[^1]: However, it's based on the assumption that DiagnosticInformation(ex) must never return an empty string. So if that assumption isn't needed, avoiding it doesn't sound too bad to me. [^2]: I actually have an intermediate version where I tried to keep the common logging code with case distinctions for error/slow/other logging and place it in a Defer so that the break; can be kept inside catch. But that turned out to be quite messy.

yhabteab commented 1 month ago

With the new wording of that message, you have to read farther into the message until the reason why this is now logged as a warning becomes clear.

How does it make any difference when the Error keyword appears in the log entry? you have to read further into the message ... sounds to me as if you would otherwise not read the rest of the log messages.

I've tried to improve this by having separate and more clear wording in the beginning of the log message stating the reason for this message.

And why exactly do we need this separation? Especially, why do you need to produce a different log entry based on the used log severity?

julianbrost commented 1 month ago

Example of a log message from the existing code:

warning/JsonRpcConnection: Error while processing JSON-RPC message for identity 'some-endpoint': Something, something, problem

Example of a log message that your PR would generate:

warning/JsonRpcConnection: Processing JSON-RPC 'event::CheckResult' message for identity 'some-endpoint' took total 42ms: Error: Something, something, problem

Did you change the log message because you think it's an improvement to the log message or because it allowed you to combine the different log messages into basically the same with just minor differences for the different situations?

With the new wording of that message, you have to read farther into the message until the reason why this is now logged as a warning becomes clear.

How does it make any difference when the Error keyword appears in the log entry? you have to read further into the message ... sounds to me as if you would otherwise not read the rest of the log messages.

I think it's just nicer if the relevant part gets mentioned first. And in case of an error happening, that's that, not how long the processing took.

I've tried to improve this by having separate and more clear wording in the beginning of the log message stating the reason for this message.

And why exactly do we need this separation? Especially, why do you need to produce a different log entry based on the used log severity?

They are logged in the same context, but the situation that's warned about is different, so different messages for "an error happened" and "this was surprisingly slow" don't sound like a bad idea to me. We don't need this but we might want better log messages and I tried to make a suggestion in that direction.

yhabteab commented 1 month ago

They are logged in the same context, but the situation that's warned about is different, so different messages for "an error happened" and "this was surprisingly slow" don't sound like a bad idea to me.

You are going to see either one of those at a time, the debug or the warning message, but never both entries at the same time, so separating them makes little sense to me. Apart from that, it shouldn't make any difference in the end result whether you prefix the message with Slow or not.

yhabteab commented 1 month ago

Did you change the log message because you think it's an improvement to the log message or because it allowed you to combine the different log messages into basically the same with just minor differences for the different situations?

No, not as an improvement, but I just wanted to perform one such thing rather than x times that are actually not different from one another.

Icinga / icinga2

Log HTTP/RPC message processing stats #10141