Open yhabteab opened 2 months ago
Didn't change anything, just rebased!
For completeness, as there is no actual cross-reference between these two PRs yet: #10140 is a soft dependency of this PR. The purpose of this PR is to provide logging for all places where CpuBoundWork
could introduce unexpected delays but doesn't add logging for all instances currently in the code where the remaining ones are to be removed in #10140.
Are you planning (@yhabteab) / expecting (@Al2Klimov) further changes to this PR regarding the unresolved conversations? I saw that there was a bit of interaction in #10167 but does that mean you want to use it here?
I saw that there was a bit of interaction in #10167 but does that mean you want to use it here?
Honestly, I have concerns about this approach; it appears rather random. Once #9990 is merged, it could become even more chaotic. If you, @julianbrost, believe we should use #10167, we can proceed with it, but I think it's unnecessary since we don't require that level of precision to determine whether it has blocked for a long time.
Are you planning (@yhabteab)
I've just rephrased the log message with the diagnostic error messages.
There were a few things I didn't like regarding the changes in JsonRpcConnection::HandleIncomingMessages()
which made me play around with the code a bit.
catch (const std::exception& ex)
) the overall control flow is a bit convoluted in that it leaves the exception handler, does common logging code, and then resumes the error handling based on the value of a string variable. That probably works fine[^1], but I'd say it's easier to follow if all the error happens in the exception handler. Therefor, I've tried to keep the control flow as is with the logging now happening inside the exception handler.The current state of my playing around can be seen in commit 5de11a4c23b000b55ebf14f578223d6d815d48e9. The main change is that there are now three distinct places where log messages can be generated. While this duplicates some code[^2], it doesn't even make the code longer overall. Apart from that, this commit also contains a few other minor tweak. Please have a look at this and let me know what you think.
[^1]: However, it's based on the assumption that DiagnosticInformation(ex)
must never return an empty string. So if that assumption isn't needed, avoiding it doesn't sound too bad to me.
[^2]: I actually have an intermediate version where I tried to keep the common logging code with case distinctions for error/slow/other logging and place it in a Defer
so that the break;
can be kept inside catch
. But that turned out to be quite messy.
- With the new wording of that message, you have to read farther into the message until the reason why this is now logged as a warning becomes clear.
How does it make any difference when the Error
keyword appears in the log entry? you have to read further into the message ... sounds to me as if you would otherwise not read the rest of the log messages.
- I've tried to improve this by having separate and more clear wording in the beginning of the log message stating the reason for this message.
And why exactly do we need this separation? Especially, why do you need to produce a different log entry based on the used log severity?
Example of a log message from the existing code:
warning/JsonRpcConnection: Error while processing JSON-RPC message for identity 'some-endpoint': Something, something, problem
Example of a log message that your PR would generate:
warning/JsonRpcConnection: Processing JSON-RPC 'event::CheckResult' message for identity 'some-endpoint' took total 42ms: Error: Something, something, problem
Did you change the log message because you think it's an improvement to the log message or because it allowed you to combine the different log messages into basically the same with just minor differences for the different situations?
- With the new wording of that message, you have to read farther into the message until the reason why this is now logged as a warning becomes clear.
How does it make any difference when the
Error
keyword appears in the log entry? you have to read further into the message ... sounds to me as if you would otherwise not read the rest of the log messages.
I think it's just nicer if the relevant part gets mentioned first. And in case of an error happening, that's that, not how long the processing took.
- I've tried to improve this by having separate and more clear wording in the beginning of the log message stating the reason for this message.
And why exactly do we need this separation? Especially, why do you need to produce a different log entry based on the used log severity?
They are logged in the same context, but the situation that's warned about is different, so different messages for "an error happened" and "this was surprisingly slow" don't sound like a bad idea to me. We don't need this but we might want better log messages and I tried to make a suggestion in that direction.
They are logged in the same context, but the situation that's warned about is different, so different messages for "an error happened" and "this was surprisingly slow" don't sound like a bad idea to me.
You are going to see either one of those at a time, the debug or the warning message, but never both entries at the same time, so separating them makes little sense to me. Apart from that, it shouldn't make any difference in the end result whether you prefix the message with Slow
or not.
Did you change the log message because you think it's an improvement to the log message or because it allowed you to combine the different log messages into basically the same with just minor differences for the different situations?
No, not as an improvement, but I just wanted to perform one such thing rather than x times that are actually not different from one another.
This PR provides an alternative implementation of #10083, but without having to introduce a new log class, and with much more contextual log details.
closes #10083