Closed eadred closed 5 years ago
@eadred Thank you for the contribution!
This plugin is designed to work with Stackdriver Error Reporting, which should recognize the resulting multiline message as a stack trace. I'm verifying this with the team right now whether the backend would recognize a nested C# exception if the stack trace were joined into a single log entry. If the backend won't recognize these, there's no point in handling them in the plugin, and we should file an Error Reporting feature request.
I've confirmed with the Error Reporting team that all 3 new exceptions were picked up by Error Reporting, but maybe not in the form you'd expect. For example, for the CSHARP_AGGREGATE_EXC
value from the test, only the content between the arrows was captured.
For the record, a self-service test can be done by running:
echo "System.AggregateException: One or more errors occurred. (This is an exception) ---> System.InvalidOperationException: This is an exception
at ExampleApp.AsyncExceptionExample.LowestLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 28
at ExampleApp.AsyncExceptionExample.ThirdLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 23
at ExampleApp.AsyncExceptionExample.SecondLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 17
at ExampleApp.AsyncExceptionExample.TopLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 12
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.Wait()
at ExampleApp.Program.Main(String[] args) in c:/ExampleApp/ExampleApp/Program.cs:line 12
---> (Inner Exception #0) System.InvalidOperationException: This is an exception
at ExampleApp.AsyncExceptionExample.LowestLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 28
at ExampleApp.AsyncExceptionExample.ThirdLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 23
at ExampleApp.AsyncExceptionExample.SecondLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 17
at ExampleApp.AsyncExceptionExample.TopLevelMethod() in c:/ExampleApp/ExampleApp/AsyncExceptionExample.cs:line 12<---
" | \
gcloud beta error-reporting events report \
--service manual \
--service-version test-version \
--project=$(gcloud config get-value project) \
--message-file=/dev/stdin
(see the gcloud docs).
If the ingestion results don't look ok, let's open a feature request against Error Reporting before proceeding with this PR. For the values that work correctly, we can start reviewing those portions of the PR.
Sorry about the force push. I've now restored the original commit as it was at the time of the first review (https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/commit/1df893b5db924cbfa987fe44e1719979e70125da).
I've now managed to get the C# rules into a separate set of rules (in the most recent commit https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/commit/a991faaa09c74bb19a40bb6cb9a377bc021441ea). As I mentioned in an earlier comment, I've now had to cope with the fact that it is ambiguous as far as the rules are concerned whether you are seeing a C# or a Java exception. To implement this I've had to support the state machine having more than one possible current state. For example, it might be in either the :java_start_exception
or the :csharp_start_exception
state when it see a line like "foo.bar.baz.Exception: Something went wrong."
Unfortunately these changes have made the bench tests around 3 to 4 times slower because the ExceptionDetector.transition method can't exit early once it has found a match. I'm not sure there's much I can do about this - is it preferable to have the cleaner separation of rules with worse performance, or to go back to the mixing of C# and Java-like rules?
I checked how the new C# exceptions looked in error reporting, and it seems it is only the aggregate exception that doesn't get picked up properly. It is actually the other two types of exception I am more interested in getting handled properly, so if it meant this PR could get merged faster I could drop the logic for handling the aggregate exception. What do you think?
@eadred - Thanks for refactoring the code. Now I see why it was challenging to separate csharp
and java
states. We care a lot of performance, so 3X - 4X processing time is sufficient to make an exception for this case. We probably want to revert the https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/commit/a991faaa09c74bb19a40bb6cb9a377bc021441ea commit.
As to the aggregate
exception, let's drop that part (please open a feature request with Error Reporting backend if needed) so that we can get this PR in.
Thanks for that.
I've now reverted https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/commit/a991faaa09c74bb19a40bb6cb9a377bc021441ea (as https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/pull/58/commits/7fa27c0d842d804a7fd81724a2916389fcaacca5) and also removed the handling for aggregate exceptions (https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/pull/58/commits/58889d353f3f7e92866d053dda31000f0353a1b2).
I've raised an issue with the Error Reporting team for handling AggregateExceptions - https://issuetracker.google.com/issues/139292720.
PS. I realised that my handling of aggregate exceptions wasn't dealing with the case where there are multiple inner exceptions, so I did put in a fix for that (https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/pull/58/commits/6ea0cb7f6d935668782073d3bd4770c24c3ddcc2). Although this has all been reverted anyway, I may want to re-add these changes in a future PR.
Thanks @eadred! This looks good to me. Seems like there are still pending comments from @igorpeshansky. He is on vacation and will be back on Monday (Aug 19). Is this something that can wait or is it urgent?
Thanks @qingling128. No problem if this is delayed - it isn't urgent.
Cool, thanks. Will wait for @igorpeshansky to take a second look once he's back.
@igorpeshansky Could you release a new version for this issue? Now I need this enhancement,I use .net core running in docker swarm,then collect log from docker logging driver to fluentd which add this plugin for multiline error logs.
I'll work with our oncall @davidbtucker to do a release.