getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet
MIT License
606 stars 206 forks source link

Consider `Mechanism.Handled` for non-thrown captured exceptions #3383

Open bitsandfoxes opened 6 months ago

bitsandfoxes commented 6 months ago

The MainExceptionProcessor handles setting the mechanism here: https://github.com/getsentry/sentry-dotnet/blob/c30052e99d981aec3d6ff1bda42c3395a2376fe7/src/Sentry/Internal/MainExceptionProcessor.cs#L177-L188

So this code ends up with Handled = true

try
{
    throw new Exception();
}
catch (Exception ex)
{
    // will be marked as handled 
    SentrySdk.CaptureException(ex);
}

And this code ends up with Handled -- in the issue's Highlights

// will not be marked as anything
SentrySdk.CaptureException(new Exception());

Screenshot 2024-05-23 at 13 32 53

Interestingly, captured messages are Handled -- too.

Which, personally, I find confusing. If an exception is not unhandled, should we not consider it handled implicitly? I'm not talking technical correctness here but user expectations.

But I can still filter issues based on error.handled:false so maybe it doesn't matter?

bitsandfoxes commented 6 months ago

We should consider other scenarios where the stacktrace ends up being null. Just to make sure we're not falsely flagging handled exceptions as --.

ericsampson commented 5 months ago

it feels to me like the only entity who knows for sure if an exception sent to CaptureException is actually "handled" or not is the caller. So should this be controlled by an optional input parameter of that method? (I don't know how that aligns with the cross-language use of this type of Sentry method)

jamescrosswell commented 5 months ago

it feels to me like the only entity who knows for sure if an exception sent to CaptureException is actually "handled" or not is the caller. So should this be controlled by an optional input parameter of that method?

It's possible for the caller to set handled to true/false by setting the "mechanism" for the exception before capturing it... which is how the Sentry middleware sets this in the ASP.NET Core integration: https://github.com/getsentry/sentry-dotnet/blob/d1e5efcdf3af763ad49f11cd2426cc14a315a901/src/Sentry.AspNetCore/SentryMiddleware.cs#L219

It looks like the main exception processor would overwrite any value that was set explicitly via the mechanism though... which probably isn't a good idea. If someone has set this mechanism explicitly, we shouldn't be overwriting it right?

bitsandfoxes commented 4 months ago

If someone has set this mechanism explicitly, we shouldn't be overwriting it right?

Yeah, definitely not!

ericsampson commented 3 months ago

It's possible for the caller to set handled to true/false by setting the "mechanism" for the exception before capturing it via exception.SetSentryMechanism(mechanism, description, handled: false);

Is that even documented in the SDK docs site?

Either way, it just seems like it would be a lot user-friendlier and more discoverable to offer a new overload: CaptureException(Exception exception, ..., bool handled = false)

bitsandfoxes commented 3 months ago

If a user captures an exception via CaptureException isn't it handled implicitly?

jamescrosswell commented 2 months ago

If a user captures an exception via CaptureException isn't it handled implicitly?

That rings true... and more reliable than checking whether there is a stack trace or not.

ericsampson commented 2 months ago

If a user captures an exception via CaptureException isn't it handled implicitly?

@bitsandfoxes do you mean semantically? or are you talking about the implementation

bitsandfoxes commented 2 months ago

If a user calls CaptureExeception that event should be marked with handled: true. If an unhandled exception gets captured by the SDK without any user-code intervention that event should be marked with handled: false. So far so good. This issue descibes an issue with: An integration captures an exception, it gets processed by the SDK and the event processor (I think wrongly) assumes that it was handled based on the existence of a stacktrace. I think this is a very brittle and abstract reasoning.

A bit of a problem with the mechanism.handled key is that the SDK also relies on it to update the current session:

During capture, the but hub and client check whether the event was terminal to finish the currently active transaction and update the session.

https://github.com/getsentry/sentry-dotnet/blob/34cd01ca610ac3e6a4c3904b1d327931de0b8cee/src/Sentry/Internal/Hub.cs#L439-L445

https://github.com/getsentry/sentry-dotnet/blob/34cd01ca610ac3e6a4c3904b1d327931de0b8cee/src/Sentry/SentryClient.cs#L334-L346

https://github.com/getsentry/sentry-dotnet/blob/34cd01ca610ac3e6a4c3904b1d327931de0b8cee/src/Sentry/SentryEvent.cs#L183-L197

Now in the case of i.e. Unity where an unhandled exception does not crash the game the session still gets updated as crashed. See https://github.com/getsentry/sentry-unity/issues/1721

So ideally

jamescrosswell commented 2 months ago

I think all of that makes sense... just one question:

  • The unhandled is made to not be the qualifier whether a session was crashed

@bitsandfoxes what would be the alternative mechanism to determine this?

bitsandfoxes commented 2 months ago

what would be the alternative mechanism to determine this?

That I don't know. But maybe we can leverage one of the PosixSignals described in https://github.com/getsentry/sentry-dotnet/issues/1122?

jamescrosswell commented 2 months ago

maybe we can leverage one of the PosixSignals described in #1122?

🤔 I think those are only available on Unix derivatives... so we'd need some other mechanism for non *nix [as well].

ericsampson commented 2 months ago

@bitsandfoxes re this statement:

If a user calls CaptureExeception that event should be marked with handled: true.

IMO this is not correct. Only the code author that called CaptureException truly knows if their code handled that particular exception or not. Unless I just fundamentally misunderstand the purpose of this variable, which is possible... Can you correct my mental model if so?

In my mind there are two possible scenarios for situations where app code authors call CaptureException:

1) the most common case

catch ex
{
sentry.CaptureException(ex); // `handled` is default `false` if unspecified.
throw // we're in the weeds, abandon ship!
} 

2) the case where the exceptional state is actually corrected


catch ex
{
  // a bunch of code here to correct the exceptional situation and return the system state to correctness
  sentry.CaptureException(ex, handled: true);
}
// continue on with the remaining code as normal because the exceptional state has been fixed and we're nominal
jamescrosswell commented 2 months ago

In my mind there are two possible scenarios for situations where app code authors call CaptureException:

@ericsampson the comment I made on your PR relates to this.

I think my comment above was not very clear... it's not the middleware that I was worried about but this: https://github.com/getsentry/sentry-dotnet/blob/d1e5efcdf3af763ad49f11cd2426cc14a315a901/src/Sentry/Internal/MainExceptionProcessor.cs#L177-L188

I'm not 100% sure whether it's an issue or not. I think we'd need to confirm with some sample that demonstrated a problem before we try to fix it.

ericsampson commented 2 months ago

I'm very confused.

This issue descibes an issue with: An integration captures an exception, it gets processed by the SDK and the event processor (I think wrongly) assumes that it was handled based on the existence of a stacktrace. I think this is a very brittle and abstract reasoning.

But there is a key check already happening in the lines just before the ones that were posted previously in this thread. It's already first checking the value of handled before falling back to the other conditions: https://github.com/getsentry/sentry-dotnet/blob/c892db48962c9160777eb51f723fd206991ebc8c/src/Sentry/Internal/MainExceptionProcessor.cs#L171-L188

jamescrosswell commented 2 months ago

But there is a key check already happening in the lines just before the ones that were posted previously in this thread. It's already first checking the value of handled before falling back to the other conditions:

Indeed, and there's a comment there:

// The mechanism handled flag was set by an integration.

So basically, that code is preserving any mechanism information if it has already been set on the exception. That bit I think is fine...

Reading the code again though, my initial concern about it overriding the mechanism was unfounded. The if block simply checks to make sure exception.Data[Mechanism.HandledKey] isn't null - so if an SDK user sets this to true or to false it should be preserved.

The else statements seem a bit more random. If no mechanism has been set on the exception, the code seems to make assumptions about whether the exception was handled or not based on the presence/absence of a stack trace in the exception... All of which I think takes us back to what @bitsandfoxes originally summarised (quite well, and without all the confusion that I added - apologies for that).

ericsampson commented 2 months ago

All of which I think takes us back to what @bitsandfoxes https://github.com/getsentry/sentry-dotnet/issues/3383#issue-2312683216 (quite well, and without all the confusion that I added - apologies for that).

right, that's all I was trying to do was to get us back in sync with the correct state :)

I'd propose that the code be simplified to just the following: (note also that HandledKey value is allowed to be nullable! And it's also safer to not assume that it's guaranteed to be present)

if (exception.Data.Contains(Mechanism.HandledKey)) 
 { 
     // The mechanism handled flag was set by an integration (likely) or explicitly by the user.
     mechanism.Handled = e.Data[Mechanism.HandledKey] as bool?;
     exception.Data.Remove(Mechanism.HandledKey); 
 }
else
{
  // The mechanism handled flag was unset, which means that the exception wasn't handled by an integration.
  mechanism.Handled = true;
}
jamescrosswell commented 2 months ago

I'd propose that the code be simplified to just the following: (note also that HandledKey value is allowed to be nullable! And it's also safer to not assume that it's guaranteed to be present)

if (exception.Data.Contains(Mechanism.HandledKey)) 
 { 
     // The mechanism handled flag was set by an integration (likely) or explicitly by the user.
     mechanism.Handled = e.Data[Mechanism.HandledKey] as bool?;
     exception.Data.Remove(Mechanism.HandledKey); 
 }
else
{
  // The mechanism handled flag was unset, which means that the exception wasn't handled by an integration.
  mechanism.Handled = true;
}

So the only time we'd end up in the else branch there is if an SDK user (not an integration) had manually captured the exception without specifying whether it was handled or not. So we're saying user's expectations, in this case, would be for Handled to default to true?

@bitsandfoxes I think that's what you had in mind when you created this ticket right?

I'm OK with that. I think the only credible alternative would be if we defaulted Handled to null, reflecting the fact that it wasn't specified.

I'm fine with either option. Maybe worth adding a code comment to explain the reasoning, one way or the other.

jamescrosswell commented 2 months ago

right, that's all I was trying to do was to get us back in sync with the correct state :)

Much appreciated and thanks for bearing with us... working out what we're supposed to do is often over 50% of the work!

ericsampson commented 2 months ago

So we're saying user's expectations, in this case, would be for Handled to default to true?

I think it's more about matching the Sentry intent than end-user expectations, to maintain consistency across the various SDKs.

https://develop.sentry.dev/sdk/event-payloads/exception/#exception-mechanism

handled: Optional flag indicating whether the user has handled the exception (for example, via try ... catch).

This is probably the best source of documentation about the intent that I can find, which seems to support setting it to true in the else: https://getsentry.github.io/relay/relay_event_schema/protocol/struct.Mechanism.html#structfield.handled

handled: Annotated<bool> Flag indicating whether this exception was handled.

This is a best-effort guess at whether the exception was handled by user code or not. For example:

Exceptions leading to a 500 Internal Server Error or to a hard process crash are handled=false, as the SDK typically has an integration that automatically captures the error. Exceptions captured using capture_exception (called from user code) are handled=true as the user explicitly captured the exception (and therefore kind of handled it)

bitsandfoxes commented 2 months ago

This sounds right to me. Thanks a lot for sifting through our docs! I think, unless explicitly captured, exceptions should be marked as unhandled.