getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet
MIT License
582 stars 207 forks source link

Baggage propagation doesn't seem to work when Exception captured in middleware #2441

Closed bruno-garcia closed 1 year ago

bruno-garcia commented 1 year ago

I'm updating NuGet Trends with the latest JavaScript (Angular) and .NET SDKs.

One of the things since the last update is the introduction of this feature: https://changelog.getsentry.com/announcements/connect-session-replays-to-backend-errors

Unfortunately it doesn't seem to work:

I throw an exception from my C# controller, but only the Frontend error shows on the Replay: image

Using the Replay Network details to capture the baggage header, I can confirm the JS SDK is sending it:

image

Value (from replay):

sentry-environment=local,sentry-transaction=%2F,sentry-public_key=85a592e835c64ca3a97d93776c12e947,sentry-trace_id=20b4f41dc98f49e2b25b5f88ffae1901,sentry-sample_rate=1,sentry-replay_id=8f9bf18820be4450b62868dc460be8d8

The exception captured by the .NET SDK included the envelope header:

[23:11:08 DBG] Envelope 'c0736a5e8f594176a65a9c0129b05cdd' successfully sent. Content: {"sdk":{"name":"sentry.dotnet","version":"3.33.1"},"event_id":"c0736a5e8f594176a65a9c0129b05cdd","sent_at":"2023-06-23T03:11:08.488511+00:00"}
{"type":"event","length":19132}
{"modules":{"System.Private.CoreLib":"7.0.0.0",.....

Note that replay_id isn't included.

Investigation

The PR that added support here is:

Note that we look for a Span from an exception to support the case where a transaction is closed when unwinding the stack due to an exception bubbling up. Which is the case in ASP.NET Core where the root middleware captures the exception but further in the call chain we start/stop transactions:

https://github.com/getsentry/sentry-dotnet/blob/060480e1dfb42ca2044a8445c1d8e71b84b23f73/src/Sentry/Internal/Hub.cs#L275-L290

But on this PR adding support for DSC on exceptions, we don't consider the lookup, and only expect the transaction on the scope:

https://github.com/getsentry/sentry-dotnet/blob/060480e1dfb42ca2044a8445c1d8e71b84b23f73/src/Sentry/Internal/Hub.cs#L163-L166

I tried to validate that by not throwing from the controller, but instead just calling CaptureMessage. With a transaction on the scope, thing should work. That wasn't the case.

The exception included the DSC details, but no replay_id:

[23:44:21 DBG] Envelope '60ce0fe4d9294f27acbb429d3a7c858f' successfully sent. Content: {"sdk":{"name":"sentry.dotnet","version":"3.33.1"},"event_id":"60ce0fe4d9294f27acbb429d3a7c858f","trace":{"trace_id":"0c6188d56e9e49ffba45e31e5b697b3c","public_key":"57331596a25b4c3da49750b292299e09","sample_rate":"1","release":"NuGetTrends.Web@1.0.0\u002Ba678a0d59db0bcef71578b7c341c10c48ac5c80f","environment":"development","transaction":"GET api/package/history/{id}"},"sent_at":"2023-06-23T03:44:21.916599+00:00"}
jamescrosswell commented 1 year ago

Quick update: Currently trying to get a javascript app that will send trace headers... then will try to reproduce this. Have tried with Svelte and React so far... will try with Angular I guess (I probably should have started there).

jamescrosswell commented 1 year ago

@bruno-garcia I haven't been able to reproduce this.

See this replay... which is a simple Angular app that makes two calls to a .NET Core app.

  1. One of the calls is to a .NET endpoint that responds with a 200 OK and some text in the body
  2. The other calls is to a .NET endpoint that throws an error

And this is the event details for the error that gets thrown in the .NET application.

image

image

There's a sentry-replay_id in the baggage header and a Sentry-Trace header present.

The app in question is a minor variant of the Sentry.Samples.AspNetCore.Basic sample in the .NET SDK, that has a CORS anything policy:

using Microsoft.AspNetCore;

namespace Sentry.Samples.AspNetCore.Basic;

public class Program
{
    public static void Main(string[] args)
    {
        BuildWebHost(args).Run();
    }

    private const string MyAllowSpecificOrigins = "_myAllowSpecificOrigins";

    public static IWebHost BuildWebHost(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .ConfigureServices(services =>
                {
                    services.AddCors(options =>
                    {
                        options.AddPolicy(name: MyAllowSpecificOrigins,
                            policy  =>
                            {
                                policy.AllowAnyHeader();
                                policy.AllowAnyMethod();
                                policy.AllowAnyOrigin();
                            });
                    });
                }
            )
            .UseSentry(o =>
            {
                // A DSN is required.  You can set it here, or in configuration, or in an environment variable.
                o.Dsn = "https://b887218a80114d26a9b1a51c5f88e0b4@o447951.ingest.sentry.io/6601807";

                // Enable Sentry performance monitoring
                o.EnableTracing = true;

#if DEBUG
                // Log debug information about the Sentry SDK
                o.Debug = true;
#endif
            })

            // The App:
            .Configure(app =>
            {
                app.UseRouting();

                app.UseCors(MyAllowSpecificOrigins);

                // Enable Sentry performance monitoring
                app.UseSentryTracing();

                // An example ASP.NET Core middleware that throws an
                // exception when serving a request to path: /throw
                app.UseEndpoints(endpoints =>
                {
                    endpoints.MapGet("/hello", context => context.Response.WriteAsync("Hey bro!"));
                    // Reported events will be grouped by route pattern
                    endpoints.MapGet("/throw/{message?}", context =>
                    {
                        var exceptionMessage = context.GetRouteValue("message") as string;

                        var log = context.RequestServices.GetRequiredService<ILoggerFactory>()
                            .CreateLogger<Program>();

                        log.LogInformation("Handling some request...");

                        var hub = context.RequestServices.GetRequiredService<IHub>();
                        hub.ConfigureScope(s =>
                        {
                            // More data can be added to the scope like this:
                            s.SetTag("Sample", "ASP.NET Core"); // indexed by Sentry
                            s.SetExtra("Extra!", "Some extra information");
                        });

                        log.LogInformation("Logging info...");
                        log.LogWarning("Logging some warning!");

                        // The following exception will be captured by the SDK and the event
                        // will include the Log messages and any custom scope modifications
                        // as exemplified above.
                        throw new Exception(
                            exceptionMessage ?? "An exception thrown from the ASP.NET Core pipeline");
                    });
                });
            })
            .Build();
}
bruno-garcia commented 1 year ago

There's a sentry-replay_id in the baggage header and a Sentry-Trace header present.

@jamescrosswell The event you linked has a callout for Replay: image It should show the replay recorded in the frontend instead.

The replay_id in the baggage does load up in Sentry.

Is the .NET SDK sending these ids in the envelope header for errors and transactions? If so, Relay should be stitching the Replay to the event. And we would see a Replay preview instead of the CTA to install replays

jamescrosswell commented 1 year ago

Hm, I don't see the callout. I do see that there are 0 replays associated with that event though:

image

If that replay id was in the envelope header then we'd expect to see it here right?

It looks like the DSC gets put in a header for the event and when the DSC gets created from the baggage header currently it will contain the sentry-replay_id.

Our SentryMiddleware is correctly extracting the DSC (and the sentry-replay_id) from the baggage header but it gets lost by this extension method:

    internal static ITransaction StartTransaction(
        this IHub hub,
        ITransactionContext context,
        IReadOnlyDictionary<string, object?> customSamplingContext,
        DynamicSamplingContext? dynamicSamplingContext)
        => hub is Hub fullHub
            ? fullHub.StartTransaction(context, customSamplingContext, dynamicSamplingContext)
            : hub.StartTransaction(context, customSamplingContext);

The reason being that hub, when this code gets called, is a HubAdapter (not a fullHub).

I can work around that by creating another internal overload on HubAdapter.StartTransaction that accepts a DynamicSamplingContext parameter and reworking the above code a little:

    internal static ITransaction StartTransaction(
        this IHub hub,
        ITransactionContext context,
        IReadOnlyDictionary<string, object?> customSamplingContext,
        DynamicSamplingContext? dynamicSamplingContext) => hub switch
        {
            Hub fullHub => fullHub.StartTransaction(context, customSamplingContext, dynamicSamplingContext),
            HubAdapter adapter => adapter.StartTransaction(context, customSamplingContext, dynamicSamplingContext),
            _ => hub.StartTransaction(context, customSamplingContext)
        };

However this reveals another problem a couple of dominos further along in Hub.CaptureEventInternal. When an exception is thrown, both the hint and scope parameters getting passed in here are null and actualScope.Transaction is also null so DSC information doesn't get associated with the event.

jamescrosswell commented 1 year ago

OK, I think I've worked out what's going on. We have two bits of middleware: SentryMiddleware and SentryTracingMiddleware. When the later captures an exception, it finishes the transaction:

                else if (status == SpanStatus.Ok)
                {
                    transaction.Finish(exception);
                }
                else
                {
                    transaction.Finish(exception, status);
                }

Internally, that Finish method binds the transaction (which basically stores it in the Hub's internal ExceptionToSpanMap). SentryTracingMiddleware then throws the exception again.

Next the global exception handler in the SentryMiddleware gets activated. That creates an event from the exception and then calls hub.CaptureEvent... which eventually ultimately executes Hub.CaptureEventInternal.

At this point, since the transaction has finished, the current scope no longer has a transaction associated with it and so when we try to read the transaction off the scope and the DSC off the transaction, it's nulls all the way down.

            // When a transaction is present, copy its DSC to the event.
            var transaction = actualScope.Transaction as TransactionTracer;
            evt.DynamicSamplingContext = transaction?.DynamicSamplingContext;

Luckily, our SentryTracingMiddleware stored a map between the exception and the original span/transaction in the ExceptionToSpanMap and so we can get this information from there instead.

I'll put together a unit test and a fix.

jamescrosswell commented 1 year ago

k, I think that fixed it:

image