getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet
MIT License

Some errors not captured if out of performance units #3283

Closed bruno-garcia closed 1 month ago

bruno-garcia commented 7 months ago

A customer on ASP.NET Core said that when they are out of performance quota, some errors are not captured.

Looking at an event, it's going through this middleware:

image

That's here:

https://github.com/getsentry/sentry-dotnet/blob/480e50830b6ede50a5227028a55c3e2ee1c61090/src/Sentry.AspNetCore/SentryTracingMiddleware.cs#L139

Since we only capture the exception further down, I wonder if we're possibly throwing an exception somewhere here:

https://github.com/getsentry/sentry-dotnet/blob/480e50830b6ede50a5227028a55c3e2ee1c61090/src/Sentry.AspNetCore/SentryTracingMiddleware.cs#L149-L210

And not reaching this:

https://github.com/getsentry/sentry-dotnet/blob/480e50830b6ede50a5227028a55c3e2ee1c61090/src/Sentry.AspNetCore/SentryTracingMiddleware.cs#L213-L216

Otherwise I'm not sure why such an error would not be captured, unless they are not capturing transactions (or are out of quota).

The exceptions are captured by SentryMiddleware.UnhandledException, so I might be looking at this wrong, as SentryMiddleware was hidden:

image

https://github.com/getsentry/sentry-dotnet/blob/480e50830b6ede50a5227028a55c3e2ee1c61090/src/Sentry.AspNetCore/SentryMiddleware.cs#L164
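
For anyone reading along without opening those links, the general shape of the flow being discussed is roughly this (a simplified, illustrative sketch only, not the actual SentryTracingMiddleware source; exception capture itself happens in the separate SentryMiddleware, as noted above):

using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Sentry;

// Simplified sketch of the pattern under discussion - illustrative only, not the
// real middleware implementation.
public class TracingMiddlewareSketch
{
    private readonly RequestDelegate _next;
    private readonly IHub _hub;

    public TracingMiddlewareSketch(RequestDelegate next, IHub hub)
    {
        _next = next;
        _hub = hub;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // If anything threw in this part of the middleware before the try block,
        // no transaction would be started or finished for the request.
        var transaction = _hub.StartTransaction(context.Request.Path, "http.server");
        try
        {
            await _next(context);
        }
        finally
        {
            // The question above is whether some earlier exception could prevent
            // execution from ever reaching this point.
            transaction.Finish();
        }
    }
}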

akonyer commented 7 months ago

I am the one who is experiencing this issue. It seems very strange, but with no changes to the codebase at all, when we run out of transaction quota on our performance metrics, errors stop coming through. The interesting thing is that it's not ALL errors, just some. Some still come through, but in massively reduced numbers. It's like we're being throttled.

We have only used ~5% of our error reserve (15k/300k), but 100% of our performance reserve is used up.

I will get 500 errors reported from the front end (web application) and sent to Sentry when an API crashes with a 500, but the back end does not send the corresponding error to Sentry.

I noticed that when I add more funds to my on-demand budget to allow performance metrics to be captured again, errors come through just fine. I keep having to top up this on-demand reserve every day to get errors to come through until our month resets in a couple of days.

It does seem like a bug in the C# library that is somehow dropping errors whenever the performance data is also being dropped; when the performance data comes through just fine, the errors do as well.

This is my config for the .net backend apps:

        builder.UseSentry(config =>
        {
            config.Dsn = dsn;
            config.Environment = serverName;
            config.MaxRequestBodySize = RequestSize.Medium;
            config.SendDefaultPii = true;
            config.IncludeActivityData = true;
            config.ServerName = serviceName;
            config.TracesSampleRate = tracesSampleRate;

            config.TracesSampler = tracesSampler;

            if (openTelemetryEnabled)
            {
                config.UseOpenTelemetry();
            }
        });
jamescrosswell commented 7 months ago

@bruno-garcia how/where do quotas affect the client SDKs? Is it done via Rate Limiting in the transport or is there some other mechanism?

bruno-garcia commented 7 months ago

@bruno-garcia how/where do quotas affect the client SDKs? Is it done via Rate Limiting in the transport or is there some other mechanism?

Yeah, only through X-Sentry-Rate-Limit, afaik.

@akonyer by any chance are you using openTelemetryEnabled? Wondering if that could help narrow things down.

Also, are the errors just application related, or do they seem unrelated to your code?

akonyer commented 7 months ago

No, we currently have Open Telemetry flagged off.

The errors are application code, not framework level or anything. Just crashes/exceptions in our code.

jamescrosswell commented 7 months ago

The interesting thing is that it's not ALL errors, just some. Some still come through, but in massively reduced numbers. It's like we're being throttled.

This seems like it could be an important clue... if it were possible to spot some pattern in those that are being dropped vs those that are not. So, do the errors that you do see coming through have anything in common?

Also, this may not be possible in the environment where you're experiencing this, but if exceptions are being dropped then that would be logged by the Sentry SDK (but only if you had configured SentryOptions.Debug = true): https://github.com/getsentry/sentry-dotnet/blob/bc68b8e60233e9987df8b03759e00720c28a52ae/src/Sentry/Http/HttpTransportBase.cs#L101-L109

The presence or absence of such log messages might indicate a very different cause for the problem (whether the problem lies in the .NET SDK or in the Sentry server).

jamescrosswell commented 7 months ago

Also, this may not be possible in the environment where you're experiencing this, but if exceptions are being dropped then that would be logged by the Sentry SDK (but only if you had configured SentryOptions.Debug = true)

Oh, in fact... we should be able to see that in the Client Reports right? @bruno-garcia / @bitsandfoxes are those reports available in the Sentry UI somewhere? What happens to them after they're sent to Sentry?

akonyer commented 7 months ago

Unfortunately I did not have debug turned on in production. And now our month has ended and we've reset our profiling quota (which is great, things are working perfectly again). However that means that I can no longer provide any new information as I can no longer reproduce the issue.

When they were being dropped this was the messaging I was seeing sometimes:

[17:18:13 INF] Capturing event.
[17:18:13 WRN] Failed to end session because there is none active.
[17:18:13 INF] Envelope queued up: 'dfd0642e2f504e329faf4362653d4728'
[17:18:13 ERR] HttpTransport: Sentry rejected the envelope 'dfd0642e2f504e329faf4362653d4728'. Status code: TooManyRequests. Error detail: .

And this error here makes me think we were hammering the API too frequently, but I don't see how that is possible considering we had used less than 10% of our errors quota for the month, and errors were not even coming through. We also have spike detection turned on, so you would think that if we were hitting the API too frequently and getting rate-limited we would be seeing a lot of errors as well? Which we definitely did not see. Looking now, we had 11,500/300k errors for the month, and that was pretty evenly distributed across the days of the month. Nowhere near enough to be getting rate-limited by the API, I don't think.

It seems like as soon as our Performance quota was reached ALL of our API requests were being rate limited.

jamescrosswell commented 7 months ago

[17:18:13 ERR] HttpTransport: Sentry rejected the envelope 'dfd0642e2f504e329faf4362653d4728'. Status code: TooManyRequests. Error detail: .

OK. 429 Too Many Requests is the HTTP response code that Sentry uses for rate limits. Without knowing the contents of the envelope though, it's hard to say whether that was a Trace event or a Crash Report that was dropped.

If you could set options.Debug = true when initializing the Sentry SDK, then the contents of the envelopes would be logged when they get dropped:

https://github.com/getsentry/sentry-dotnet/blob/59f31642c10f8c8159d8294e0a48536b14ad4cd0/src/Sentry/Http/HttpTransportBase.cs#L361-L362

If the logs showed a 429 followed by an envelope containing an exception being dropped, that would indicate an issue with how rate limits are being handled for your account.
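
For example (a minimal sketch that reuses the builder and dsn variables from your config above; Debug and DiagnosticLevel are standard SentryOptions properties):

builder.UseSentry(config =>
{
    config.Dsn = dsn;
    // Emit the SDK's internal diagnostic logs, including the payload of any
    // envelope that gets rejected by the server.
    config.Debug = true;
    config.DiagnosticLevel = SentryLevel.Debug;
});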

akonyer commented 7 months ago

I won't be able to do that until our performance metrics run out at the end of the month, and I'm hoping we've balanced our sampling so that we make it to the end of the month. If I notice this again I will turn on debug for you guys.

jamescrosswell commented 7 months ago

@bitsandfoxes do you think it's possibly related to this as well?

Once you've run out of performance units I think the tracing envelopes shouldn't even end up in the queue and so this would no longer play a role. However while you've still got performance units left, if you see a spike in volumes, random envelopes would be discarded (which may include crash reports).

@akonyer if this is relevant, you might be able to alleviate some of your problems by increasing SentryOptions.MaxQueueItems (e.g. set this to 100 or 200).
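
For instance, something like this (a sketch only, added to the same UseSentry callback as in the config posted earlier; MaxQueueItems is a standard SentryOptions property with a fairly small default):

builder.UseSentry(config =>
{
    // ... existing options ...
    // A larger queue gives error envelopes a better chance of surviving a burst
    // of transaction envelopes competing for the same slots.
    config.MaxQueueItems = 200;
});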

bitsandfoxes commented 7 months ago

Once you've run out of performance units I think the tracing envelopes shouldn't even end up in the queue and so this would no longer play a role.

As far as I can tell the rate-limiting kicks in only when an event/transaction (as an envelope) already reaches the HTTP transport sitting underneath the CachingTransport. So that behaviour should not change.

But I agree. A high volume of transactions drowning out errors should not happen in any case. We'll have to try reproducing this locally.

jamescrosswell commented 6 months ago

I think I've managed to reproduce this. I created the following simple console program:

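// Note: this snippet assumes SentrySdk.Init has already been called elsewhere
// (with a DSN and a traces sample rate of 1.0 so every transaction is sent);
// the init call is omitted here.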
var cts = new CancellationTokenSource();
var task = new Task(DoWork, cts.Token);
task.Start();

Console.WriteLine("Hello, World!");
Console.WriteLine("Press ESC to cancel...");
while (!task.IsCompleted)
{
    var keyInput = Console.ReadKey(true);
    if (keyInput.Key == ConsoleKey.Escape)
    {
        cts.Cancel();
    }
}
Console.WriteLine("Goodbye cruel world...");
return;

void DoWork()
{
    while (!cts.Token.IsCancellationRequested)
    {
        var transaction = SentrySdk.StartTransaction("Background", "loop");
        try
        {
            if (Random.Shared.Next(0, 500) != 0)
            {
                Thread.Sleep(100);
                continue;
            }

            throw new Exception("Test exception");
        }
        catch (Exception e)
        {
            SentrySdk.CaptureException(e);
        }
        finally
        {
            transaction.Finish();
        }

        SentrySdk.Flush();
    }
}

Then I let it burn for a bit on my Sentry account. You can see at about 10am I ran out of performance units on my quota:

Image

And then this is what happened to error reports:

Image

You can see I was getting 1-2 per minute while I still had quota and then as soon as I ran out of quota, nada for 17 minutes... and that was when I checked back in and stuck a breakpoint in the code to see what was going on.

The first error that I followed in my debugger was actually sent without any issue, which left me scratching my head (Heisenberg problem).

I then changed the code so that the errors would occur more frequently (just to make it easier to get to the breakpoint I wanted to trigger):

            if (Random.Shared.Next(0, 10) != 0)

After that I could follow some errors that were being dropped by Sentry. This is what I'm getting back from the server in the HttpResponse from here: https://github.com/getsentry/sentry-dotnet/blob/a9e8acd030efb232bb08ec3745c9ef9a479fca8d/src/Sentry/Internal/Http/HttpTransport.cs#L42

StatusCode: 429, ReasonPhrase: 'Too Many Requests', Version: 1.1, Content: System.Net.Http.EmptyContent, Headers:
{
  Content-Length: 0
}, Trailing Headers:
{
}

It's a generic 429 without any headers to explain why.

So the errors are being sent by the .NET SDK. It appears they're being rejected by the Sentry server (even though I still have plenty of error quota left on my account).

@bitsandfoxes / @bruno-garcia I think we need to raise this with another team but I'm not sure who to tap on the shoulder there.

akonyer commented 6 months ago

Amazing! Glad that you managed to reproduce this issue. I thought I was going crazy, but I KNEW that there were errors being dropped that should have made it.

It's interesting you guys think it is an issue on the server. I would have guessed it was more likely a client-side issue in the .NET SDK; otherwise I'd expect this to show up as an issue across other languages as well.

bitsandfoxes commented 6 months ago

I'm not sure if this is better or worse. But I'll forward this to the appropriate team. Suuper appreciate the sleuthing @jamescrosswell!

Dav1dde commented 6 months ago

It would be super helpful to get the list of all headers returned from the webserver and also the content which was sent (potentially an event envelope mixed with a transaction?).

Re headers: there must have been headers returned; without headers it wouldn't be a valid HTTP response.

jamescrosswell commented 6 months ago

It would be super helpful to get the list of all headers returned from the webserver and also the content which was sent (potentially an event envelope mixed with a transaction?).

There are definitely headers returned; however, the .NET SDK doesn't (currently) log these or dump them anywhere, for obvious performance reasons.

If rate limiting headers are present, this will result in client reports for any events being dropped: https://github.com/getsentry/sentry-dotnet/blob/0a8a92aa74eb840d9cc6454022ff140c475f38fa/src/Sentry/Http/HttpTransportBase.cs#L97-L104

If we wanted more information, we could potentially log both the headers and the corresponding rate limit that we were applying internally here: https://github.com/getsentry/sentry-dotnet/blob/0a8a92aa74eb840d9cc6454022ff140c475f38fa/src/Sentry/Http/HttpTransportBase.cs#L280

We'd have to be careful about that as that code gets called very frequently. Certainly we'd only do it if Debug was enabled and I think we'd do it in a branch of the code that we didn't intend to merge into main.
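
As a rough illustration of what that temporary logging could look like (a sketch only, using plain System.Net.Http types; how it would actually be wired into the SDK's transport is left open and isn't something we'd merge):

using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Sketch of a diagnostic handler that dumps response headers whenever the
// server rejects a request with 429. Purely illustrative; not SDK code.
internal class RateLimitHeaderDumper : DelegatingHandler
{
    public RateLimitHeaderDumper(HttpMessageHandler inner) : base(inner) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        if (response.StatusCode == HttpStatusCode.TooManyRequests)
        {
            foreach (var header in response.Headers)
            {
                Console.WriteLine($"{header.Key}: {string.Join(", ", header.Value)}");
            }
        }
        return response;
    }
}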

@Dav1dde the other issue is that this is currently quite challenging to reproduce for me. To reproduce this currently I need to:

  1. Create a new throw away organization and project
  2. Disable spike protection on the project
  3. Change the config on my test code to use the new DSN
  4. Run the test code above for ~15 minutes until performance units are exhausted
  5. Then the problem manifests.

That's not ideal for Sentry (I end up pounding the Sentry servers and storing a bunch of junk). It's not ideal for me as it takes a very long time to reproduce the problem on each debug attempt.

Can you suggest a better way to do this?

jamescrosswell commented 6 months ago

@Dav1dde my quota was reset so I've generated some more comprehensive logs (including the headers that get returned by Sentry).

The initial rate limiting headers I'm getting back all look good:

  Debug: Envelope handed off to transport (event ID: '75a3027b1552485ea23111f931ca9320'). 1 items in queue.
  Debug: HttpTransport: Received rate limit header: 60:transaction;profile:organization:transaction_usage_exceeded
  Debug: HttpTransport: Rate limit 'transaction' applied. Retry after: 00:01:00.
  Debug: HttpTransport: Received rate limit header: 60:transaction;profile:organization:transaction_usage_exceeded
  Debug: HttpTransport: Rate limit 'profile' applied. Retry after: 00:01:00.
  Error: HttpTransport: Sentry rejected the envelope '75a3027b1552485ea23111f931ca9320'. Status code: TooManyRequests. Error detail: Sentry dropped data due to a quota or internal rate limit being reached. This will not affect your application. See https://docs.sentry.io/product/accounts/quotas/ for more information.. Error causes: .
  Debug: HttpTransport: Failed envelope '75a3027b1552485ea23111f931ca9320' has payload: {"sdk":{"name":"sentry.dotnet","version":"4.4.0"},"event_id":"75a3027b1552485ea23111f931ca9320","trace":{"trace_id":"429aa33b9b5a4b1892b151b3282ca46b","public_key":"d95b9d95c42b4b0a6f3180c612bee116","sampled":"true","sample_rate":"1","release":"Sentry.Samples.Console.Basic@4.4.0\u002B8c78887bc9291b88938ac56d3dd59124668e5c18","environment":"debug","transaction":"Background"},"sent_at":"2024-05-15T01:59:04.258502+00:00"} {"type":"transaction","length":1091} {"type":"transaction","event_id":"75a3027b1552485ea23111f931ca9320","platform":"csharp","release":"Sentry.Samples.Console.Basic@4.4.0\u002B8c78887bc9291b88938ac56d3dd59124668e5c18","transaction":"Background","transaction_info":{"source":"custom"},"start_timestamp":"2024-05-15T01:59:04.10275+00:00","timestamp":"2024-05-15T01:59:04.20383+00:00","request":{},"contexts":{"app":{"type":"app","app_start_time":"2024-05-15T01:58:59.6346108+00:00"},"device":{"type":"device","boot_time":"2024-04-28T23:05:01.6727338+00:00"},"os":{"type":"os","raw_description":"Darwin 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:49 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6020"},"runtime":{"type":"runtime","name":".NET","version":"6.0.22","raw_description":".NET 6.0.22","identifier":"osx.14-arm64"},"trace":{"type":"trace","span_id":"c3ebd54a906b3af1","trace_id":"429aa33b9b5a4b1892b151b3282ca46b","op":"loop","status":"ok"}},"user":{"ip_address":"{{auto}}"},"environment":"debug","sdk":{"packages":[{"name":"nuget:sentry.dotnet","version":"4.4.0"}],"name":"sentry.dotnet","version":"4.4.0"}}

Those are the only two rate limit headers received in the session...

However later on the server rejects envelopes containing Error events (with no transaction)... For example, this one:

  Error: HttpTransport: Sentry rejected the envelope 'c41d2b6841504343a0d9c5580ea860cb'. Status code: TooManyRequests. Error detail: .
  Debug: HttpTransport: Failed envelope 'c41d2b6841504343a0d9c5580ea860cb' has payload: {"sdk":{"name":"sentry.dotnet","version":"4.4.0"},"event_id":"c41d2b6841504343a0d9c5580ea860cb","trace":{"trace_id":"a9da27f33b2d4ea7880d1afa5e6012b9","public_key":"d95b9d95c42b4b0a6f3180c612bee116","release":"Sentry.Samples.Console.Basic@4.4.0\u002B8c78887bc9291b88938ac56d3dd59124668e5c18","environment":"debug"},"sent_at":"2024-05-15T01:59:08.301707+00:00"} {"type":"event","length":4904} {"modules":{"System.Private.CoreLib":"6.0.0.0","Sentry.Samples.Console.Basic":"4.4.0.0","System.Runtime":"6.0.0.0","System.Console":"6.0.0.0","Sentry":"4.4.0.0","System.Net.Primitives":"6.0.0.0","System.IO.Compression":"6.0.0.0","System.Text.Json":"6.0.0.0","System.Collections":"6.0.0.0","System.Text.RegularExpressions":"6.0.0.0","System.Reflection.Emit.ILGeneration":"6.0.0.0","System.Reflection.Emit.Lightweight":"6.0.0.0","System.Threading":"6.0.0.0","System.Memory":"6.0.0.0","System.Reflection.Primitives":"6.0.0.0","System.Linq":"6.0.0.0","Microsoft.Win32.Primitives":"6.0.0.0","System.Private.Uri":"6.0.0.0","System.Diagnostics.Process":"6.0.0.0","System.ComponentModel.Primitives":"6.0.0.0","System.Net.Http":"6.0.0.0","System.Collections.Concurrent":"6.0.0.0","System.Runtime.InteropServices.RuntimeInformation":"6.0.0.0","System.Security.Cryptography.Algorithms":"6.0.0.0","System.Security.Cryptography.Primitives":"6.0.0.0","System.ObjectModel":"6.0.0.0","System.Diagnostics.DiagnosticSource":"6.0.0.0","System.Diagnostics.Tracing":"6.0.0.0","System.Net.Security":"6.0.0.0","System.Threading.Thread":"6.0.0.0","System.Security.Cryptography.X509Certificates":"6.0.0.0","System.Text.Encodings.Web":"6.0.0.0","System.Runtime.Intrinsics":"6.0.0.0","System.Numerics.Vectors":"6.0.0.0","System.Runtime.InteropServices":"6.0.0.0","System.Net.Sockets":"6.0.0.0","System.Threading.ThreadPool":"6.0.0.0","System.Net.NameResolution":"6.0.0.0","System.Collections.NonGeneric":"6.0.0.0","System.Security.Cryptography.Encoding":"6.0.0.0","System.Formats.Asn1":"6.0.0.0","System.Runtime.Numerics":"6.0.0.0","System.Text.Encoding.Extensions":"6.0.0.0","System.Runtime.CompilerServices.Unsafe":"6.0.0.0","System.IO.Compression.Brotli":"6.0.0.0","System.Diagnostics.StackTrace":"6.0.0.0","System.Reflection.Metadata":"6.0.0.0","System.Collections.Immutable":"6.0.0.0","System.Linq.Expressions":"6.0.0.0"},"event_id":"c41d2b6841504343a0d9c5580ea860cb","timestamp":"2024-05-15T01:59:08.257605+00:00","platform":"csharp","release":"Sentry.Samples.Console.Basic@4.4.0\u002B8c78887bc9291b88938ac56d3dd59124668e5c18","exception":{"values":[{"type":"System.Exception","value":"Test exception","module":"System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","thread_id":7,"stacktrace":{"frames":[{"filename":"Program.cs","function":"void Program.\u003CMain\u003E$(string[] args)\u002BDoWork()","lineno":71,"colno":13,"abs_path":"/Users/jamescrosswell/code/sentry-dotnet/samples/Sentry.Samples.Console.Basic/Program.cs","in_app":true,"package":"Sentry.Samples.Console.Basic, Version=4.4.0.0, Culture=neutral, PublicKeyToken=fba2ec45388e2af0","instruction_addr":"0x3e","addr_mode":"rel:0","function_id":"0xc"}]},"mechanism":{"type":"generic","handled":true}}]},"level":"error","request":{},"contexts":{"Current Culture":{"name":"en-NZ","display_name":"English (New Zealand)","calendar":"GregorianCalendar"},"Dynamic Code":{"Compiled":true,"Supported":true},"Memory 
Info":{"allocated_bytes":5589504,"fragmented_bytes":321288,"heap_size_bytes":1804752,"high_memory_load_threshold_bytes":96894462197,"total_available_memory_bytes":103079215104,"memory_load_bytes":82463372083,"total_committed_bytes":4030464,"promoted_bytes":967448,"pinned_objects_count":4,"pause_time_percentage":0.06,"index":2,"finalization_pending_count":197,"compacted":true,"concurrent":false,"pause_durations":[1.205,0]},"ThreadPool Info":{"min_worker_threads":12,"min_completion_port_threads":12,"max_worker_threads":32767,"max_completion_port_threads":1000,"available_worker_threads":32766,"available_completion_port_threads":1000},"app":{"type":"app","app_start_time":"2024-05-15T01:58:59.6346108+00:00"},"device":{"type":"device","timezone":"NZ","timezone_display_name":"(UTC\u002B12:00) New Zealand Time","boot_time":"2024-04-28T23:05:01.6727338+00:00"},"os":{"type":"os","raw_description":"Darwin 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:49 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6020"},"runtime":{"type":"runtime","name":".NET","version":"6.0.22","raw_description":".NET 6.0.22","identifier":"osx.14-arm64"},"trace":{"type":"trace","span_id":"4461453c860111c1","trace_id":"a9da27f33b2d4ea7880d1afa5e6012b9"}},"user":{"ip_address":"{{auto}}"},"environment":"debug","sdk":{"packages":[{"name":"nuget:sentry.dotnet","version":"4.4.0"}],"name":"sentry.dotnet","version":"4.4.0"},"debug_meta":{"images":[{"type":"pe_dotnet","debug_id":"3b18ffa3-fe80-4df6-9490-4553d25b5371-fe8707cf","debug_checksum":"SHA256:a3ff183b80fef6ed54904553d25b5371cf07877ee8f38c74259686cbe8376a68","debug_file":"/Users/jamescrosswell/code/sentry-dotnet/samples/Sentry.Samples.Console.Basic/obj/Debug/net6.0/Sentry.Samples.Console.Basic.pdb","code_id":"AE90C9518000","code_file":"/Users/jamescrosswell/code/sentry-dotnet/samples/Sentry.Samples.Console.Basic/bin/Debug/net6.0/Sentry.Samples.Console.Basic.dll"}]}}

The response is obviously a 429 but there are no rate limit headers containing any explanation for why the 429. Spike protection has been disabled for this organisation.
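
For reference, the rate limit values quoted above break down roughly as retry-after seconds, affected categories, scope and reason code. A small illustrative parser for the value seen in the logs (not the SDK's actual parsing code):

using System;

// Illustrative only: splits the rate limit value seen in the logs above into
// its parts. The real SDK has its own parsing logic.
var value = "60:transaction;profile:organization:transaction_usage_exceeded";
var parts = value.Split(':');

var retryAfter = TimeSpan.FromSeconds(int.Parse(parts[0])); // 60 seconds
var categories = parts[1].Split(';');                       // transaction, profile
var scope = parts[2];                                       // organization
var reason = parts.Length > 3 ? parts[3] : null;            // transaction_usage_exceeded

Console.WriteLine($"Retry after {retryAfter} for: {string.Join(", ", categories)} ({scope}, {reason})");

Notably, "error" is not among the limited categories there, which is consistent with error envelopes not being expected to get dropped.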

Dav1dde commented 6 months ago

Thanks for looking into this again. I would really need to see all headers and the full JSON response to pinpoint where this is coming from. Would it be possible to replicate this with a curl script I can run myself?

jamescrosswell commented 6 months ago

@Dav1dde I could give you a .NET program that demonstrates the issue and you could use a packet sniffer like Wireshark to record/inspect the traffic to and from Sentry? Would that work?

At first I thought this wouldn't be practical as there's no easy way to have Wireshark decrypt SSL traffic. However I think we could work around that if the .NET program is configured to use the DSN of a Sentry relay (running on HTTP).

I'll see if I can put a GitHub repo together with a docker-compose file for the relay, the .NET project itself, and instructions on how to compile/run everything from the command line.

Dav1dde commented 6 months ago

@bitsandfoxes can you help get something together which we can use to easily reproduce the issue?

If I understand the issue correctly, it should be reproducible by sending transactions until the performance quota is exhausted, then sending errors and observing that they are dropped? Ideally the reproduction doesn't involve the SDK at all.

jamescrosswell commented 6 months ago

Ideally the reproduction doesn't involve the SDK at all.

We'd need to use some SDK to create the envelopes, right? I know the Sentry CLI lets you send events, but the actual JSON that it sends needs to be created somewhere (with valid/unique SentryIds etc.). In theory, you could write a bash or PowerShell script to generate fake envelopes, but that sounds like way more work than simply using one of the SDKs to do this...
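
For what it's worth, a sketch of the kind of envelope such a script would need to emit, based on the newline-delimited envelope layout in the develop docs (envelope header line, item header line, item payload line). Everything here is fabricated for illustration and sending it is left out:

using System;

// Illustrative only: produces one minimal error envelope.
var eventId = Guid.NewGuid().ToString("N");

var envelope =
    $"{{\"event_id\":\"{eventId}\"}}\n" +                              // envelope header
    "{\"type\":\"event\",\"content_type\":\"application/json\"}\n" +   // item header
    $"{{\"event_id\":\"{eventId}\",\"message\":\"fake error\"}}\n";    // item payload

Console.WriteLine(envelope);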

jamescrosswell commented 6 months ago

@Dav1dde I've created a repository that contains some shell scripts that should help you reproduce this:

There's a readme in there that is hopefully easy enough to follow. Basically, one script creates a bunch of envelopes (containing errors), and another script sends these envelopes, one at a time, using cURL.

I am, however, now running into another problem: although I can send these envelopes to Sentry without any problems via the .NET SDK, when sending them via cURL I get an error complaining about an invalid header:

{"detail":"invalid event envelope","causes":["invalid envelope header","expected value at line 1 column 1"]}    

The envelopes are definitely valid (including headers) according to the development docs. I've tried manually editing one of these envelopes to remove everything except the event_id from the header (just to make it as simple as possible) but that doesn't help. I imagine I'm doing something wrong with cURL, which I'm not an expert in, but I figured you might be 🙏

iambriccardo commented 6 months ago

Hi, we spent some time trying to reproduce this issue on our org.

The reproduction was done as follows:

  1. We created a cURL script that sent enough transactions to exhaust the quota.
  2. We ran the script until the quota was exhausted.
  3. We created a cURL script that sent an error and a transaction 100 times in an interleaved fashion.
  4. We ran the script.
  5. All 100 errors were reported in Sentry's UI.

First Script

for ((i=1; i<=15000; i++))
do
curl -X "POST" "https://o4505952800866304.ingest.sentry.io/api/4507300330799104/envelope/" \
   -H 'Content-Type: application/x-sentry-envelope' \
   -H 'X-Sentry-Auth: Sentry sentry_version=7, sentry_key=f582d2e7e6bfe9698363b2b1a83c92e6' \
   -H 'Cookie: sentry-su=IGGeLBOnfeJN:1s2ZBU:gpSWhsfJByXRwMovOmV457pj4awaqs7nltjz-eTj6aM' \
   -d $'{"dsn":"https://f582d2e7e6bfe9698363b2b1a83c92e6@o4505952800866304.ingest.sentry.io/4507300330799104"}
{"type":"transaction","content_type":"application/json","length":846}
{"event_id":"d2132d31b39445f1938d7e21b6bf0ec4","type":"transaction","transaction":"/organizations/:orgId/performance/:eventSlug/","transaction_info":{"source":"route"},"start_timestamp":1597976392.6542819,"timestamp":1597976400.6189718,"contexts":{"trace":{"trace_id":"4C79F60C11214EB38604F4AE0781BFB2","span_id":"FA90FDEAD5F74052","type":"trace"}},"spans":[{"description":"<OrganizationContext>","op":"react.mount","parent_span_id":"8f5a2b8768cafb4e","span_id":"bd429c44b67a3eb4","start_timestamp":1597976393.4619668,"timestamp":1597976393.4718769,"trace_id":"ff62a8b040f340bda5d830223def1d81"}],"measurements":{"LCP":{"value":420.69},"   lcp_final.element-Size123  ":{"value":1},"fid":{"value":2020},"inp":{"value":100.14},"cls":{"value":null},"fp":{"value":"im a first paint"},"Total Blocking Time":{"value":3.14159},"missing_value":"string"}}
' \
  -i
done

Second Script

for ((i=1; i<=100; i++))
do
curl -X "POST" "https://o4505952800866304.ingest.sentry.io/api/4507300330799104/envelope/" \
   -H 'Content-Type: application/x-sentry-envelope' \
   -H 'X-Sentry-Auth: Sentry sentry_version=7, sentry_key=f582d2e7e6bfe9698363b2b1a83c92e6' \
   -H 'Cookie: sentry-su=IGGeLBOnfeJN:1s2ZBU:gpSWhsfJByXRwMovOmV457pj4awaqs7nltjz-eTj6aM' \
   -d $'{"dsn":"https://f582d2e7e6bfe9698363b2b1a83c92e6@o4505952800866304.ingest.sentry.io/4507300330799104"}
{"type":"event","content_type":"application/json"}
{"message":"my_error_3"}
' \
  -i
curl -X "POST" "https://o4505952800866304.ingest.sentry.io/api/4507300330799104/envelope/" \
   -H 'Content-Type: application/x-sentry-envelope' \
   -H 'X-Sentry-Auth: Sentry sentry_version=7, sentry_key=f582d2e7e6bfe9698363b2b1a83c92e6' \
   -H 'Cookie: sentry-su=IGGeLBOnfeJN:1s2ZBU:gpSWhsfJByXRwMovOmV457pj4awaqs7nltjz-eTj6aM' \
   -d $'{"dsn":"https://f582d2e7e6bfe9698363b2b1a83c92e6@o4505952800866304.ingest.sentry.io/4507300330799104"}
{"type":"transaction","content_type":"application/json","length":846}
{"event_id":"d2132d31b39445f1938d7e21b6bf0ec4","type":"transaction","transaction":"/organizations/:orgId/performance/:eventSlug/","transaction_info":{"source":"route"},"start_timestamp":1597976392.6542819,"timestamp":1597976400.6189718,"contexts":{"trace":{"trace_id":"4C79F60C11214EB38604F4AE0781BFB2","span_id":"FA90FDEAD5F74052","type":"trace"}},"spans":[{"description":"<OrganizationContext>","op":"react.mount","parent_span_id":"8f5a2b8768cafb4e","span_id":"bd429c44b67a3eb4","start_timestamp":1597976393.4619668,"timestamp":1597976393.4718769,"trace_id":"ff62a8b040f340bda5d830223def1d81"}],"measurements":{"LCP":{"value":420.69},"   lcp_final.element-Size123  ":{"value":1},"fid":{"value":2020},"inp":{"value":100.14},"cls":{"value":null},"fp":{"value":"im a first paint"},"Total Blocking Time":{"value":3.14159},"missing_value":"string"}}
' \
  -q >/dev/null
done

This suggests that the problem might not be Relay and maybe it's somewhere else: either how the envelopes are sent, or some SDK bug.

Feel free to reproduce it.

jamescrosswell commented 6 months ago

Thanks @iambriccardo - I see there were issues with my bash script (not my use of curl).

In any event, having fixed that, I can no longer reproduce this either via curl or from the .NET SDK (I implemented a LoggingHttpTransport there that does pretty much the same thing as curl in terms of logging headers etc.).

🤷‍♂

bruno-garcia commented 1 month ago

Closing because we can't repro anymore.

Please let us know here if you encounter this problem again.