getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet

High memory usage from ISpan[] and SpanTracer #3303

Open danzel opened 4 weeks ago

danzel commented 4 weeks ago

Package

Sentry.AspNetCore

.NET Flavor

.NET Core

.NET Version

8

OS

Any (not platform specific)

SDK Version

4.2.1

Self-Hosted Sentry Version

No response

Steps to Reproduce

We have an ASP.NET Core app with ~10k concurrent long-lived websocket connections. We're having memory usage issues that appear to be related to Sentry. We previously turned MaxBreadcrumbs down to 5, which helped, but we are still having issues.
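For reference, this is the kind of startup configuration involved (a minimal sketch with placeholder values, not our exact setup; MaxBreadcrumbs is the real option mentioned above):

```csharp
// Program.cs (minimal sketch)
var builder = WebApplication.CreateBuilder(args);

builder.WebHost.UseSentry(options =>
{
    options.Dsn = "<your DSN>";      // placeholder
    options.MaxBreadcrumbs = 5;      // cap the per-scope breadcrumb buffer
    options.TracesSampleRate = 1.0;  // tracing enabled, so spans are created
});

var app = builder.Build();
app.Run();
```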

Output from dotnet-dump analyze (dumpheap -stat):

```
7ff9ebaf2fd8   226,829  10,887,792 System.Text.StringBuilder
7ff9eb8f1188   516,022  12,384,528 System.Int32
7ff9ebbd3508   398,540  12,753,280 System.LazyHelper
7ff9ecc1b0d8    25,388  14,001,648 System.Collections.Generic.Dictionary<System.String, Microsoft.Extensions.Primitives.StringValues>+Entry[]
7ff9ec3e2848   487,886  15,612,352 System.Collections.Concurrent.ConcurrentDictionary<System.String, System.Object>
7ff9ecdeb5f8   398,461  15,938,440 System.Lazy<Sentry.MetricsSummaryAggregator>
7ff9ec3ea958    51,562  16,499,840 System.Collections.Concurrent.ConcurrentDictionary<System.String, System.String>+VolatileNode[]
7ff9ec3ea6c8    38,122  20,433,392 System.Collections.Concurrent.ConcurrentQueueSegment<Sentry.Breadcrumb>+Slot[]
7ff9ec3e3690   487,886  27,321,616 System.Collections.Concurrent.ConcurrentDictionary<System.String, System.Object>+Tables
7ff9eb969df8   683,492  32,087,568 System.Int32[]
7ff9ecded048   793,972  38,110,656 System.Collections.Concurrent.ConcurrentDictionary<System.String, System.Object>+Node
7ff9eb8bc4d8   695,222  39,451,568 System.Object[]
7ff9ebaf3650   244,247  43,947,986 System.Char[]
7ff9eb8b5fa8 1,889,870  45,356,880 System.Object
7ff9ecdfa4c8   173,219  45,559,200 Sentry.ISpan[]
7ff9ecdfaa28   385,266  83,217,456 Sentry.SpanTracer
7ff9ec3e3510   487,886 156,123,520 System.Collections.Concurrent.ConcurrentDictionary<System.String, System.Object>+VolatileNode[]
7ff9eb96ec08 2,021,914 312,027,276 System.String
7ff9eba96600 1,039,757 774,289,030 System.Byte[]
0131c7bc0a90 1,021,254 775,785,616 Free
Total 18,283,404 objects, 2,920,864,405 bytes
```

From my investigation, Sentry is creating SpanTracer instances when these websocket connections cause HTTP requests to be made (certain messages received on a websocket trigger an outbound HTTP request). Keeping all of these spans around is eating up the memory.

I'm going to try removing the SentryHttpMessageHandlerBuilderFilter service and see if that helps. Any other suggestions? Thanks!

Expected Result

Memory usage doesn't grow to such a large level.

Actual Result

Memory fills up and the app spends all its time trying to GC.

danzel commented 4 weeks ago
```csharp
// Remove the filter Sentry registers to attach SentryHttpMessageHandler to
// HttpClients built via IHttpClientFactory, so outbound requests no longer
// create a span each.
var sentryHttp = services.SingleOrDefault(s => s.ImplementationType?.Name == "SentryHttpMessageHandlerBuilderFilter");
if (sentryHttp != null)
    services.Remove(sentryHttp);
```

This seems to have resolved the issue, or at least greatly reduced the impact.

jamescrosswell commented 4 weeks ago

Hey @danzel, SentryHttpMessageHandler creates one span per request here: https://github.com/getsentry/sentry-dotnet/blob/0ee390446894f82b89e92bc7aae4a83122d73f6a/src/Sentry/SentryMessageHandler.cs#L85-L97

As far as I'm aware, the only thing that holds a reference to those spans is the transaction itself: https://github.com/getsentry/sentry-dotnet/blob/a448cc45beef2109a7f5d6d628b4b4dce43f4496/src/Sentry/TransactionTracer.cs#L302

The transaction in an ASP.NET Core application would ordinarily be created by the Sentry Tracing middleware and represents the inbound HttpRequest to your server. So the spans get cleaned up and garbage collected once the Transaction ends.
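To illustrate that lifecycle with the public SDK API (a sketch; the transaction name and operations are made up, not taken from the middleware):

```csharp
// A transaction holds every span started under it until it finishes.
var transaction = SentrySdk.StartTransaction("ws-connection", "ws.lifetime");
SentrySdk.ConfigureScope(scope => scope.Transaction = transaction);

// Each outbound HTTP request starts a child span...
var span = transaction.StartChild("http.client", "GET /api/example");
span.Finish(); // ...which stays referenced by the transaction even when finished.

// Only once the transaction finishes can its accumulated spans be collected.
transaction.Finish();
```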

> We have an ASP.NET Core app with ~10k concurrent long-lived websocket connections.

I wonder if that's our issue. We probably wouldn't want to create a transaction and hold it open for every long-lived websocket connection. I'm not sure if or how Sentry was designed to work with WebSockets, though.

@danzel what is creating the websockets? Are you using SignalR or something else? I'm wondering if we could try to reproduce this.

danzel commented 3 weeks ago

We use web sockets without SignalR. In our actual app we use a middleware to receive them. See the first example here

Here is a test app for you, @jamescrosswell; it uses a controller to receive the websocket and reproduces the same issue: https://github.com/danzel/SentryTest

Usage:

1. Open Server.sln, set a Sentry DSN, and run the server (http).
2. Open Client.sln and run it.

The client makes 100 websockets, connects to the server, and periodically sends messages over them. The server handles them in WebSocketController; whenever it receives a message it makes an HTTP request using HttpClient (to the local server, though anywhere would work). Memory usage will go up and up. I've added a hosted service (MemoryMonitorService.cs) that periodically runs GCs to validate that the memory isn't collectable. Run it for a minute or two, then take a memory dump and you can see what is using up the memory.
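For readers who don't want to clone the repro, the server side looks roughly like this (a sketch reconstructed from the description above; the route, target URL, and buffer size are guesses, not copied from SentryTest):

```csharp
using Microsoft.AspNetCore.Mvc;
using System.Net.WebSockets;

[Route("ws")]
public class WebSocketController : ControllerBase
{
    private readonly IHttpClientFactory _httpClientFactory;

    public WebSocketController(IHttpClientFactory httpClientFactory) =>
        _httpClientFactory = httpClientFactory;

    [HttpGet]
    public async Task Get()
    {
        if (!HttpContext.WebSockets.IsWebSocketRequest)
        {
            HttpContext.Response.StatusCode = StatusCodes.Status400BadRequest;
            return;
        }

        using var socket = await HttpContext.WebSockets.AcceptWebSocketAsync();
        var buffer = new byte[4096];

        while (socket.State == WebSocketState.Open)
        {
            var result = await socket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
            if (result.MessageType == WebSocketMessageType.Close)
                break;

            // Each received message triggers an outbound HTTP request. Clients
            // from IHttpClientFactory include SentryHttpMessageHandler, which
            // starts a span per request on the ambient transaction.
            var client = _httpClientFactory.CreateClient();
            await client.GetAsync("http://localhost:5000/"); // hypothetical target
        }
    }
}
```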

Thanks!

jamescrosswell commented 3 weeks ago

> Here is a test app for you, @jamescrosswell; it uses a controller to receive the websocket and reproduces the same issue: https://github.com/danzel/SentryTest

Awesome, thank you so much @danzel ! That should give us enough to go on 👍🏻

jamescrosswell commented 1 week ago

Hey @danzel,

Sorry it took me a while to carve out some time to look into this. I'm not actually seeing any kind of memory leak. This is what I'm getting when running a memory profile in Rider:

[Rider memory profile screenshot]

So yes, at first the process appears to allocate incrementally more and more memory... on my machine it capped out at about 1GB. However, after a couple of minutes (once everything is warmed up) it actually released most of that memory and stabilised at ~460MB of memory usage... including "unmanaged memory" (which I think is just memory that .NET set aside to write to if/when needed, to avoid fragmentation). Excluding the unmanaged memory, it hovers at around 150-300MB of memory usage.

danzel commented 1 week ago

The test app was configured for only 100 web sockets. I've updated it to 10,000; it'll take ~15 minutes to connect that many, but the issue should be obvious before then: you'll be out of memory 😅

With the updated config I ran it for about 5 minutes, at which point memory looked like this (as it says, this is just a sample):

TotalMemory: 3505005720, Connected: 2334, Messages: 1498237

[memory profiler screenshot]

This is live objects (Show Dead Objects is not ticked).

At 12 minutes (getting this dump took a very long time; memory usage was struggling):

TotalMemory: 12764126104, Connected: 6549, Messages: 6313959

[memory profiler screenshots]

bitsandfoxes commented 1 week ago

So this is not even a question of whether we have a memory leak but about general memory consumption. We did some profiling last fall to ease up on the memory consumption (specifically lots of nested spans within one request). But afaik we have not done any benchmarking or profiling for this specific use case. I guess we're going to have to revisit this.

jamescrosswell commented 1 week ago

> So this is not even a question of whether we have a memory leak but about general memory consumption.

I think that's probably true yes. If I change the client to create 10,000 web socket connections like @danzel suggested, even with Sentry disabled memory consumption on the web server continues to grow until the application simply falls over (which doesn't take long on my machine).

The difference in memory consumption between Sentry enabled vs Sentry disabled with only 100 active web sockets seems significant, although that might just be relative. Compared to everything else that's going on in this application (which isn't much), Sentry uses a lot of memory. A real-world application might open database connections, perform LINQ queries, etc., at which point the overhead Sentry puts on memory consumption might seem relatively small.

Still, it might be worth creating a benchmark that spins up a test server and a bunch of web socket connections to it... we could then try to minimise Sentry's memory footprint in that scenario.
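Something along these lines, perhaps (a rough sketch using Microsoft.AspNetCore.TestHost, not an existing Sentry benchmark; the endpoint, connection count, and configuration are hypothetical):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.TestHost;
using Microsoft.Extensions.Hosting;
using System.Net.WebSockets;

// Spin up an in-memory test server with Sentry enabled.
using var host = await new HostBuilder()
    .ConfigureWebHost(webHost => webHost
        .UseTestServer()
        .UseSentry(o => o.TracesSampleRate = 1.0) // hypothetical config
        .Configure(app =>
        {
            app.UseWebSockets();
            // ... map the websocket endpoint under test here
        }))
    .StartAsync();

// Open N websocket connections against it.
var wsClient = host.GetTestServer().CreateWebSocketClient();
var sockets = new List<WebSocket>();
for (var i = 0; i < 1_000; i++)
    sockets.Add(await wsClient.ConnectAsync(new Uri("ws://localhost/ws"), CancellationToken.None));

// ... drive messages through the sockets, then compare
// GC.GetTotalMemory(forceFullCollection: true) with Sentry on vs. off.
```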

danzel commented 6 days ago

What you've said lines up with my thoughts too: not a memory leak, just more memory usage than expected.