getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet
MIT License

Improve OutOfMemory insight #3315

Open bruno-garcia opened 3 weeks ago

bruno-garcia commented 3 weeks ago

Today an OOM exception isn't really actionable. It shows me where it tried to allocate and blew up, which is helpful. But I don't know how much memory it tried to allocate, how much was available, etc.

image (example)

But we have memory information further down in the event detail. It doesn't get displayed very nicely (I created a ticket for that here):

image

Could we show a summary at the top of the screen:

Had X memory, tried to allocate Y? Or, more generally, show the memory info closer to the top in a better format: not all values, just the relevant ones?
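
For reference, the runtime already exposes the "had X memory" side via GC.GetGCMemoryInfo() (the size of the failed allocation isn't exposed anywhere obvious, so that part would have to come from somewhere else). A minimal sketch of the values a summary could pull from, not existing SDK behavior:

```csharp
using System;

// Snapshot of what the GC knew at the last collection; not tied to Sentry in any way.
var gcInfo = GC.GetGCMemoryInfo();

long totalAvailable = gcInfo.TotalAvailableMemoryBytes; // memory the process/container may use
long memoryLoad = gcInfo.MemoryLoadBytes;               // how much of that was in use at the last GC
long heapSize = gcInfo.HeapSizeBytes;                   // managed heap size
long workingSet = Environment.WorkingSet;               // current process working set

Console.WriteLine($"load {memoryLoad} / available {totalAvailable} bytes, heap {heapSize}, working set {workingSet}");
```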

bruno-garcia commented 2 days ago

Basically the grouping experience: (image)

jamescrosswell commented 1 day ago

An idea from a Discord discussion: it'd be cool if Sentry could be configured to trigger a memory dump (e.g. if the memory utilisation of the application breached some configured threshold, either a fixed amount of memory or a % of total available memory).

We could call dotnet-dump or dotnet-gcdump (the latter doesn't need to be run with sudo, and its dumps can be analyzed in PerfView) to generate the dump itself.

For this to work, dotnet-dump/dotnet-gcdump would have to be installed globally or the path where it was located would need to be provided to Sentry (e.g. via SentryOptions).

Even cooler (not sure how feasible this is) would be if these dumps could be sent to Sentry.io and visualised there. Similar functionality could be implemented by other SDKs, if we could align on a common set of dump formats that Sentry supported.
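
A rough sketch of what that threshold check might look like, just to make the idea concrete; the thresholdBytes/gcDumpPath parameters, the polling interval and the output file name are all made up for illustration and nothing like this exists in the SDK today:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static async Task MonitorMemoryAsync(long thresholdBytes, string gcDumpPath, CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        // MemoryLoadBytes is the memory load the GC observed at the last collection.
        var info = GC.GetGCMemoryInfo();
        if (info.MemoryLoadBytes >= thresholdBytes)
        {
            var pid = Environment.ProcessId;

            // Assumes dotnet-gcdump is installed (dotnet tool install -g dotnet-gcdump)
            // or that gcDumpPath points at a copy we shipped with the app.
            var psi = new ProcessStartInfo(gcDumpPath, $"collect -p {pid} -o oom-{pid}.gcdump")
            {
                UseShellExecute = false,
            };
            using var process = Process.Start(psi);
            process?.WaitForExit();
            break; // one dump per threshold breach in this sketch
        }

        await Task.Delay(TimeSpan.FromSeconds(5), ct);
    }
}
```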

jamescrosswell commented 18 hours ago

Worth noting that one possible cause of OOM exceptions is fragmentation, if you have lots of objects on the LOH (see here). We'll see if we can capture that detail as well.
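
If we do, GC.GetGCMemoryInfo() already surfaces some of it on .NET 5+; a minimal sketch of the fields that might be relevant (not something the SDK collects today):

```csharp
using System;

var info = GC.GetGCMemoryInfo();

// Free space sitting inside the managed heaps as of the last GC.
Console.WriteLine($"fragmented {info.FragmentedBytes} of {info.HeapSizeBytes} heap bytes");

// Per-generation breakdown; on .NET 5+ index 3 is the LOH and index 4 the POH.
for (var gen = 0; gen < info.GenerationInfo.Length; gen++)
{
    var g = info.GenerationInfo[gen];
    Console.WriteLine($"gen {gen}: size {g.SizeAfterBytes}, fragmentation {g.FragmentationAfterBytes}");
}
```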

bruno-garcia commented 14 hours ago

> For this to work, dotnet-dump/dotnet-gcdump would have to be installed globally or the path where it was located would need to be provided to Sentry (e.g. via SentryOptions).

We can bundle the executable in the NuGet package and copy it to the final app, but for that we need the right architecture for the machine. If it's a managed assembly, it'd need the right .NET Runtime version installed on the machine for this to work, though.
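
To illustrate the architecture part, something along these lines could pick the right bundled binary at runtime; the tools/{rid} layout is invented for the example, it's not how any existing package is laid out:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// e.g. "linux-x64", "win-x64", "osx-arm64" (on .NET 5+).
var rid = RuntimeInformation.RuntimeIdentifier;

var toolName = OperatingSystem.IsWindows() ? "dotnet-gcdump.exe" : "dotnet-gcdump";

// Hypothetical layout: one native tool per RID copied into a tools/ folder next to the app.
var bundledToolPath = Path.Combine(AppContext.BaseDirectory, "tools", rid, toolName);
Console.WriteLine(bundledToolPath);
```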

> Worth noting that one possible cause of OOM exceptions is fragmentation, if you have lots of objects on the LOH (see here). We'll see if we can capture that detail as well.

If there's not enough memory it should compact (solving fragmentation issues), and the LOH in modern versions of .NET should also compact. The video you linked looks like .NET Framework behavior (where it's true that fragmentation led to OOM with no way to recover short of a reboot; I encountered that in the past).

From https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap :

.NET Core and .NET Framework (starting with .NET Framework 4.5.1) include the GCSettings.LargeObjectHeapCompactionMode property that allows users to specify that the LOH should be compacted during the next full blocking GC. And in the future, .NET may decide to compact the LOH automatically. This means that, if you allocate large objects and want to make sure that they don't move, you should still pin them.

Hmm, I could be wrong about .NET 8's default behavior (unless it's not controlled by this property). From https://learn.microsoft.com/en-us/dotnet/api/system.runtime.gclargeobjectheapcompactionmode?view=net-8.0#system-runtime-gclargeobjectheapcompactionmode-default:

By default, the LOH is not compacted. A value of CompactOnce indicates that the blocking garbage collection will compact the LOH. After the garbage collection, the value of the GCSettings.LargeObjectHeapCompactionMode property reverts to Default.
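
For reference, this is what the opt-in looks like; it's just the documented GCSettings API, nothing SDK-specific:

```csharp
using System;
using System.Runtime;

// CompactOnce applies to the next blocking full GC, then the property reverts to Default.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

// Force a full, blocking, compacting collection so the LOH is compacted now.
GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: true);

Console.WriteLine(GCSettings.LargeObjectHeapCompactionMode); // Default again
```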