dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

No-op EventListener + idle Web app = 190 MB/h native memory leak #80684

Closed: sandersaares closed this issue 1 year ago

sandersaares commented 1 year ago

Description

Given a web app created using the default .NET 7 project template, plus an EventListener of the following nature:

internal class Listener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        base.OnEventSourceCreated(eventSource);

        // Enable every EventSource in the process and ask each one to
        // publish its event counters once per second.
        EnableEvents(eventSource, EventLevel.Informational, EventKeywords.None, new Dictionary<string, string?>()
        {
            ["EventCounterIntervalSec"] = "1"
        });
    }
}

I am observing a steady increase in memory usage while the app is idle (no requests made to the web app after the initial request).

A similar but distinct pattern can be observed if I use this event listener in a no-op console app: memory usage increases for 10-15 minutes but then plateaus.

Reproduction Steps

  1. Run https://github.com/sandersaares/whatiswrongwithyourdog-eventlistener-native-memory-leak
  2. Wait 10-60 minutes and observe process memory usage.

Expected behavior

Steady resource use under idle load conditions.

Actual behavior

Memory usage rises approximately 190 MB/h on my PC.

[screenshot: process memory usage climbing steadily over time]

This memory usage is not visible in Visual Studio's managed memory overview - it appears to be native memory.

Regression?

No response

Known Workarounds

No response

Configuration

.NET SDK:
  Version: 7.0.102
  Commit:  4bbdd14480

Runtime Environment:
  OS Name:     Windows
  OS Version:  10.0.22621
  OS Platform: Windows
  RID:         win10-x64
  Base Path:   C:\Program Files\dotnet\sdk\7.0.102\

Host:
  Version:      7.0.2
  Architecture: x64
  Commit:       d037e070eb

.NET SDKs installed:
  6.0.405 [C:\Program Files\dotnet\sdk]
  7.0.101 [C:\Program Files\dotnet\sdk]
  7.0.102 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.13 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.1 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.13 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.1 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.12 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.13 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.1 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.2 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other information

Inspecting process memory suggests that the objects in memory may be sets of event payload values, as some repeating strings are exactly what one might find in the payload values:

[screenshot: process memory inspection showing repeated payload-value strings]

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

sandersaares commented 1 year ago

After waiting even longer, it seems to also plateau (or at least slow down) in the web app. Is this some expected buffer/pool filling up? Does it somehow depend on the number of events flying around? (If I try it in a console app, I see a plateau after 10-15 minutes with much smaller memory use.)

[screenshot: web app memory usage slowing down and leveling off]

ghost commented 1 year ago

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti. See info in area-owners.md if you want to be subscribed.

tommcdon commented 1 year ago

@noahfalk @davmason

davmason commented 1 year ago

In your sample code you are unconditionally enabling every EventSource in the process, and I originally suspected that is what was causing your memory growth. One of those EventSources would be the NativeRuntimeEventSource, and enabling it would cause us to allocate a bunch of native buffers in EventPipe.

I assume you just want counters, given the EventCounterIntervalSec property. If you only enable the System.Runtime EventSource, it should turn on EventCounters without triggering any EventPipe native buffers.

internal class Listener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        base.OnEventSourceCreated(eventSource);

        if (eventSource.Name.Equals("System.Runtime"))
        {
            EnableEvents(eventSource, EventLevel.Informational, EventKeywords.None, new Dictionary<string, string?>()
            {
                ["EventCounterIntervalSec"] = "1"
            });
        }
    }
}

However, I just tried that out and still see slow memory growth so more investigation is necessary.

sandersaares commented 1 year ago

> I assume you just want counters, given the EventCounterIntervalSec property

Right. I want counters from every event source, though - is there some combination of parameters that would allow me to narrow the scope in this manner?

Or alternatively, is there a set of event sources I should put on a blocklist as they are expensive and will never emit counters?

davmason commented 1 year ago

There's no mechanism to say "enable only if you have an associated counter". You would have to whitelist the specific event sources you want when enabling (see the sketch below). If you own all the counters you care about, it wouldn't be hard to create a common base class, etc., but there's no built-in mechanism.
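
A minimal sketch of what that filtering could look like (the listener name and the source names in the allowlist are placeholders, not anything built in; the allowlist is static so it is safe to read even if OnEventSourceCreated fires during construction):

using System.Diagnostics.Tracing;

internal sealed class AllowlistCounterListener : EventListener
{
    // Hypothetical allowlist; replace with the EventSource names you actually care about.
    private static readonly HashSet<string> AllowedSources = new()
    {
        "System.Runtime",
        "MyEventSource",
    };

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        base.OnEventSourceCreated(eventSource);

        if (!AllowedSources.Contains(eventSource.Name))
            return; // ignore everything not on the allowlist

        EnableEvents(eventSource, EventLevel.Informational, EventKeywords.None,
            new Dictionary<string, string?>
            {
                ["EventCounterIntervalSec"] = "1"
            });
    }
}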

Generally speaking, enabling all EventSources at Verbose will cause a tremendous amount of overhead; there will be both memory and CPU costs.

sandersaares commented 1 year ago

This seems not to be related to a specific event source - any of them appears to reproduce it, albeit at different rates (presumably depending somehow on the amount of data flying around).

Creating a custom event source with 1000 counters (and listening only to this one) paints a very clear picture: a 300 MB memory usage increase in 5 minutes.

[screenshot: memory usage rising ~300 MB over 5 minutes]

[EventSource(Name = nameof(MyEventSource))]
public sealed class MyEventSource : EventSource
{
    // Just to ensure enough data is generated that any effects are easily visible.
    private const int CounterCount = 1000;

    private EventCounter[] _counters = new EventCounter[CounterCount];

    public MyEventSource()
    {
        for (var i = 0; i < CounterCount; i++)
            _counters[i] = new EventCounter("request-time-" + i, this)
            {
                DisplayName = "Request Processing Time",
                DisplayUnits = "ms"
            };

        // Write a value to every counter ten times per second so the
        // payload volume is large enough to make any growth obvious.
        Task.Run(async delegate
        {
            while (true)
            {
                await Task.Delay(100);

                for (var i = 0; i < CounterCount; i++)
                    _counters[i].WriteMetric(i);
            }
        });
    }
}

I updated the repo in the OP with this more straightforward repro scenario.

davmason commented 1 year ago

Yes, I agree there is a memory leak caused by counters. It needs further investigation; I am planning to do that, but it may take me a few days.

leculver commented 1 year ago

Hello! I dug into this with @davmason.

I don't believe this is a real memory leak; I think you are just being surprised by normal GC behavior. Turning on these EventCounters simply adds some "work" to the system, and doing that work creates some managed objects which quickly become garbage that the GC will eventually clean up.
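
One quick way to tell live memory from collectible garbage from inside the process is a minimal sketch like the following (not part of the original repro): GC.GetTotalMemory(forceFullCollection: true) forces a full collection before reporting live bytes, so if that number stays flat while the OS working set climbs, the growth is garbage (plus not-yet-decommitted GC segments) rather than a true leak.

using System.Diagnostics;

// Sketch: compare GC-reported live bytes against the OS working set.
// Flat live bytes + climbing working set => collectible garbage, not a leak.
while (true)
{
    long liveBytes = GC.GetTotalMemory(forceFullCollection: true); // forces a full GC first
    long workingSetBytes = Process.GetCurrentProcess().WorkingSet64;
    Console.WriteLine($"live: {liveBytes / (1024 * 1024)} MB, working set: {workingSetBytes / (1024 * 1024)} MB");
    Thread.Sleep(5000);
}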

I took your repro program and let it run for a bit. You are 100% correct that the overall memory usage goes from ~50 MB to ~200 MB over the course of an hour or so, then plateaus. I used a small tool I wrote to compare the memory usage early in the application's life and again after a few minutes. The difference seems to come entirely from the GC heap and some PAGE_READWRITE regions, which appear to be old GC heap segments that haven't been decommitted yet:

Begin:

> !maddress

Region Type--------------Count-----------Size---------------Size (bytes)
GCHeapSegment       |       80 |      67.40mb |               70,676,480
PAGE_READWRITE      |       21 |       3.26mb |                3,416,064
HandleTable         |        5 |      48.00kb |                   49,152

> !dumpheap -stat
00007ffb59b9ed08     3395       353080 System.Diagnostics.Tracing.CounterPayload

End:

> !maddress

Region Type--------------Count-----------Size---------------Size (bytes)
GCHeapSegment       |       68 |     135.56mb |              142,143,488
PAGE_READWRITE      |       48 |      45.80mb |               48,025,600
HandleTable         |        6 |      56.00kb |                   57,344

> !dumpheap -stat
00007ffb59b9ed08    71000      7384000 System.Diagnostics.Tracing.CounterPayload

On the surface, it looks like CounterPayload (and related objects) are "leaking", but just looking at raw memory (or the raw output of !dumpheap) is misleading. Pulling up the "End" state in PerfView shows that most of this memory is actually garbage on the GC heap, i.e., unrooted memory that the GC will eventually get around to collecting. (Note: you have to set "ExcPats:" to "" in the PerfView UI to show garbage objects.) There are a couple of other tracing-related objects too, not shown here:

Name                                                        Inc %            Inc     Inc Ct Exc %          Exc   Exc Ct
 System.Private.CoreLib!Diagnostics.Tracing.EventPayload     19.0      1,738,240     40,740  13.6    1,249,360   20,370
+ [not reachable from roots]                                 19.0      1,738,240     40,740   0.0            0        0

So, this seems to be a case where the GC has simply decided not to collect some objects yet, but it would eventually. Of course, sometimes we have bugs in our diagnostic tools, such as ClrMD and PerfView, so it would be good to confirm this is really garbage that will be collected eventually.

If you add the following code to the very top of your repro program, you'll notice this "leak" completely goes away; the app holds at ~50 MB of memory usage. (Obviously this is not meant to be production code, just an example to prove this is why your memory is growing.)

// Force a full GC every two seconds to show the growth is collectible garbage.
Thread t = new Thread(GCCollectThread);
t.Start();

void GCCollectThread(object? obj)
{
    while (true)
    {
        Thread.Sleep(2000);

        // Collect, let finalizers run, then collect again so that
        // finalizable garbage is fully reclaimed.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.Write(".");
    }
}

One last note: the reason the HandleTable grows a little above is that the event code uses some weak GC handles to do its work. Those will also be cleaned up when the GC collects the rest of the garbage lying around. Weak handles don't keep objects alive.
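
For illustration, a minimal standalone sketch (not from the repro) showing that a weak handle does not keep its target alive; in a Debug build the JIT may extend the local's lifetime, so run it in Release:

using System.Runtime.InteropServices;

object? target = new object();
GCHandle weak = GCHandle.Alloc(target, GCHandleType.Weak);

target = null; // drop the only strong reference
GC.Collect();  // the object is now collectible

Console.WriteLine(weak.Target is null
    ? "collected - the weak handle did not keep it alive"
    : "still alive - the JIT may have extended the local's lifetime (Debug build)");

weak.Free(); // GCHandles must be freed explicitly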

I'm happy to take a closer look if I missed something here, or if you all have any other questions. Thanks!

sandersaares commented 1 year ago

Thanks a lot for the detailed investigation! What you say makes a lot of sense.

I can confirm that when I add this force-GC loop, memory usage looks quite normal. With just the EventListener, it is pretty much flat; with the extra 1000 event counters, it grows by an extra 50 MB over an hour but then levels off, which also seems within the bounds of normality.

[screenshot: memory usage holding flat with the forced-GC loop enabled]

I feel rather stupid for not trying out a forced GC loop in my initial repro. Sometimes life with GC is so simple I forget it exists at all!