dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.23k stars 9.95k forks source link

Asp.Net Core API controller memory leak repopend #45098 #48641

Closed PhilOkeeffe closed 9 months ago

PhilOkeeffe commented 1 year ago

We recently upgraded from .Net 5 - .Net 7 and straight away we could see the memory going up on all our servers and never releasing. Exactley the same as posted originally by @oferze https://github.com/dotnet/aspnetcore/issues/45098 I dont know why this post is closed it's not fixed in a version of VS as suggested (17.4.1.) I have 17.5.5. It's not related to visual studio it's .Net 7 for sure as we never had this problem on .net 5

I agree with @oferze https://github.com/oferze that this is a regression with .NET 7 since .NET 5 & 6 somehow does it better.

Our code hits the controller every 5 seconds but regardless of what our code does or how big the objects are that may or may not end up on the large heap or gen 1 - 2 the memory does not release on the worker process for w3wp.exe for a simple end point.

@oferze demonstrates this well with the code below but you will see the same even if you return an empty string and hit this endpoint continously.

Simply create an endpoint and constantly refresh (f5) once published or in debug mode and see the memory grow and not get released.

    [HttpGet("[action]")]
    public IActionResult GetMem(int testNumber)
    {
        var mem = Process.GetCurrentProcess().PagedMemorySize64;
        return Ok($"Mem: {mem:N0}");
    }
davidfowl commented 1 year ago

@PhilOkeeffe There's no proof of a leak. Maybe you're saying there's more memory usage, but there's no leak (or at least no proof of a leak based in provided information).

Our code hits the controller every 5 seconds but regardless of what our code does or how big the objects are that may or may not end up on the large heap or gen 1 - 2 the memory does not release on the worker process for w3wp.exe for a simple end point.

Did you read through the issue?

cc @mangod9 @Maoni0 for .NET 6 vs 7 (using regions) might be the difference here but I can't tell.

PhilOkeeffe commented 1 year ago

Yes I did read it! Did you read it? If you hit a simple endpoint that does next to nothing and the memory grows but doesn't release this is a regression \ leak with how memory is handled in .NET 7 since .NET 5

davidfowl commented 1 year ago

@PhilOkeeffe if this is to be a productive investigation, then we'd need more information. Hitting an endpoint with the above code does not cause a memory leak.

mangod9 commented 1 year ago

@PhilOkeeffe, can you please clarify whether the memory continues to grow endlessly, or it stabilizes after hitting a steady state (but is just higher than 5)?

PhilOkeeffe commented 1 year ago

@mangod9 Ok so we have observed the memory continually grow up to 12gb in one example and doesn't release unless the App pool recycle period in minutes is hit or iis is restarted. Below is the memory consumption of the iis worker process for one of our instances. image

@davidfowl Ok I will hold fire on code becuase our code worked on .Net 5 and putting our code to one side I would expect IIS to not use lots of memory for something so simple as hello world style endpoint. Try have a simple endpoint hit every 5 seconds for a period 24 hours and see for yourself!!

davidfowl commented 1 year ago

@PhilOkeeffe I don't see the problem when I try to reproduce it, the GC does its job and collects the memory. Maybe there are more details missing, like machine specs? What sort of machine is IIS running on? How many CPU cors and how much memory?

PhilOkeeffe commented 1 year ago

Hi @davidfowl please find below the Server Spec, but also the CSProj settings of the project attatched and also the Spec of my local machine which also shows memory going up but not releasing.

csprofile.txt

Server Spec Windows Server 2016 Virtual Machine CPU 4 virtual processors 2.99 GHZ 16gb RAM IIS version 10.0.14393.0

App pool .NET CLR version : No Managed Code Managed pipeline mode : Classic

C:\Windows\system32>dotnet --list-runtimes Microsoft.AspNetCore.App 5.0.9 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 5.0.10 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 5.0.10 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]

Local Dev PC Windows 10 2.8Ghz 4 Cores 16gb Ram IIS version 10.0.19041.1 App pool .NET CLR version : No Managed Code Managed pipeline mode : Classic

Microsoft.AspNetCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 6.0.16 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 5.0.10 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 6.0.16 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.WindowsDesktop.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 5.0.10 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 6.0.16 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

davidfowl commented 1 year ago

Does the repro when running on both your dev PC and on the server? Are you running on IIS or IIS Express (inside of visual studio) when your reproduce this problem? Can you reproduce this problem with a new application that does nothing but send requests to that endpoint you showed?

PhilOkeeffe commented 1 year ago

@davidfowl Yes can be reproduced on both dev locally and server. Using IIS express but same for all flavours But most importantly we can even reproduce the behaviour using the oob template for Asp.netcore with react.js hitting the weatherforecast end point 200 times in a minute. image

2 Minute Sample (100 hits per min to weatherforecast endpoint.. Start Memory 62mb goes up on every request getting to 99mb but never released irrespective of GC) image

1Minute sample

See here hitting up the Weatherforecast endpoint roughly 200 times in a minute you can see a couple of random GC happen but they dont make a dent in the memory being used. image

davidfowl commented 1 year ago

Can you do the same with a non-react template? Can you reproduce the issue with a standalone asp.net core web api application? (trying to narrow down the variables)

PhilOkeeffe commented 1 year ago

@davidfowl Yes very same just hitting the home page with a constant refresh . Did you manage to reproduce this on the asp.net core with react template hitting the weatherforecast endpoint? Also please can you tell me what the resolution to this https://github.com/dotnet/aspnetcore/issues/38722 as it's the exact same bug (worker process (w3wp.exe) memory consumption growing quick and not releasing) ? image

image

PhilOkeeffe commented 1 year ago

@davidfowl Please can you tell me what the resolution to this https://github.com/dotnet/aspnetcore/issues/38722 as it's the exact same bug (worker process (w3wp.exe) memory consumption growing quick and not releasing) ? Did you manage to reproduce this on the asp.net core with react template hitting the weatherforecast endpoint?

PhilOkeeffe commented 1 year ago

@davidfowl Please can you tell me what the resolution to this https://github.com/dotnet/aspnetcore/issues/38722 as it's the exact same bug (worker process (w3wp.exe) memory consumption growing quick and not releasing) ? Did you manage to reproduce this on the asp.net core with react template hitting the weatherforecast endpoint?

davidfowl commented 1 year ago

I don't see constant memory growth when I run a load test against an ASP.NET Core API:

image
davidfowl commented 1 year ago

That was .NET 8. This is what I see with .NET 7:

image

I'm using bombardier to generate load. Here's what I did:

Put the app in release mode, opened the memory profiler in visual studio:

image

Then ran a load test for 1 minute and observed the graph:

bombardier-windows-amd64.exe https://localhost:7021/ -d 1m

It spikes > 200MB then flatlines around ~120MB.

Then I collected counters from the process:

image

See https://learn.microsoft.com/en-us/visualstudio/profiling/dotnet-counters-tool?view=vs-2022 or https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-counters

Here are the results of the same load test:

image

and the final results:

image

You can try using those tools on your setup and see if you can replicate the results.

PhilOkeeffe commented 1 year ago

@davidfowl These are there results for our app running on a dev machine calling our endpoint with non production data... I have also set the GC mode to workstation via the runtimeconfig.json { "runtimeOptions": { "configProperties": { "System.GC.Server": false
} } } bombardier-windows-amd64.exe http://localhost:47513/api/measurements -d 1m image

davidfowl commented 1 year ago

Doesn't look like a leak to me. Run that test for longer, say 5 minutes and see what the stats look like. I also want to confirm that the default template does not leak.

PhilOkeeffe commented 1 year ago

@davidfowl Ok I will get back to you!

PhilOkeeffe commented 1 year ago

Hello @davidfowl here is the results from a 5 Minute load test on my local machine for this endpoint http://localhost:47513/api/measurements

When i started the load test iis express was using 113 mb after 1 minute it was on 3175mb after 4 minutes it was using 6670mb and the Heap size increases?. I have a runtimeconfig.json file if you have any suggestions for preventing iis from taking all the memory. My understanding is that by default iis will take all 90% of memory unless to specifiy "System.GC.HeapHardLimitPercent"?

image

image

davidfowl commented 1 year ago

I have also set the GC mode to workstation via the runtimeconfig.json

I missed this. Did you do this to reduce member usage? Is the above trace also workstation GC? I'm not sure you're running workstation GC though, you can confirm at runtime by exposing an endpoint that shows if that GC mode is on using the GCSettings.IsServerGC API.

My understanding is that by default iis will take all 90% of memory unless to specifiy "System.GC.HeapHardLimitPercent"?

No, IIS is no different from the application running outside of IIS with respect to the GC settings. By default ASP.NET Core will use server GC (make sure you read this section).

The quote to remember:

What survives usually determines how much work GC needs to do; what doesn't survive usually determines how often a GC is triggered.

Hopefully that section of the document makes it a little clearer as to when GC's happen. Setting the "System.GC.HeapHardLimitPercent" is a way to limit the visible GC memory, but it might be worth understanding more about the GC behavior with another type of trace (https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md#how-to-start-a-memory-perf-analysis)

PS: I'm not sure what's going on with your Gen2 Size...

PhilOkeeffe commented 1 year ago

@davidfowl Getting back to your last comments Yes I wasn't using workstation GC in the end it was being ignored so I moved the setting into the csproj file where it was working but sadly not making any impact! The only thing that has made a difference was compacting the LOH at the begining of the endpoint call... GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce; GC.Collect() The compacting really halted iis memory growth but unfortuently this is not really as option as our endpoint is triggered every 5 seconds and I could see this was causing a performance hit! Just so you know our endpoint has a size of 16,600 bytes in size so will always going on the LOH but this was not a problem on .Net 5? image

davidfowl commented 1 year ago

What is this endpoint doing?

mangod9 commented 1 year ago

Just so you know our endpoint has a size of 16,600 bytes in size so will always going on the LOH but this was not a problem on .Net 5?

Did you mean 16mb here or 16kb? Adding @cshung since we have been investigating a similar issue where allocations just over 16mb were being made per request which would cause increased working set but not a leak you are observing. We are working on a fix for that particular issue.

to check whether its a similar one, is it possible to test the scenario with DOTNET_GCName=clrgc.dll env. var? This moves the GC implementation back to .NET 6.

PhilOkeeffe commented 1 year ago

@davidfowl @mangod9 @cshung Yes I have tested .Net6 now and it does appear to be a similair issue. When I move the implementation back to .Net 6 it uses much less memory the working set (iis mem) is half the size v's .Net 7! Do you have a ETA for the .net 7 fix?

I set the dotnet 6 dll via the runtimeconfig "System.GC.Name": "clrgc.dll"

Dot Net 6 v's 7 image

Dot net 6 image

image

cshung commented 1 year ago

@PhilOkeeffe, can you please confirm if the allocation 16k or 16M? The 16M case is well understood and the fix is currently work in progress here https://github.com/dotnet/runtime/pull/87715. On the other hand, if you could reproduce a working set issue with 16k, then that's something we don't know and we will need to investigate.

PhilOkeeffe commented 1 year ago

Hi @cshung the issue is reproduceable with 16k calls not 16Mb.

davidfowl commented 1 year ago

@PhilOkeeffe can you show or explain what the endpoint is doing?

PhilOkeeffe commented 1 year ago

@davidfowl It's reading persisted protobuf message files then combining them into a single structure and outputing as Json. see one of our production server's .net 7 v' 6 garbage collector. .Net 7 raises to 1.6 gb in 1 hour (with hourly app pool recyling) v's .net 6 goes to 660mb over the course of 5 hours.
image

mangod9 commented 1 year ago

is it possible to share a standalone repro and/or traces so we could investigate further?

PhilOkeeffe commented 1 year ago

@mangod9 @davidfowl @cshung Can you provide the exact steps for the trace you require? For a repo or zip of code we would have to do some work but it might be possible...

PhilOkeeffe commented 1 year ago

@mangod9 @davidfowl @cshung @pilotola repo https://github.com/pilotola/SampleNet7GC1 Memory went to 450mb in 5 mins using ... bombardier-windows-amd64.exe http://localhost:47513/api.measurements -d 5m image image

davidfowl commented 1 year ago

Is that repository private ?

PhilOkeeffe commented 1 year ago

@davidfowl I've uploaded the zip here..SampleNet7GC-main.zip

cshung commented 1 year ago

Stuck trying to build

Edit: I am able to get around the issue with npm install -g yarn and I am able to reproduce the memory consumption issue.

PhilOkeeffe commented 1 year ago

Hello @cshung @davidfowl just checking how this is going I am concious of time frames as we have several production sites suffering from this issue! Also will this issue remain open if there is no activity? Kind Regards Phil

mangod9 commented 1 year ago

Now that we have a repro, we will continue to investigate and post updates. Assume you can use the GCName=clrgc.dll workaround till then?

oferze commented 1 year ago

I was passively monitoring this thread until now (being a team member of @yarong-lifemap who opened #45098 and having participated in that thread).

I have to say I'm not sure why the previous ticket was so chatty until "closed as completed" and why this one gets so much objection as well (as in "there's no visible bug" or "we're not sure what your API is doing"), where there's a very clear regression between .NET 6 and .NET 7. all other things being equal. The regression alone warrants a bug review, without even auditing other factors.

PhilOkeeffe commented 1 year ago

@mangod9 @cshung @davidfowl @oferze Is there a dll for .net 5 we can use as we have found after 5 days trialling of .net 6 clrgc.dll (with no app pool recycling) the memory cosumption was at 11gb so .net 6 is not a fix it's just not as bad as .net 7 which consumes memory much faster image

davidfowl commented 1 year ago

My take is that your app will not perform well when you build systems that’s allocate 50MB strings per request. I think it’s very likely that your application has a problem that is causing memory growth. There may be regressions with the GC keeping more memory around but if you’re seeing the same issue with older GC then it tells me the application’s allocation patterns are suboptimal.

The reason these issues typically go this way is because it takes a long time to narrow a repro with data that the team can use to investigate. It’s never as simple as it initially seems.

oferze commented 1 year ago

Our case (the other ticket) is a valid use case of returning a big Json to the user as string. Suggesting that the app might have other problems because a 50MB response string "doesn't seem right", is not very productive, because each app's business case is different. We can discuss whether the app's design is good or not (it is) but that is missing the point: the GC works better in .NET 5/6 than in 7. In .NET 7, we proved that allocations stay around for longer than they should, in a minimal API app, uninfluenced by any other factor.

By all means, I'm well aware that investigation takes time, but please don't close a ticket or dismiss it as "your app will not perform well when you build systems that’s allocate 50MB strings per request."

The problem is certainly there.

mangod9 commented 1 year ago

@PhilOkeeffe no there isnt a way to move just the GC back to 5.0.

cshung commented 1 year ago

@oferze, the appropriate forum to report .NET Core GC-related issues is the dotnet/runtime repo. There we also have area labels that will allow us to focus on only GC issues. If you could get your issue opened there and have the appropriate labels set, that will certainly get our attention sooner.

In the case of repeated allocation of large objects, we have recently found and fixed a memory utilization issue related to them. Can you try https://github.com/dotnet/runtime/pull/87715 and see if that improves the memory utilization of your application?

Note that the fix is still in-flight - while the underlying issue is well understood, we might have to alter the design of the fix on the PR to be more adaptive.

The fix has nothing to do with @PhilOkeeffe's case though, in his scenario the number of large objects is insignificant. Experiments also indicate that the fix doesn't help with the memory utilization in his app at all. We are still trying to see what has gone wrong there.

PhilOkeeffe commented 1 year ago

Hello @cshung @mangod9 @davidfowl Just keeping in touch has there been any progress on this? Cheers Phil

cshung commented 1 year ago

We are in the progress of investigating the issue. Here are some preliminary findings.

tl;dr - we have found something, a general improvement to improve heap balancing somehow tripped a perf heuristic to keep the heap size small.

On my machine, I am able to observe the following:

.NET 7: Private memory after all requests finish, around 800M .NET 8: Private memory after all requests finish, around 2.2G [Clearly a regression in memory usage]

.NET 7: 1636.18 requests per second .NET 8: 1632.05 requests per second [Throughput is the same, well within the error bars]

Looking deeper into the reasons why .NET 8 uses more memory, we have: .NET 7: Performed 9 GCs (usually), including a Gen 2 background GC .NET 8: Performed 2 GCs (usually), one gen 0 and one gen 1

Looking further into why .NET 8 performs fewer GC, we have: .NET 7: Gen 0 budget started off with 30 MB and stay more or less the same throughout the 5 minutes of bombarding .NET 8: Gen 0 budget started off with the same 30 MB, but grow to about 120 - 150MB

Keep looking at why the Gen0 budget for .NET 8 grows faster: .NET 7: Survived about 1.2 MB during the first GC and stay there more or less the same. .NET 8: Survived about 2.1 MB during the first GC and keep growing to about 4.5 MB

The 2.1 MB survival caused us to trip a heuristic here. The variable desired_per_heap is the allocation budget.

                        // if min GC size larger than true on die cache, then don't bother
                        // limiting the desired size
                        if ((min_gc_size <= GCToOSInterface::GetCacheSizePerLogicalCpu(TRUE)) &&
                            desired_per_heap <= 2*min_gc_size)
                        {
                            desired_per_heap = min_gc_size;

That's why .NET 7 is able to keep the desired_per_heap low by entering the branch and .NET 8 is not. The extra survived bytes must be the reason.

Keep looking at why .NET 8 survived more:

Since both .NET 7 and .NET 8 run on the same app and have the same initial allocation budget, it is weird to see .NET 8 has more survived bytes. I captured a dump right before the first Gen0 plan phase begin. Now I understand that on .NET 8, the allocations are spread more evenly across all heaps, so in total we allocated more in .NET 8 before we .NET 7 trip the allocation budget line. The survival rate is the same, but since .NET 8 allocated more bytes, the total survived bytes is higher.

Having the allocations spread more evenly across the heaps is generally a good thing because we can balance the work better across the threads when GC happens. It is unfortunate in this case that it tripped the heuristic. I can't really blame the balancing, that should be a good thing, we need to figure out a solution the memory increase though.

davidfowl commented 1 year ago

@cshung did you compare it with .NET 6?

cshung commented 1 year ago

did you compare it with .NET 6?

Nope, I didn't. By default, we will work on the latest version first, which we are most familiar with and most likely to be able to make a fix on.

davidfowl commented 1 year ago

Right, but the issue is comparing .NET 6 and .NET 7, not 8 and 7.

cshung commented 1 year ago

Right, but the issue is comparing .NET 6 and .NET 7, not 8 and 7.

Ooops, missed that. Let's chat offline.

PhilOkeeffe commented 1 year ago

Hello , Is there any updates on a fix.? @cshung @mangod9 @davidfowl Can you raise the priority \ get more people looking at this? Please bear in mind .net 5 is the benchmark as our production app was ported from .Net 5 to 7. Just to recap the regression appears to get worse as you go up major the versions. .Net 6 memory usage also continues to grow with .Net 7 the same but much faster growth and from your comments above the regression is a lot worse in in the jump from .net 7 .net 8 so this fix needs to go back to the memory usage profile of .Net 5.

Thanks Phil

cshung commented 1 year ago

While we don't have a fix yet, I can share some findings on testing around .NET 6. The test is based on running the .NET 6 GC over the .NET 7 runtime. The point of the experiment is to isolate so that we know if there is anything that goes wrong in the GC.

The .NET 6 experiment result is almost identical to .NET 7.

.NET 6: Private memory after all requests finish, around 800M .NET 7: Private memory after all requests finish, around 800M

.NET 6: 1561.64 requests per second .NET 7: 1636.18 requests per second

.NET 6: Performed 9 GCs (usually), including a Gen 2 background GC .NET 7: Performed 9 GCs (usually), including a Gen 2 background GC

.NET 6: Gen 0 budget started off with 30 MB and stay more or less the same throughout the 5 minutes of bombarding .NET 7: Gen 0 budget started off with 30 MB and stay more or less the same throughout the 5 minutes of bombarding

It is obvious that .NET 8 regressed upon .NET 7, but there is no evidence that there is a regression between .NET 6 and .NET 7 in the GC yet.

In order to move forward, it would be nice if we can run your repro app under the .NET 6 runtime. @PhilOkeeffe, can you help us to get your repro app to run on top of it?