Closed cr0fters closed 1 day ago
Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.
| Author: | cr0fters |
|---|---|
| Assignees: | - |
| Labels: | `area-GC-coreclr`, `untriaged` |
| Milestone: | - |
I am also seeing this issue; in my case it is causing our microservices (running in Docker) to run out of memory (OOM) and crash. I was able to reproduce the bug as follows:
Modify `Program.cs` to include the following block of code:
builder.Services.AddHealthChecks();
var app = builder.Build();
app.MapHealthChecks("/hc", new HealthCheckOptions
{
AllowCachingResponses = false,
});
From there, run the program with a memory profiler (I used DotMemory version 2023.3.3). Notice that each time `/hc` is called, the health check leaks ~100KB of RAM.
Now, modify the `*.csproj` file to change the target framework to .NET 6:
<TargetFramework>net6.0</TargetFramework>
Note:
*.csproj
.WithOpenApi();
extension in Program.cs
Now repeat the profiling exercise. Notice that, although the memory is not 100% constant, it doesn't really exhibit "leaking" behavior.
Please advise as to how to proceed. If you set up a Docker container to poll the service at ~10 second intervals, you'll end up with hundreds of MBs of RAM leaked over a few hours.
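A minimal sketch of the polling described above, written as a small console program rather than a container. The base address `http://localhost:5000` is an assumption (use whatever address the sample API listens on); the `/hc` path and the ~10 second interval come from the repro steps.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Assumption: the sample API is reachable here; pass a different address as the first argument.
var baseAddress = new Uri(args.Length > 0 ? args[0] : "http://localhost:5000");
using var client = new HttpClient { BaseAddress = baseAddress };

// PeriodicTimer (available since .NET 6) completes one tick per interval.
using var timer = new PeriodicTimer(TimeSpan.FromSeconds(10));

var calls = 0;
while (await timer.WaitForNextTickAsync())
{
    calls++;
    try
    {
        // Hit the health check endpoint mapped by MapHealthChecks("/hc", ...).
        using var response = await client.GetAsync("/hc");
        Console.WriteLine($"{DateTime.UtcNow:O} call #{calls} -> {(int)response.StatusCode}");
    }
    catch (HttpRequestException ex)
    {
        // Keep polling if the service is briefly unavailable.
        Console.WriteLine($"{DateTime.UtcNow:O} call #{calls} failed: {ex.Message}");
    }
}
```

Run it alongside the profiled service and stop it with Ctrl+C. At one call every 10 seconds that is 360 calls per hour, so a leak of ~100KB per call would add up to roughly 36 MB per hour.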
Framework information;
Severity: SEVERE; risk of production crash. Framework is effectively not usable.
Just checking whether the scenario requires `MapHealthChecks` to be enabled? If so we might have to move this to the asp.net repo.
I'm not using `MapHealthChecks` in the app I'm seeing the issue on. Also, if that were the case, I assume the leak would appear in the managed heap (and be visible in DotMemory).
See my screenshots above: according to ECS, I'm using 80% of available memory (2 GB); however, when I perform a `dotnet dump`, the analysis only shows around 100 MB of usage.
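One way to see this gap from inside a process is to log the GC's own numbers next to the OS-level working set. The following is only a sketch under assumptions: the 30-second interval and the `ToMb` helper are illustrative, and subtracting GC committed bytes from the working set gives a rough figure only, since it also counts the runtime itself, JIT-compiled code, and mapped images.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Illustrative helper: convert bytes to mebibytes for display.
static double ToMb(long bytes) => bytes / (1024.0 * 1024.0);

while (true)
{
    // A fresh Process object avoids reading cached values.
    using var process = Process.GetCurrentProcess();
    var gcInfo = GC.GetGCMemoryInfo();

    long managedLive = GC.GetTotalMemory(forceFullCollection: false); // bytes currently thought to be allocated
    long gcCommitted = gcInfo.TotalCommittedBytes;                    // memory committed by the managed heap
    long workingSet = process.WorkingSet64;                           // physical memory used by the whole process

    Console.WriteLine(
        $"managed live: {ToMb(managedLive):F1} MB, " +
        $"GC committed: {ToMb(gcCommitted):F1} MB, " +
        $"working set: {ToMb(workingSet):F1} MB, " +
        $"outside GC (approx.): {ToMb(workingSet - gcCommitted):F1} MB");

    // Assumption: a 30-second sampling interval is frequent enough to show a trend.
    Thread.Sleep(TimeSpan.FromSeconds(30));
}
```

The same three reads could be dropped into a background loop in the affected service. If the working set keeps climbing while the GC figures stay flat, that is consistent with the growth being outside the managed heap, which matches a dump that shows only a small heap.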
@mangod9 If I don't call `MapHealthChecks`, there's no health check endpoint for me to query for testing purposes. If I instead use the sample weather forecast endpoint, I do not see the leak.
Is there some way to activate the health check via a REST endpoint without first setting it up via `MapHealthChecks`?
thanks for clarifying @cr0fters. So in your case you don't see the managed heap growing much as the memory increases? Are you able to share a dump of before / after so we could investigate further?
I could share a dump from after if that helps? I didn't get a before dump, and we've since deployed it on a different dotnet base image to see if it makes a difference (`8.0-alpine`).
The dump file is however 2.2 GB, and I'd not be comfortable sharing it in public either way. Do you have a more secure way I could share it with you?
Did some more tests here, again using the sample API with a GET endpoint of /weatherforecast
/weatherforecast
at ~10 second intervals:
builder.Services.AddHealthChecks();
/hc
endpoint (see sample code above)
/hc
is ~100KB (some health checks leak more than this, some less, some none at all)
The problem here is not the ~100KB that is leaked; the problem is the compounding nature of the requests. If the HC is polled by container infrastructure where the container orchestration specifies a max memory allocation, we'll eventually run out of RAM and crash.
So it appears there are two separate issues here. @debracey, since yours looks related to health checks, it might make sense to create a new issue in the asp.net repo.
> I could share a dump from after if that helps? I didn't get a before dump, and we've since deployed it on a different dotnet base image to see if it makes a difference (`8.0-alpine`).
>
> The dump file is however 2.2 GB, and I'd not be comfortable sharing it in public either way. Do you have a more secure way I could share it with you?
Yeah, we can provide a share for you to upload the dump. Can you please start an email so we can coordinate over it? My email should be in the profile. Thx.
@debracey Is your issue similar to the one from @cr0fters in that the leaked memory also isn't on the managed heap? Or is it on the managed heap in your case?
Does anyone else have this issue as well? I assume people do because of all those thumbs up on the issue. If so, could anyone provide any details?
@Neme12 Yes, the leaked memory appears to be in the unmanaged space. A few other engineers and I have been working through this issue and we're now trying to debug the unmanaged memory to gain more details. We haven't gotten very far on that yet.
Although I went ahead and opened the linked bug with the ASP.NET Core project, I think this is actually the same bug. I'm just using a health check to trigger the bug whereas @cr0fters triggered it via a different path.
@mangod9 I think this should be prioritized. It's not just an inconvenience of applications taking more memory than necessary; it's causing apps to crash and preventing some from upgrading to .NET 8. Multiple people are having the issue, both in this thread and in dotnet/aspnetcore#54405.
yeah @cr0fters was going to send a dump. But if there are other folks who can share dumps we can investigate. I will let the asp.net team look into the specific healthcheck issue
I noticed similar behaviour recently in diagnostic tools. I found lots of duplicated strings in the Datagrams Received event log after refreshing the `/hc` endpoint for a while. It is possible that those strings are duplicated only when using diagnostic tools, but I'll leave that for you to check.
I think that the cause of the problem is here: we are building the same string over and over. https://github.com/dotnet/runtime/blob/ca48a0d0f733e3477738041b28a624411ee9afd6/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/PollingCounter.cs#L66
Is this issue still a concern after 8.0.4 and perhaps a fix for https://github.com/dotnet/runtime/issues/100502 which would be made in 8.0.5?
Hi @mangod9 - apologies I didn’t share the dump above - I was waiting for the issue to happen again, but around the same time I switched to an Alpine based image and never had the same problem.
This leads me to believe it’s specific to Debian based images as mentioned elsewhere
Yeah, if switching to Alpine fixes the issue, it's very likely the same. OK to close this for now? You can always reopen if it occurs again.
When my team was seeing this before, we were seeing this on photon based images. We don’t use alpine or Debian.
We are still receiving reports from development teams that they’re still seeing un-reclaimed strings, matching the reports/patterns from this ticket.
I am working on narrowing down the possible variables to see if I can pinpoint which SDK versions are working properly. So far my team has seen an improvement with 8.0.204 - but some leaks are still occurring, just at a much slower growth rate.
I am still trying to narrow down what could be causing this. Some of my services are no longer leaking as of 8.0.204, but other services still leak as described in this ticket. There doesn't seem to be a clear pattern. The services are no longer able to downgrade to dot net 6 as they've introduced code changes requiring dot net 8.
@cr0fters do your containers make use of Java for anything, or do you include the JRE (if so which one?) in your container image?
Did further research and updated my findings here
tl;dr Issue is still not resolved as of 8.0.4 with SDK 8.0.300
> I am still trying to narrow down what could be causing this. Some of my services are no longer leaking as of 8.0.204, but other services still leak as described in this ticket. There doesn't seem to be a clear pattern. The services are no longer able to downgrade to dot net 6 as they've introduced code changes requiring dot net 8.
>
> @cr0fters do your containers make use of Java for anything, or do you include the JRE (if so which one?) in your container image?
no we don’t make any use of Java
Checking if this is still an issue with latest 8 servicing release. There have been a few memory leak fixes since 8.0.2
Closing since we have fixed a few memory related issues in the latest servicing releases. Please reopen if a leak still exists
Description
I have a .NET MVC web app that I've recently upgraded to 8.0.2, and just recently it seems to be having memory issues. The app in question runs on 2 AWS ECS Fargate tasks, and in normal usage steadily uses on average 10-15% memory. Whenever the memory issue kicks in, it tends to jump pretty quickly up to ~80% usage and then flat-line before slowly increasing to 100% (then crashing).
This behaviour only really started when we upgraded from .Net 7 to .Net 8.0.0, and had hoped the recent 8.0.2 release a few weeks ago would fix this. It hasn't actually occurred for a few weeks until just today, where it's happened each time I force a new deployment.
Here's a screenshot showing the memory usage for the running task today:
I've run a dump on one of the running tasks and downloaded the file locally. The file itself is 2.2 GB in size (with 80% usage being reported by ECS); however, when I analyse the dump file (via JetBrains DotMemory and also `dumpheap stat`), they both report just over 100 MB on the heap (which is expected after the app has only been running for around 30 minutes).
Here are a few screenshots of the results from these tools:
Reproduction Steps
Unfortunately I've been unable to reproduce this locally. It seems to be very intermittent, in that it hasn't happened in weeks, but when it does it happens back to back a few times.
Expected behavior
Steady memory usage over time
Actual behavior
Memory usage rises suddenly, plateaus at ~80%, before slowly increasing to 100%
Regression?
It previously worked fine in .Net 6 and 7
Known Workarounds
No response
Configuration
No response
Other information
No response