henridd closed this issue 1 year ago
What does it leak? Where's the memory profile? What is it leaking? Did the garbage collector run?
I have edited the ticket with the profile file.
What do you mean by "what does it leak"/"what is it leaking"? I don't have much experience troubleshooting memory leaks, sorry. If you have an article to point me to, I'd appreciate it.
The GC did run. I tried two different tests with similar code that might give some insight.
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

AddAppHealthChecks(builder.Services);

var app = builder.Build();

// Configure the HTTP request pipeline.
if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.UseHttpsRedirection();

UseAppHealthChecks(app);

app.Run();

void AddAppHealthChecks(IServiceCollection services)
{
    _ = services.AddHealthChecks()
        .AddCheck<TestHealthCheck>(nameof(TestHealthCheck), HealthStatus.Unhealthy, new List<string>()
        {
            nameof(TestHealthCheck)
        });
}

void UseAppHealthChecks(WebApplication webApplication)
{
    webApplication.MapHealthChecks("/health/test", new HealthCheckOptions
    {
        Predicate = healthCheck => healthCheck.Tags.Contains(nameof(TestHealthCheck)),
    });
}

public class TestHealthCheck : IHealthCheck
{
    // Task-returning rather than async, since nothing here is awaited (avoids warning CS1998).
    public Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        //GC.Collect();
        return Task.FromResult(HealthCheckResult.Healthy());
    }
}
If you uncomment the GC.Collect() call in CheckHealthAsync, memory consumption becomes much more stable: it no longer increases on every call to the health check endpoint, and the leak almost seems to be gone. But if you comment that line out again and instead add a Timer that calls GC.Collect() every 30 s, for example, the problem persists.
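For reference, the timer variant described above can be sketched as a standalone console program (this is my own minimal sketch, not code from the repro; the interval is shortened so it finishes quickly):

```csharp
using System;
using System.Threading;

class GcTimerSketch
{
    static void Main()
    {
        // Periodically force a blocking collection, as described above.
        // The repro used a 30 s period; 1 s here so the sketch runs fast.
        using var timer = new Timer(_ =>
        {
            GC.Collect();
            Console.WriteLine($"Gen0 collections so far: {GC.CollectionCount(0)}");
        }, null, dueTime: 0, period: 1000);

        Thread.Sleep(2500); // let the timer fire a few times before exiting
    }
}
```

The distinction matters: a collection induced from inside the request path runs after each response completes, while a timer-driven collection can fire while requests are still in flight, so freshly allocated request state may survive and be promoted.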
This doesn't sound like a leak, then. Please read https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md to get a better idea of what might be happening.
Thanks for the great article. I'll take a look and update the ticket when I'm done.
I read the article and I'm still confused by the behavior. Looking at how GCs are triggered, collections should be happening, since this test code calls GC.Collect explicitly. But when I look at the GCStats from PerfView, I get the following output. So even though I'm forcing it, nothing seems to happen. I then tried forcing a collection every 50 requests, but PerfView failed to open the GCStats view:
Exception Occurred: System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Stats.GCProcess.<>c__DisplayClass0_0.<Collect>b__13(GCMarkWithTypeTraceData data)
at Microsoft.Diagnostics.Tracing.Parsers.Clr.GCMarkWithTypeTraceData.Dispatch()
at Microsoft.Diagnostics.Tracing.TraceEventDispatcher.Dispatch(TraceEvent anEvent)
at Microsoft.Diagnostics.Tracing.Etlx.TraceLogEventSource.Process()
at Stats.GCProcess.Collect(TraceEventDispatcher source, Single sampleIntervalMSec, ProcessLookup`1 perProc, MutableTraceEventStackSource stackSource, Boolean _doServerGCReport, TraceLog traceLog)
at PerfView.PerfViewGCStats.WriteHtmlBody(TraceLog dataFile, TextWriter writer, String fileName, TextWriter log)
at PerfView.PerfViewHtmlReport.<>c__DisplayClass16_0.<Open>b__0()
at PerfView.StatusBar.<>c__DisplayClass19_0.<StartWork>b__0(Object <state>)
An exceptional condition occurred, see log for details.
I've tried making code changes and even creating a similar solution from scratch, but hit the same exception over and over.
Also, sorry for not providing the full generated dump here, since I'm not sure about which kind of data we are allowed to share. But if there is any specific data you would like, let me know and I'll provide it.
Anyway, I then tried to run some small GC monitoring of my own. I made the modifications in this code, which was executed every 200 requests:
GC.Collect();

var info = GC.GetGCMemoryInfo();
Tracker.TotalPromotions += info.PromotedBytes;

Console.WriteLine($"{Tracker.NumberOfRequests} requests. " + Environment.NewLine +
    $"Promoted bytes: {info.PromotedBytes}. Total: {Tracker.TotalPromotions}" + Environment.NewLine +
    $"FinalizationPendingCount: {info.FinalizationPendingCount}" + Environment.NewLine +
    $"FragmentedBytes: {info.FragmentedBytes}" + Environment.NewLine +
    $"TotalCommittedBytes: {info.TotalCommittedBytes}" + Environment.NewLine +
    $"HeapSize: {info.HeapSizeBytes}. Growth: {info.HeapSizeBytes - Tracker.PreviousHeapSizeBytes}" + Environment.NewLine);

Tracker.PreviousHeapSizeBytes = info.HeapSizeBytes;
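The Tracker type referenced above isn't shown in the thread; a minimal version consistent with how it is used (all member names inferred from the snippet, so this is a hypothetical reconstruction) might look like:

```csharp
// Hypothetical reconstruction of the Tracker helper referenced above;
// only the members the snippet actually uses are included.
public static class Tracker
{
    public static long NumberOfRequests;      // incremented per health-check request
    public static long TotalPromotions;       // running sum of GCMemoryInfo.PromotedBytes
    public static long PreviousHeapSizeBytes; // heap size at the previous sample
}
```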
These were the results:
200 requests.
Promoted bytes: 1789880. Total: 1789880
FinalizationPendingCount: 21
FragmentedBytes: 1554000
TotalCommittedBytes: 7925760
HeapSize: 3343880. Growth: 3343880
400 requests.
Promoted bytes: 1760784. Total: 3550664
FinalizationPendingCount: 17
FragmentedBytes: 1538736
TotalCommittedBytes: 10219520
HeapSize: 3299520. Growth: -44360
600 requests.
Promoted bytes: 1765328. Total: 5315992
FinalizationPendingCount: 4
FragmentedBytes: 1539048
TotalCommittedBytes: 10809344
HeapSize: 3304376. Growth: 4856
800 requests.
Promoted bytes: 1740568. Total: 7056560
FinalizationPendingCount: 4
FragmentedBytes: 1914240
TotalCommittedBytes: 11333632
HeapSize: 3654808. Growth: 350432
1000 requests.
Promoted bytes: 1742088. Total: 8798648
FinalizationPendingCount: 4
FragmentedBytes: 1924272
TotalCommittedBytes: 11988992
HeapSize: 3666360. Growth: 11552
1200 requests.
Promoted bytes: 1740568. Total: 10539216
FinalizationPendingCount: 4
FragmentedBytes: 1539496
TotalCommittedBytes: 12709888
HeapSize: 3280064. Growth: -386296
1400 requests.
Promoted bytes: 1743336. Total: 12282552
FinalizationPendingCount: 4
FragmentedBytes: 1539568
TotalCommittedBytes: 13037568
HeapSize: 3282904. Growth: 2840
1600 requests.
Promoted bytes: 1741288. Total: 14023840
FinalizationPendingCount: 4
FragmentedBytes: 1539544
TotalCommittedBytes: 13168640
HeapSize: 3280832. Growth: -2072
So it seems that bytes are promoted at a steady rate. According to "What makes an object survive", they shouldn't be surviving, since I'm not keeping any references; but those bytes could be something entirely unrelated, I suppose. The HeapSize doesn't seem to be increasing, so according to this section, I believe this confirms your belief that it's not a managed memory leak.
Looking at the dotProfile results, though, it seems the unmanaged memory definitely is increasing.
I've tried running a Visual Studio profiling session, but there was nothing meaningful in there. Tomorrow I'll get started with WinDbg and see if I can get better results.
It seems that the bigger the response, the bigger the leak. After making the changes below, it took fewer than 60 requests for the memory consumption to double.
void UseAppHealthChecks(WebApplication webApplication)
{
    webApplication.MapHealthChecks("/healthz", new HealthCheckOptions
    {
        Predicate = healthCheck => healthCheck.Tags.Contains(nameof(TestHealthCheck)),
        ResponseWriter = ResponseWriter
    });
}

Task ResponseWriter(HttpContext context, HealthReport healthReport)
{
    var lengthyString = "";
    for (int i = 0; i < 100; i++)
    {
        lengthyString += Guid.NewGuid().ToString();
    }

    return context.Response.WriteAsync(lengthyString);
}
I've downloaded Deleaker, and this MetaDataGetDispenser is mentioned several times.
When comparing the first snapshot with the last one, it shows up again
With WinDbg, the output from !heap -stat stays very similar even though memory consumption is increasing a lot. The only difference is the external fragmentation. Could that be something? The first command was run right after startup.
0:034> !heap -stat
NtGlobalFlag enables following debugging aids for new heaps:
stack back traces
LFH Key : 0x1338f8338a5f6faf
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
000001f82f930000 08000002 16756 10028 16364 119 122 5 2 3 LFH
000001f8276c0000 08008000 64 4 64 2 1 1 0 0
000001f831380000 08001002 1472 92 1080 23 2 2 0 0 LFH
000001f8315e0000 08001002 60 8 60 2 1 1 0 0
000001f831720000 08041002 60 8 60 5 1 1 0 0
-------------------------------------------------------------------------------------
0:034> g
(1d00.5dc0): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ffe`463d0b70 cc int 3
0:034> !heap -stat
NtGlobalFlag enables following debugging aids for new heaps:
stack back traces
LFH Key : 0x1338f8338a5f6faf
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
000001f82f930000 08000002 16756 10028 16364 2076 129 5 2 3 LFH
External fragmentation 20 % (129 free blocks)
000001f8276c0000 08008000 64 4 64 2 1 1 0 0
000001f831380000 08001002 1472 92 1080 23 2 2 0 0 LFH
000001f8315e0000 08001002 60 8 60 2 1 1 0 0
000001f831720000 08041002 60 8 60 5 1 1 0 0
-------------------------------------------------------------------------------------
0:034> g
(1d00.3a1c): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ffe`463d0b70 cc int 3
0:023> !heap -stat
NtGlobalFlag enables following debugging aids for new heaps:
stack back traces
LFH Key : 0x1338f8338a5f6faf
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
000001f82f930000 08000002 16756 10028 16364 2303 145 5 2 3 LFH
External fragmentation 23 % (145 free blocks)
000001f8276c0000 08008000 64 4 64 2 1 1 0 0
000001f831380000 08001002 1472 92 1080 23 2 2 0 0 LFH
000001f8315e0000 08001002 60 8 60 2 1 1 0 0
000001f831720000 08041002 60 8 60 5 1 1 0 0
-------------------------------------------------------------------------------------
I'm not sure where I should go from here. Is there a cleanup method that I'm missing?
I'm not seeing evidence of a managed or unmanaged memory leak here. If there were one, I would expect that as you executed more requests the amount of memory consumed would balloon and the process would eventually crash (assuming you kept firing load into it).
Using your repro, I fired 3000+ requests at the health-check endpoint with Postman and memory utilization didn't go above 60 MB.
We'll probably need to see stronger evidence of an issue here to keep investigating.
Hey @mitchdenny, thanks for looking into this. That's precisely what happens here. Did you try my latest example, where the health check writes a bunch of GUIDs in the response? That caused the memory consumption to rise super fast. I've set up a git repository with the code: https://github.com/henridd/health-check-mem-leaking.
I thought it might be something faulty on my machine, but we began this investigation after finding a leak on the server.
I've attached a video showing the memory consumption increasing. It does seem to stabilize once it reaches 200 MB, but I suppose it shouldn't increase that much, right?
That isn't surprising in your example, because you are allocating a bunch of strings every time you do a health check, so more memory will be consumed. Each invocation of your string-generation logic is going to allocate > 300 KB (that is a very wasteful way to build a long string, by the way, but that may have been intentional).
Do a hundred requests and you've allocated > 30 MB of strings alone, and that's before any .NET or ASP.NET overhead.
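As a rough back-of-the-envelope check (my own arithmetic, not from the thread): each GUID string is 36 characters, and string concatenation copies the whole accumulated string each time, so the loop's allocations grow quadratically with the iteration count:

```csharp
using System;

class ConcatCostSketch
{
    static void Main()
    {
        // Iteration i of the loop above allocates a new string of 36 * i chars.
        // UTF-16 chars are 2 bytes each; per-object header overhead is ignored.
        long totalBytes = 0;
        for (int i = 1; i <= 100; i++)
            totalBytes += 36L * i * 2;

        // 72 * (1 + 2 + ... + 100) = 72 * 5050 = 363,600 bytes, about 355 KB,
        // which lines up with the "> 300 KB per health check" estimate above.
        Console.WriteLine($"~{totalBytes} bytes of string allocations per response");
    }
}
```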
You might like to read a bit more background on the different GC modes we have in .NET and how they impact when memory is collected:
https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/workstation-server-gc
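For example (standard .NET runtime configuration, not something specific to this repro), the GC mode can be set explicitly in the project file, which is worth checking when comparing memory behavior between a dev machine and a server:

```xml
<!-- In the .csproj: Server GC trades higher memory footprint for throughput.
     ASP.NET Core apps default to Server GC, so memory plateauing well above
     the startup baseline is expected rather than evidence of a leak. -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
  <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
</PropertyGroup>
```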
I see. I assumed that memory consumption shouldn't exceed the initial amount of consumed memory, since nothing is being stored, but I should probably investigate this further to understand it better. To be fair, it seemed strange from the beginning that something so widely used would leak under normal circumstances.
I will keep investigating and researching about memory and try to reduce the memory consumption.
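One easy win along those lines would be replacing the repeated concatenation in the response writer. This is a sketch of the same string-building step using a pre-sized StringBuilder (the method name is mine; output is identical in shape, but the intermediate copies are gone):

```csharp
using System;
using System.Text;

static class ResponseBuilding
{
    // Builds the same 100-GUID payload as the ResponseWriter above, but
    // appends into one pre-sized buffer instead of reallocating the
    // accumulated string on every iteration.
    public static string BuildLengthyString(int count = 100)
    {
        var sb = new StringBuilder(capacity: count * 36); // 36 chars per GUID
        for (int i = 0; i < count; i++)
            sb.Append(Guid.NewGuid().ToString());
        return sb.ToString();
    }
}
```

This drops the per-request string allocations from roughly 355 KB to about the size of the final payload (~7 KB), which should make the sawtooth in the memory graph much shallower.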
Thanks @davidfowl and @mitchdenny for your input.
Is there an existing issue for this?
Describe the bug
Every single request to the health checks endpoint leaks a bit of memory. Eventually, if the application runs 24/7 with multiple applications constantly requesting health checks, memory consumption can get very high.
Expected Behavior
The memory should be freed after the resources have been used.
Steps To Reproduce
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddCheck("Sample", () => HealthCheckResult.Healthy());

var app = builder.Build();

app.MapHealthChecks("/healthz");

app.Run();