dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

Understanding metrics and memory usage of a .NET app (seems mostly consumed by cache and fragmentation) #4697

Closed julienGrd closed 5 months ago

julienGrd commented 6 months ago

Hello guys, this issue follows up on https://github.com/dotnet/diagnostics/issues/4647, with new information that helps me understand where the memory is going.

I added some metrics directly in C# in my app. Here are the values I had at the time of the tests:
Managed memory (GC.GetTotalMemory): 3.36 GB
TotalCommittedBytes (GC.GetGCMemoryInfo().TotalCommittedBytes): 4.51 GB
HeapSizeBytes (GC.GetGCMemoryInfo().HeapSizeBytes): 4.36 GB
WorkingSet64 (Process.GetCurrentProcess().WorkingSet64): 5.26 GB
PrivateMemorySize (Process.GetCurrentProcess().PrivateMemorySize64): 4.91 GB
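For reference, a minimal sketch of how figures like these can be collected in C#. The class name `MemorySnapshot` and the division by 1024^3 are assumptions made for illustration, not the code used in the app. `FragmentedBytes` is not among the metrics reported above; it is included only because the same `GCMemoryInfo` struct exposes it.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: gathers the metrics discussed above into one report.
public static class MemorySnapshot
{
    // Assumption: sizes are reported in binary gigabytes (bytes / 1024^3).
    private const double GiB = 1024.0 * 1024.0 * 1024.0;

    public static string Capture()
    {
        // Managed heap view, as seen by the GC (no collection is forced).
        long totalMemory = GC.GetTotalMemory(false);
        GCMemoryInfo gcInfo = GC.GetGCMemoryInfo();

        // Process view, as seen by the operating system.
        using Process process = Process.GetCurrentProcess();
        long workingSet = process.WorkingSet64;
        long privateBytes = process.PrivateMemorySize64;

        return string.Join(Environment.NewLine,
            $"GC.GetTotalMemory:    {totalMemory / GiB:F2} GiB",
            $"TotalCommittedBytes:  {gcInfo.TotalCommittedBytes / GiB:F2} GiB",
            $"HeapSizeBytes:        {gcInfo.HeapSizeBytes / GiB:F2} GiB",
            $"FragmentedBytes:      {gcInfo.FragmentedBytes / GiB:F2} GiB",
            $"WorkingSet64:         {workingSet / GiB:F2} GiB",
            $"PrivateMemorySize64:  {privateBytes / GiB:F2} GiB");
    }
}
```

Calling `Console.WriteLine(MemorySnapshot.Capture());` at regular intervals gives a rough timeline that can be compared with what Task Manager reports.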

At that time, Task Manager reported around 5 GB for the IIS process, so up to this point the numbers are consistent.

I ran dotnet-gcdump and got the same kind of values as in the test from the previous ticket: live objects: 13 MB, dead objects: 737 MB. (I'm surprised to still have this many dead objects, because I forced some GC.Collect calls before running this test.)

It seems that GC.GetGCMemoryInfo().TotalCommittedBytes is equivalent to the managed heap committed VM. Can you confirm that? If so, and if we take the diagram from the previous ticket, would it give something like this?

                        All Committed Virtual Memory (5GB)
                               /                 \
Managed heap committed VM  (4.51GB ???)        All other committed VM (aka 'native memory') (0.5GB ???)
              /                        \
 Memory for objects (750MB)              Caches, fragmentation and bookkeeping  (3.75GB ???)
      /                  \
 Live objects (13MB)    Dead objects (737MB)

If my assumption is right, the memory is being lost in the 'Caches, fragmentation and bookkeeping' category.
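The arithmetic behind the question marks in the diagram can be sketched as follows. The class and method names are hypothetical, and the inputs are the rounded figures quoted above, so the results are only approximate.

```csharp
using System;

// Hypothetical helper that splits committed memory the way the diagram does.
public static class CommittedMemoryBreakdown
{
    // All inputs and results are in GB.
    public static (double Native, double GcOverhead) Split(
        double totalCommitted,
        double managedCommitted,
        double liveObjects,
        double deadObjects)
    {
        // Everything committed outside the managed heap ("native memory").
        double native = totalCommitted - managedCommitted;

        // Managed heap memory not occupied by objects:
        // caches, fragmentation and bookkeeping.
        double gcOverhead = managedCommitted - (liveObjects + deadObjects);

        return (native, gcOverhead);
    }
}
```

With the figures from this post, `CommittedMemoryBreakdown.Split(5.0, 4.51, 0.013, 0.737)` gives roughly 0.49 GB of native memory and roughly 3.76 GB of caches, fragmentation and bookkeeping, in line with the estimates in the diagram.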

But I still don't understand some of the metrics. What is GC.GetTotalMemory supposed to represent? It would seem logical for it to be the 'Memory for objects' category, but the values are very different between the C# metric and the gcdump analysis (3.36 GB vs 750 MB). Or am I confusing something?

Now I need to understand how this memory works and why it is so big. Do you have any recommendations on how I can go deeper in my analysis, or on what could cause this problem?

Thanks for all your time!

hoyosjs commented 5 months ago

A good starting point to understand how to diagnose memory issues is https://github.com/Maoni0/mem-doc. In general, you have two tools. The first is traces, which give you top-level metrics with light overhead. That way you can get an idea of what the GC sees in your process. A trace will show you general managed memory fragmentation and give you an idea of where to dig. The other option is to grab a dump and see what tools like VS and WinDbg think about the memory contained in the dump.

julienGrd commented 5 months ago

dotnet-trace doesn't seem to give me more information than what I already have. It seems I have to use WinDbg to understand in more depth what is happening. But the last time I tried to create a full dump with Task Manager, the application pool restarted, so I couldn't analyse the dump properly. Moreover, I have to run it in a production environment, so it's very sensitive. I will come back here once I find a way to create a dump without killing the process and have analysed it a bit.

noahfalk commented 5 months ago

> A good starting point to understand how to diagnose memory issues is https://github.com/Maoni0/mem-doc.

Specifically, I'd recommend reading the section on diagnosing large GC heap size.

> dotnet-trace doesn't seem to give me more information than what I already have

I am not sure how you are reaching this conclusion. dotnet-trace can collect more information than what you've shown above. The section of the doc showing how to capture top-level GC metrics gives a specific dotnet-trace command line to use:

dotnet trace collect -p <pid> -o <outputpath with .nettrace extension> --profile gc-collect --duration <in hh:mm:ss format>

The data included in the trace breaks down the heap size observed at every GC, shows what kind of GC it was, splits memory usage by generation and specialized heaps, and separates out fragmentation as its own item. The section on diagnosing large GC heap size has guidance on how to look at the data.

julienGrd commented 5 months ago

@noahfalk OK, thank you! I will run dotnet-trace again on my client's server as soon as I am able to.

julienGrd commented 5 months ago

Hello guys, I spent some time running analyses and reading the documentation and other explanations about memory management. I think I have collected enough data to investigate properly (a 30-minute .nettrace and two full dumps, one taken when the process was at 2 GB and one when it was at 4 GB).

I believe more and more that my problem is not in the managed code but in the native code or in the cache/fragmentation memory. This is part of the problem, because this memory is not shown properly in the analysis tools I have used so far.

With dotnet-counters I got this information when my memory was around 4 GB:
GC Heap Size (MB): 2,469.521
Working Set (MB): 5,066.691

According to this video (at 10:42), if these metrics differ it reveals a memory leak, but not in the managed code. First of all, do you agree with that? https://www.youtube.com/watch?v=SHGeE_PFA4s&ab_channel=dotnet

Do you have any recommendations for analysing this native memory properly? Most of the examples and tutorials focus on memory leaks in .NET code.

Thanks!

noahfalk commented 5 months ago

> First of all, do you agree with that?

As a rough approximation, yes. There are corner cases where the growing allocations come from the .NET runtime even though they aren't objects (i.e. growing jitted code or growing type system metadata). SOS does have a command called !maddress which can tell you about some other pools of memory that the runtime is aware of because we allocated them. This is stuff like memory for jitted code, type system info, and a little more detail on the GC. No evidence you've given specifically suggests the leak will be in one of those regions, but it's easy to check and rule out if you already have dumps.

> Do you have any recommendations for analysing this native memory properly?

It's a big topic and varies depending on OS and tools. It's not our specific area of expertise, but here are a few suggestions to get you started:

Hope this helps!

julienGrd commented 5 months ago

@noahfalk It will help for sure! Just one more question: I just read this article https://carljohansen.wordpress.com/2020/05/09/compiling-expression-trees-with-roslyn-without-memory-leaks-2/. Could this kind of problem explain my memory consumption outside the managed heap? We use dynamic compilation intensively in my app, so it would be the ideal candidate... (I'm on .NET 8.0.)

noahfalk commented 5 months ago

> Could this kind of problem explain my memory consumption outside the managed heap?

It's possible. Try using the !maddress command or the !eeheap command in SOS to learn how much memory is being used by jitted code.

julienGrd commented 5 months ago

Hello guys, I was finally able to understand where my memory leak came from.

The culprit was a client-side state object (in Blazor Server, so the client actually runs on the server) that was not properly disposed and that contained a timer which was never stopped. As a result, when a client closed the app, the timer prevented this object and everything it referenced from being released, so almost all the memory reserved for that client stayed allocated.
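A minimal sketch of this kind of leak and its fix, assuming a System.Timers.Timer owned by a per-client state object. The class `ClientSessionState` and its members are hypothetical; the actual code from the app is not shown in this thread.

```csharp
#nullable enable
using System;

// Hypothetical per-client state object; names are illustrative only.
public sealed class ClientSessionState : IDisposable
{
    private readonly System.Timers.Timer _refreshTimer;
    private bool _disposed;

    public ClientSessionState(double intervalMs)
    {
        _refreshTimer = new System.Timers.Timer(intervalMs) { AutoReset = true };

        // The handler is an instance method, so while the timer is running
        // it holds a reference to this object and to everything it references.
        _refreshTimer.Elapsed += OnTick;
        _refreshTimer.Start();
    }

    public bool IsTimerRunning => !_disposed && _refreshTimer.Enabled;

    private void OnTick(object? sender, System.Timers.ElapsedEventArgs e)
    {
        // Periodic per-client work would go here.
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;

        // Stop the timer, detach the handler, and release the timer itself,
        // so that nothing keeps this object reachable once the client is gone.
        _refreshTimer.Stop();
        _refreshTimer.Elapsed -= OnTick;
        _refreshTimer.Dispose();
    }
}
```

The fix only works if `Dispose` is actually called when the client's session ends, for example by having the owning component or scoped service implement `IDisposable` and forward the call.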

Finding the culprit was a real pain. I used many tools that gave me different and sometimes contradictory indications: WinDbg, dotnet-trace, DebugDiag, etc.

I still don't understand why some metrics pointed to a native memory problem, because the problem was definitely in managed code.

I was finally able to analyse the stack properly with this tool: https://memprofiler.com/

I hope that one day .NET will have a single, unified tool to analyse this kind of memory problem.

Thanks anyway for your help, I'm closing the issue!