dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

.NET Blazor App high memory usage under Linux #90163

Closed adamashton closed 8 months ago

adamashton commented 1 year ago

I am seeing extremely high memory usage from my dotnet application when running under Linux. The memory is also never released. When users use my web app they do require large amounts of memory (1-2 GB to run a report) but when they close the tab resources are released and I would expect dotnet to do the same to some extent.

What I'm experiencing is near 90-95% memory usage from my dotnet app. image

Problems

  1. I'm unable to determine if my server is at peak memory usage or dotnet is holding onto memory as there is little memory pressure from other processes.
  2. dotnet memory dumps are huge and I don't think they should be. (6 GB download vs 38 MB in used objects when analysed)

Background Info

Memory Diagnostics from the server

root@a9f25bad2d02:~/site/wwwroot# dotnet-counters monitor

[System.Runtime]
    % Time in GC since last GC (%)                                         0    
    Allocation Rate (B / 10 sec)                                     209,648    
    CPU Usage (%)                                                          0    
    Exception Count (Count / 10 sec)                                       0    
    GC Committed Bytes (MB)                                              105    
    GC Fragmentation (%)                                                  43.752
    GC Heap Size (MB)                                                     68    
    Gen 0 GC Count (Count / 10 sec)                                        0    
    Gen 0 Size (B)                                                19,577,728    
    Gen 1 GC Count (Count / 10 sec)                                        0    
    Gen 1 Size (B)                                                 1,608,856    
    Gen 2 GC Count (Count / 10 sec)                                        0    
    Gen 2 Size (B)                                                49,073,776    
    IL Bytes Jitted (B)                                            4,471,180    
    LOH Size (B)                                                  26,161,384    
    Monitor Lock Contention Count (Count / 10 sec)                         0    
    Number of Active Timers                                               15    
    Number of Assemblies Loaded                                        5,110    
    Number of Methods Jitted                                          64,067    
    POH (Pinned Object Heap) Size (B)                                706,440    
    ThreadPool Completed Work Item Count (Count / 10 sec)                 17    
    ThreadPool Queue Length                                                0    
    ThreadPool Thread Count                                                4    
    Time spent in JIT (ms / 10 sec)                                        0 
    Working Set (MB)                                                   5,773 

root@a9f25bad2d02:~/site/wwwroot# free -m
              total        used        free      shared  buff/cache   available
Mem:           7817        6858         121          19         837         740
Swap:          4095        2156        1939

root@a9f25bad2d02:~/site/wwwroot# top
Tasks:  11 total,   2 running,   8 sleeping,   1 stopped,   0 zombie
%Cpu(s):  1.5 us,  0.7 sy, 50.4 ni, 45.3 id,  2.2 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7817.4 total,     89.3 free,   7262.5 used,    465.6 buff/cache
MiB Swap:   4096.0 total,   1676.2 free,   2419.8 used.    341.3 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                      
 2480 root      30  10  750964 680528    848 R 100.0   8.5  16:31.33 createdump                                                                                                                                   
    1 root      30  10    5492      0      0 S   0.0   0.0   0:00.28 startup.sh                                                                                                                                   
   36 root      30  10   13820      0      0 S   0.0   0.0   0:00.00 sshd                                                                                                                                         
   40 root      30  10    5492      4      0 S   0.0   0.0   0:00.00 bash                                                                                                                                         
   41 root      30  10   12.0g  16720      0 S   0.0   0.2   0:03.38 DiagServer                                                                                                                                   
   47 root      30  10 3602728  47320      0 S   0.0   0.6   1:19.75 dotnet-monitor                                                                                                                               
   65 root      30  10    7268    496    388 S   0.0   0.0   0:00.10 cron                                                                                                                                         
   74 root      30  10 9812340   5.4g  19812 t   0.0  70.7  21:22.97 dotnet                                                                                                                                       
 1908 root      30  10   13948    708    540 S   0.0   0.0   0:00.18 sshd                                                                                                                                         
 1910 root      30  10    5752   2040   1808 S   0.0   0.0   0:00.03 bash                                                                                                                                         
 2526 root      30  10    9780   3448   2972 R   0.0   0.0   0:00.00 top

The dotnet process is using about 100 MB of memory whereas the top command shows it is reserving about 5.5 GB. I understand it reserves memory when there is no memory pressure but reserving 50x the amount needed seems a bit much?

Server and Runtime Information

Azure App Service p1v3 running Linux

root@a9f25bad2d02:~/site/wwwroot# cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian

root@a9f25bad2d02:~/site/wwwroot# dotnet --list-runtimes
Microsoft.AspNetCore.App 6.0.16 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 6.0.16 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
root@a9f25bad2d02:~/site/wwwroot#

The app is published before being deployed to Azure.

Configuration: Release
Target Framework: net6.0
Deployment Mode: Framework-dependent
Target Runtime: linux-x64

What have I done already

Memory Analysis of my App

The dotnet memory dump was 6 GB in size image

but upon opening it in dotMemory I see no real memory pressure, image

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details
I am seeing extremely high memory usage from my dotnet application when running under Linux. The memory is also never released. When users use my web app they do require large amounts of memory (1-2 GB to run a report) but when they close the tab resources are released and I would expect dotnet to do the same to some extent. What I'm experiencing is near 90-95% memory usage from my dotnet app. ![image](https://github.com/dotnet/runtime/assets/1670343/371feebb-0b82-429b-8920-07c9356e9c3c) ## Problems 1. I'm unable to determine if my server is at peak memory usage or dotnet is holding onto memory as there is little memory pressure from other processes. 2. dotnet memory dumps are huge and I don't think they should be. (6 GB download vs 38 MB objects) ## Background Info **Memory Diagnostics from the server** ``` root@a9f25bad2d02:~/site/wwwroot# dotnet-counters monitor [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 10 sec) 209,648 CPU Usage (%) 0 Exception Count (Count / 10 sec) 0 GC Committed Bytes (MB) 105 GC Fragmentation (%) 43.752 GC Heap Size (MB) 68 Gen 0 GC Count (Count / 10 sec) 0 Gen 0 Size (B) 19,577,728 Gen 1 GC Count (Count / 10 sec) 0 Gen 1 Size (B) 1,608,856 Gen 2 GC Count (Count / 10 sec) 0 Gen 2 Size (B) 49,073,776 IL Bytes Jitted (B) 4,471,180 LOH Size (B) 26,161,384 Monitor Lock Contention Count (Count / 10 sec) 0 Number of Active Timers 15 Number of Assemblies Loaded 5,110 Number of Methods Jitted 64,067 POH (Pinned Object Heap) Size (B) 706,440 ThreadPool Completed Work Item Count (Count / 10 sec) 17 ThreadPool Queue Length 0 ThreadPool Thread Count 4 Time spent in JIT (ms / 10 sec) 0 Working Set (MB) 5,773 root@a9f25bad2d02:~/site/wwwroot# free -m total used free shared buff/cache available Mem: 7817 6858 121 19 837 740 Swap: 4095 2156 1939 ``` The dotnet process is using about 100 MB of memory whereas the free command shows it is reserving about 7 GB. I understand it reserves memory when there is no memory pressure but using 90%+ of the servers memory seems excessive? ### Server and Runtime Information Azure App Service p1v3 running Linux ``` root@a9f25bad2d02:~/site/wwwroot# cat /etc/*-release PRETTY_NAME="Debian GNU/Linux 10 (buster)" NAME="Debian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)" VERSION_CODENAME=buster ID=debian root@a9f25bad2d02:~/site/wwwroot# dotnet --list-runtimes Microsoft.AspNetCore.App 6.0.16 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App] Microsoft.NETCore.App 6.0.16 [/usr/share/dotnet/shared/Microsoft.NETCore.App] root@a9f25bad2d02:~/site/wwwroot# ``` The app is published before being deployed to Azure. ``` Configuration: Release Target Framework: net6.0 Deployment Mode: Framework-dependent Target Runtime: linux-x64 ``` ## What have I done already * I've switched between Server and Workstation GC - no real difference. * I've performed a dotnet dump `dotnet-dump collect --type Full` and searched for any memory pressure from within my app, details below. * I've added dotnet counters into my app to report on Gen 0, 1, 2, LOH and Heap size - all values do go up but come down again after the user leaves the site. * I've switched my app to Windows and see that Memory usage is more 'normal' and memory is freed after a while. ### Memory Analysis of my App The dotnet memory dump was 6 GB in size ![image](https://github.com/dotnet/runtime/assets/1670343/9cd3d15a-3b4d-4324-8af1-aa0c2e642572) but upon opening it in dotMemory I see no real memory pressure, ![image](https://github.com/dotnet/runtime/assets/1670343/ec101074-c097-464b-b1d0-de97f31d113a)
Author: adamashton
Assignees: -
Labels: `tenet-performance`, `area-GC-coreclr`
Milestone: -
radical commented 1 year ago

cc @javiercn

hoyosjs commented 1 year ago

The dump being large could in part be https://github.com/dotnet/runtime/issues/71472. I don't expect it to be overly dramatic, but it does increase the size by tickling pages. Res seems to match up what dotnet-counters says - your ram resident set is ~ 6 GB. About the same size of the dump too. 100mb of which is the reserved memory, so there's memory somewhere outside of the GC. What's the memory dotnet-counters reports before the dump is collected?

adamashton commented 1 year ago

I performed these commands after a server restart, after simulation of some users using the app but before any dotnet dumps were collected.

root@c59603927169:~# dotnet-counters monitor --refresh-interval 10 --process-id 72
[System.Runtime]
    % Time in GC since last GC (%)                                         0    
    Allocation Rate (B / 10 sec)                                     206,992    
    CPU Usage (%)                                                          0    
    Exception Count (Count / 10 sec)                                       0    
    GC Committed Bytes (MB)                                              118    
    GC Fragmentation (%)                                                   3.086
    GC Heap Size (MB)                                                     96    
    Gen 0 GC Count (Count / 10 sec)                                        0    
    Gen 0 Size (B)                                                        24    
    Gen 1 GC Count (Count / 10 sec)                                        0    
    Gen 1 Size (B)                                                   660,216    
    Gen 2 GC Count (Count / 10 sec)                                        0    
    Gen 2 Size (B)                                                84,586,528    
    IL Bytes Jitted (B)                                            3,603,218    
    LOH Size (B)                                                   7,242,400    
    Monitor Lock Contention Count (Count / 10 sec)                         0    
    Number of Active Timers                                               14    
    Number of Assemblies Loaded                                        1,205    
    Number of Methods Jitted                                          51,118    
    POH (Pinned Object Heap) Size (B)                                579,976    
    ThreadPool Completed Work Item Count (Count / 10 sec)                 10    
    ThreadPool Queue Length                                                0    
    ThreadPool Thread Count                                                5    
    Time spent in JIT (ms / 10 sec)                                        0    
    Working Set (MB)                                                   1,333    

root@c59603927169:~# top
Tasks:  10 total,   1 running,   9 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.7 us,  2.2 sy,  0.5 ni, 91.6 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7817.4 total,   3784.4 free,   2611.4 used,   1421.6 buff/cache
MiB Swap:   4096.0 total,   3662.4 free,    433.6 used.   4995.6 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                      
    1 root      30  10    5.4m   3.1m   2.8m S   0.0   0.0   0:00.29 startup.sh                                                                                                                                   
   36 root      30  10   13.5m   2.9m   2.1m S   0.0   0.0   0:00.00 sshd                                                                                                                                         
   40 root      30  10    5.4m   2.2m   1.9m S   0.0   0.0   0:00.00 bash                                                                                                                                         
   41 root      30  10   12.0g  66.4m  31.1m S   0.0   0.8   0:02.02 DiagServer                                                                                                                                   
   44 root      30  10 3535.1m 120.8m  58.1m S   0.0   1.5   0:04.29 dotnet-monitor                                                                                                                               
   63 root      30  10    7.1m   2.3m   2.0m S   0.0   0.0   0:00.00 cron                                                                                                                                         
   72 root      30  10 4716.9m   1.2g  97.4m S   0.0  16.3   7:09.93 dotnet                                                                                                                                       
  392 root      30  10   13.5m   6.9m   6.0m S   0.0   0.1   0:00.07 sshd                                                                                                                                         
  394 root      30  10    5.6m   3.5m   3.0m S   0.0   0.0   0:00.02 bash                                                                                                                                         
  634 root      30  10    9.6m   3.3m   2.9m R   0.0   0.0   0:00.01 top     

In addition I ran dotnet-counters collect for a duration of time while simulating typical user load.

dotnet-counters collect --output dotnetcollection.csv --refresh-interval 10 --process-id 80

Results available here: dotnetcollection.csv

E.g. GC graphed over the duration. Most memory is released after use by the GC: image

but dotnet holds on to the memory forever (via reserved) even though the app only needs a much smaller amount: image

In addition dotnet will increase memory usage very easily until nearly all available memory is used (as seen in original comment).

TimLovellSmith commented 1 year ago

I've just been reading about performance optimizations in .net 7 and I'm curious if this one might have any bearing on this.

First interesting (I think) thing the linked blogpost mentions is 'segments' (default in .net 6) are large (1GB+) areas of heap memory with server mode, but with workstation, you get smaller by default (256)

“Regions” is a feature of the garbage collector (GC) that’s been in the works for multiple years. It’s enabled by default in 64-bit processes in .NET 7 as of dotnet/runtime#64688, but as with other multi-year features, a multitude of PRs went into making it a reality. At a 30,000 foot level, “regions” replaces the current “segments” approach to managing memory on the GC heap; rather than having a few gigantic segments of memory (e.g. each 1GB), often associated 1:1 with a generation, the GC instead maintains many, many smaller regions (e.g. each 4MB) as their own entity. This enables the GC to be more agile with regards to operations like repurposing regions of memory from one generation to another. For more information on regions, the blog post Put a DPAD on that GC! from the primary developer on the GC is still the best resource.

So what are the key differences between segments and regions? Segments are large units or memory – on Server GC 64-bit if the segment sizes are 1GB, 2GB or 4GB each (for Workstation it's much smaller – 256MB) on SOH. Regions are much smaller units, they are by default 4MB each. So you might ask, "so they are smaller, why is that significant?". To answer that, let's first review how segments work.

Second interesting thing is this bit about how its hard in pratice to reclaim committed memory:

We do decommit on a segment but only the end of the segment which is after the very last live object on that segment (denoted by the light gray space at the end of each segment). And if you have pinning that prevents the GC from retracting the end of the segment, then we can only form free spaces and free spaces are always committed memory.

So I would speculate that what happens in your app, is that your GC has some large segments, it allocates a whole lot of objects in them, and eventually something gets pinned while you have a lot of garbage in the heap.

After a while you reclaim all that garbage, but the GC can't uncommit your memory.

This suggests to me a couple relatively simple experiments you could try, to see if they make a difference:

1) Use workstation GC mode. OK you've tried that one, too bad it didn't help. 2) Update to .Net 7

TimLovellSmith commented 1 year ago

PS, one likely pertinence difference between Linux and Windows, is that Linux doesn't reclaim idle pages the way Windows does. I.e. resident committed memory stays resident.

adamashton commented 8 months ago

I have tried on .NET 8 and can confirm that the problem still exists :(

Memory Dump is 3.6 GB

C:\code\tmp\net8mem> ls

    Directory: C:\code\tmp\net8mem

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          29/02/2024    14:13     3676049408 ln0sdlwk00027F_dotnet_102_core_20240229_135120_495

Which correlates to the memory being used on the Linux App Service Plan image

When analysing using dotnet-dump analyze and runningeeheap I see that

Total bytes consumed by CLR: 0xac3c000 (180600832)

is only 180 Mb.

And using dumpheap we see the summary as

Total 477,318 objects, 46,188,135 bytes

Which is only 46 Mb.

How do I analyze the other 3,420 Mb ?

adamashton commented 8 months ago

The root cause for my issue was not to do with my app or event dotnet at all - it was to do with glib malloc. There is some dynamic sizing of memory allocation blocks going on in malloc and when an app uses lots of memory, fragmentation occurs and the memory is reluctantly released. A better write-up can be found here: https://github.com/dotnet/runtime/issues/13301#issuecomment-535641506

This is not just affecting me but other programs are affected outside of dotnet. E.g.,

There appears to be 2 workarounds,

  1. Either fix the size of glibc.malloc.trim_threshold to stop it being dynamic.
  2. Or reduce the number of arenas via glibc.malloc.arena_max as the default number is 8 * NumOfCores.

More information can be found here: https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html

Ultimately I chose to set the environment variable MALLOC_TRIM_THRESHOLD_=131072 which fixed the problem for me.

In Azure you can conveniently use the Configuration to do this, image

Which I can confirm is set via bash,

kudu_ssh_user@ff3d088f400d:/$ env | grep MALLOC
APPSETTING_MALLOC_TRIM_THRESHOLD_=131072
MALLOC_TRIM_THRESHOLD_=131072

And now I can see memory being released in my Web App 😅, image