dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/

Possible finalizer thread stuck on System.RuntimeMethodHandle.Destroy #31782

Closed: vtortola closed this issue 11 months ago

vtortola commented 1 year ago

Description

When a single application instance reaches ~1700 RPS and ~85% CPU usage for a short period of time due to a traffic spike, around 25% of our servers experience what seems to be a blockage in the finalizer thread: the application starts hoarding memory and latency increases. Eventually we have to kill the server when memory is at 10x its normal level and latency is no longer acceptable.

Exploring with dotnet-dump, we see that after the request spike the finalizer queue starts to accumulate objects. Taking multiple dumps shows that the "Ready for finalization" counts keep growing in each heap.
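For reference, a minimal sketch of the dotnet-dump workflow used here (the process id and dump file name are placeholders):

dotnet-dump collect -p <pid>          # capture a dump of the running process
dotnet-dump analyze <dump-file>       # open it for analysis
> finalizequeue -stat                 # per-heap finalizer queue, including "Ready for finalization"
> clrstack -a                         # managed stack of the currently selected thread
> dumpheap -stat                      # heap statistics by type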

When exploring the threads, there is always a thread 0x001B that is stuck in this frame:

> clrstack -a                                                                                                                                                                                                                     
OS Thread Id: 0x1b (21)
        Child SP               IP Call Site
00007F0002EC9930 00007f00081947b2 [InlinedCallFrame: 00007f0002ec9930] 
00007F0002EC9930 00007eff88696957 [InlinedCallFrame: 00007f0002ec9930] 
00007F0002EC9920 00007EFF88696957 System.RuntimeMethodHandle.Destroy(System.RuntimeMethodHandleInternal)
    PARAMETERS:
        method = <no data>

00007F0002EC99C0 00007EFF9623E087 System.Reflection.Emit.DynamicResolver+DestroyScout.Finalize() [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicILGenerator.cs @ 669]
    PARAMETERS:
        this = <no data>

00007F0002EC9D30 00007f0007957cc6 [DebuggerU2MCatchHandlerFrame: 00007f0002ec9d30] 

If I take a dump 30 minutes later, it still shows this frame with this same data in this thread.

Those servers keep showing a higher % of time in GC (screenshot attached in the original issue).

Configuration

Server is an Azure Standard_F16s_v2 (16 cores, 32 GiB RAM).

Docker image mcr.microsoft.com/dotnet/aspnet:7.0.11.

# dotnet --info                                                                                                                                                                               

Host:
  Version:      7.0.11
  Architecture: x64
  Commit:       ecb34f85ec

.NET SDKs installed:
  No SDKs were found.

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.11 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.11 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
# uname -a
Linux live-api-75fcf998d-hcwvz 5.15.0-1041-azure #48-Ubuntu SMP Tue Jun 20 20:34:08 UTC 2023 x86_64 GNU/Linux

Regression?

We are not completely sure, but we have the feeling it started happening when we moved from .NET 6 to .NET 7. On .NET 6 we did not see this kind of situation. After we moved to .NET 7 we started seeing some machines using an unexpectedly large amount of memory, starting from a traffic spike.

Data

> syncblk                                                                                                                                                                                                                         
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
-----------------------------
Total           2661
Free            793
> finalizequeue -stat                                                                                                                                                                                                             
SyncBlocks to be cleaned up: 0
Free-Threaded Interfaces to be released: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------

Heap 0
generation 0 has 29 objects (7ef79c121920->7ef79c121a08)
generation 1 has 2,464 objects (7ef79c11cc20->7ef79c121920)
generation 2 has 0 objects (7ef79c11cc20->7ef79c11cc20)
Ready for finalization 560 objects (7ef79c123438->7ef79c1245b8)
------------------------------
Heap 1
generation 0 has 5 objects (7ef5f820ada8->7ef5f820add0)
generation 1 has 2,013 objects (7ef5f8206ec0->7ef5f820ada8)
generation 2 has 0 objects (7ef5f8206ec0->7ef5f8206ec0)
Ready for finalization 309 objects (7ef5f820b480->7ef5f820be28)
------------------------------
Heap 2
generation 0 has 7 objects (7ef5bc101488->7ef5bc1014c0)
generation 1 has 1,409 objects (7ef5bc0fe880->7ef5bc101488)
generation 2 has 0 objects (7ef5bc0fe880->7ef5bc0fe880)
Ready for finalization 321 objects (7ef5bc101ea0->7ef5bc1028a8)
------------------------------
Heap 3
generation 0 has 14 objects (7ef6142535b8->7ef614253628)
generation 1 has 1,295 objects (7ef614250d40->7ef6142535b8)
generation 2 has 0 objects (7ef614250d40->7ef614250d40)
Ready for finalization 321 objects (7ef6142540a0->7ef614254aa8)
------------------------------
Heap 4
generation 0 has 2 objects (7ef7a0bbfa88->7ef7a0bbfa98)
generation 1 has 729 objects (7ef7a0bbe3c0->7ef7a0bbfa88)
generation 2 has 0 objects (7ef7a0bbe3c0->7ef7a0bbe3c0)
Ready for finalization 322 objects (7ef7a0bc05c0->7ef7a0bc0fd0)
------------------------------
Heap 5
generation 0 has 10 objects (7eff84129200->7eff84129250)
generation 1 has 992 objects (7eff84127300->7eff84129200)
generation 2 has 0 objects (7eff84127300->7eff84127300)
Ready for finalization 388 objects (7eff84129bd0->7eff8412a7f0)
------------------------------
Heap 6
generation 0 has 5 objects (7ef694166578->7ef6941665a0)
generation 1 has 1,511 objects (7ef694163640->7ef694166578)
generation 2 has 0 objects (7ef694163640->7ef694163640)
Ready for finalization 339 objects (7ef694166fa8->7ef694167a40)
------------------------------
Heap 7
generation 0 has 2 objects (7ef65c27acc0->7ef65c27acd0)
generation 1 has 1,692 objects (7ef65c2777e0->7ef65c27acc0)
generation 2 has 0 objects (7ef65c2777e0->7ef65c2777e0)
Ready for finalization 316 objects (7ef65c27b690->7ef65c27c070)
------------------------------
Heap 8
generation 0 has 5 objects (7ef79c180c20->7ef79c180c48)
generation 1 has 594 objects (7ef79c17f990->7ef79c180c20)
generation 2 has 0 objects (7ef79c17f990->7ef79c17f990)
Ready for finalization 376 objects (7ef79c1818f0->7ef79c1824b0)
------------------------------
Heap 9
generation 0 has 7 objects (7ef770377290->7ef7703772c8)
generation 1 has 832 objects (7ef770375890->7ef770377290)
generation 2 has 0 objects (7ef770375890->7ef770375890)
Ready for finalization 388 objects (7ef770377bd0->7ef7703787f0)
------------------------------
Heap 10
generation 0 has 10 objects (7ef79c245ce0->7ef79c245d30)
generation 1 has 804 objects (7ef79c2443c0->7ef79c245ce0)
generation 2 has 0 objects (7ef79c2443c0->7ef79c2443c0)
Ready for finalization 390 objects (7ef79c246408->7ef79c247038)
------------------------------
Heap 11
generation 0 has 8 objects (7ef6881c80c8->7ef6881c8108)
generation 1 has 1,297 objects (7ef6881c5840->7ef6881c80c8)
generation 2 has 0 objects (7ef6881c5840->7ef6881c5840)
Ready for finalization 327 objects (7ef6881c8b78->7ef6881c95b0)
------------------------------
Heap 12
generation 0 has 10 objects (7ef79c2af180->7ef79c2af1d0)
generation 1 has 2,222 objects (7ef79c2aac10->7ef79c2af180)
generation 2 has 0 objects (7ef79c2aac10->7ef79c2aac10)
Ready for finalization 485 objects (7ef79c2afc38->7ef79c2b0b60)
------------------------------
Heap 13
generation 0 has 13 objects (7ef74815af60->7ef74815afc8)
generation 1 has 350 objects (7ef74815a470->7ef74815af60)
generation 2 has 0 objects (7ef74815a470->7ef74815a470)
Ready for finalization 329 objects (7ef74815b948->7ef74815c390)
------------------------------
Heap 14
generation 0 has 8 objects (7ef79c79e070->7ef79c79e0b0)
generation 1 has 322 objects (7ef79c79d660->7ef79c79e070)
generation 2 has 0 objects (7ef79c79d660->7ef79c79d660)
Ready for finalization 25,489 objects (7ef79c79e9b8->7ef79c7d0640)
------------------------------
Heap 15
generation 0 has 7 objects (7ef6802dadd0->7ef6802dae08)
generation 1 has 508 objects (7ef6802d9df0->7ef6802dadd0)
generation 2 has 0 objects (7ef6802d9df0->7ef6802d9df0)
Ready for finalization 404,766 objects (7ef6802db588->7ef6805f1e78)
------------------------------
Statistics for all finalizable objects (including all objects ready for finalization):
Statistics:
          MT   Count  TotalSize Class Name

[...]

7eff8de90098     538     17,216 Microsoft.Win32.SafeHandles.SafeEvpMdCtxHandle
7eff917fa6f8      70     18,480 Microsoft.Data.SqlClient.SNI.SNISslStream
7eff90117f28     355     19,880 System.Runtime.CompilerServices.ConditionalWeakTable<System.Text.Json.JsonSerializerOptions, System.Object>+Container
7eff9433c1f0     361     20,216 System.IO.Compression.Deflater
7eff8c2c7b88     316     20,224 System.Net.Sockets.SafeSocketHandle
7eff8d8ce2c8     663     21,216 Microsoft.Win32.SafeHandles.SafeX509Handle
7eff8d830920   1,255     30,120 System.Threading.TimerHolder
7eff9433efe8     394     31,520 System.IO.Compression.ZLibNative+ZLibStreamHandle
7eff8c2c5228     316     37,920 System.Net.Sockets.Socket
7eff927f5370     180     44,640 Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver
7eff8d859958     223     62,440 System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs
7eff9146acb8     389     80,912 Microsoft.Data.SqlClient.SqlConnection
7eff9149f348     980    290,080 Microsoft.Data.SqlClient.SqlCommand
7eff89253d70   4,192    301,824 System.Threading.Thread
7eff90b81468 247,708  5,944,992 System.Reflection.Emit.DynamicResolver+DestroyScout
7eff8ad24450 199,413 14,357,736 System.Reflection.Emit.DynamicResolver
Total 460,012 objects, 21,441,768 bytes
> dumpheap -stat

[...]

7eff8acfc688   207,010   9,936,480 System.Reflection.Emit.DynamicMethod+RTDynamicMethod
7eff8ad200a8   207,010   9,936,480 System.Reflection.Emit.ScopeTree
7eff8acfcf68   208,325   9,999,600 System.Reflection.Emit.SignatureHelper
7eff8919ec38   605,088  14,522,112 System.RuntimeTypeHandle
7eff8ad24450   207,001  14,904,072 System.Reflection.Emit.DynamicResolver
7eff89817a20   212,441  20,394,336 System.RuntimeMethodInfoStub
7eff8acfbda0   207,010  23,185,120 System.Reflection.Emit.DynamicMethod
7eff8912a508   358,123  25,959,536 System.Object[]
7eff8acfb6e8   207,010  31,465,520 System.Reflection.Emit.DynamicILGenerator
7eff89251038 1,241,746 120,095,736 System.String
7eff89773338 1,070,259 136,469,834 System.Byte[]
7eff892b7aa8    90,512 260,066,802 System.Char[]
55fb5eb4e900    55,769 364,019,016 Free
Total 10,141,843 objects, 1,321,079,065 bytes
[System.Runtime]
    % Time in GC since last GC (%)                                         5    
    Allocation Rate (B / 1 sec)                                       6.2155e+08   
    CPU Usage (%)                                                         57.81 
    Exception Count (Count / 1 sec)                                       94    
    GC Committed Bytes (MB)                                            1,675.366
    GC Fragmentation (%)                                                  44.356
    GC Heap Size (MB)                                                    637.815
    Gen 0 GC Count (Count / 1 sec)                                         2    
    Gen 0 Size (B)                                                 4,236,872    
    Gen 1 GC Count (Count / 1 sec)                                         1    
    Gen 1 Size (B)                                                11,126,616    
    Gen 2 GC Count (Count / 1 sec)                                         0    
    Gen 2 Size (B)                                                    5.2245e+08   
    IL Bytes Jitted (B)                                               3.1923e+08   
    LOH Size (B)                                                      5.7157e+08   
    Monitor Lock Contention Count (Count / 1 sec)                      1,212    
    Number of Active Timers                                               46    
    Number of Assemblies Loaded                                          252    
    Number of Methods Jitted                                       9,647,443    
    POH (Pinned Object Heap) Size (B)                              4,465,760    
    ThreadPool Completed Work Item Count (Count / 1 sec)              23,982    
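
For context, the counter snapshot above looks like dotnet-counters output; a minimal way to collect such a snapshot (the process id is a placeholder):

dotnet-counters monitor --process-id <pid> --counters System.Runtime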
jkotas commented 1 year ago

If I take a dump 30 minutes later, it still shows this frame with this same data in this thread.

This fragment does not show the handle of the method being finalized. It is not proof that the finalizer is stuck finalizing the same dynamic method. What is the full stack trace from the k command? Can you find the handle of the method being finalized from it?

The most likely explanation of the symptoms is that each request produces one or more DynamicMethods and the finalizer thread is not able to clean them up fast enough. Try to find out what is producing these dynamic methods. The dynamic methods are typically meant to be cached, and it is not unusual to have a bug where that is not happening.

The problem can be magnified by enabling tracing. Tracing makes the cleanup of DynamicMethods more expensive, so it is more likely that the finalizer thread won't be able to keep up with high DynamicMethod churn.
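
As a minimal, hypothetical illustration of the caching point (names are illustrative, not EF Core's actual code): each Expression.Compile / DynamicMethod emission should happen once per shape and be reused, for example keyed in a ConcurrentDictionary, rather than being repeated per request.

using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

static class GetterCache
{
    // One compiled delegate per (type, property); the DynamicMethod behind it is created only once.
    private static readonly ConcurrentDictionary<(Type, string), Func<object, object?>> Cache = new();

    public static Func<object, object?> For(Type type, string property) =>
        Cache.GetOrAdd((type, property), static key =>
        {
            var (t, name) = key;
            var obj = Expression.Parameter(typeof(object), "obj");
            var body = Expression.Convert(
                Expression.Property(Expression.Convert(obj, t), name),
                typeof(object));
            // Compile() emits a DynamicMethod; doing this per request is what floods the finalizer queue.
            return Expression.Lambda<Func<object, object?>>(body, obj).Compile();
        });
}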

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details
Author: vtortola
Assignees: -
Labels: `tenet-performance`, `area-GC-coreclr`, `untriaged`, `needs-area-label`
Milestone: -
vtortola commented 1 year ago

What is the full stack trace from the k command? Can you find the handle of the method being finalized from it?

I am using dotnet-dump to analyze the memory dump, can that be found out with it?

The most likely explanation of the symptoms is that each request produces one or more DynamicMethods and the finalizer thread is not able to clean them up fast enough.

Note this happens due to a traffic spike that lasts just a few minutes. After that, most servers' memory levels return to normal once traffic volume goes back to normal; only on a few of them does memory keep growing and latency keep getting worse, despite receiving the same traffic as the rest of the servers.

Try to find out what is producing these dynamic methods. The dynamic methods are typically meant to be cached, and it is not unusual to have a bug where that is not happening.

Tracking those references using gcroot, it seems to be mostly Entity Framework and System.Text.Json. We do not do much reflection ourselves.
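
For context, roughly the steps used to track those references (the object address is a placeholder; the method table is the DynamicResolver MT from the statistics above):

> dumpheap -mt 7eff8ad24450      # list DynamicResolver instances
> gcroot <object-address>        # walk the roots of one of the listed objects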

The problem can be magnified by enabling tracing. Tracing makes the cleanup of DynamicMethods more expensive, so it is more likely that the finalizer thread won't be able to keep up with high DynamicMethod churn.

We do not have tracing enabled.

jkotas commented 1 year ago

I am using dotnet-dump to analyze the memory dump, can that be found out with it?

dotnet-dump is not able to print native stack traces. The dump has to be opened in a native debugger for that. https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-linux-dumps or https://learn.microsoft.com/en-us/troubleshoot/developer/webapps/aspnetcore/practice-troubleshoot-linux/lab-1-2-analyze-core-dumps-lldb-debugger have instructions for how to do that.
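
A minimal sketch of that workflow, assuming the SOS plugin is installed via dotnet-sos (paths are placeholders):

dotnet-sos install                     # one-time: install the SOS plugin for lldb
lldb --core <core-dump> <path-to-dotnet-host>
(lldb) thread list                     # native threads
(lldb) thread select <n>
(lldb) bt                              # native backtrace (the lldb equivalent of k)
(lldb) clrstack -a                     # managed stack via SOS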

it seems to be mostly Entity Framework and System.Text.Json. We do not do much reflection ourselves.

The way you are calling into Entity Framework and System.Text.Json may be creating too many dynamic methods, which leads to the problem. For example, are you reusing JsonSerializerOptions instances - https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/configure-options#reuse-jsonserializeroptions-instances ?
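
A minimal sketch of the reuse pattern that link describes (hypothetical helper, not your code):

using System.Text.Json;

static class Json
{
    // Created once and reused, so System.Text.Json can reuse its cached converters and metadata.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static string Serialize<T>(T value) => JsonSerializer.Serialize(value, Options);

    // Anti-pattern: a new JsonSerializerOptions per call forces that state to be rebuilt every time.
    public static string SerializeSlow<T>(T value) =>
        JsonSerializer.Serialize(value, new JsonSerializerOptions(JsonSerializerDefaults.Web));
}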

Ideally, you should not see any dynamic methods created and destroyed per request in steady state.

We do not have tracing enabled.

The monitoring software may be enabling tracing for you. How are you producing the graph with GC utilization you have shared above? It is most likely done via tracing.

ghost commented 1 year ago

This issue has been marked needs-author-action and may be missing some important information.

vtortola commented 1 year ago

Thank you. I will try getting that information from the dump.

Yes, we cache the JsonSerializerOptions.

We get that GC metric from EventCounters.

vtortola commented 1 year ago

I have LLDB installed now. I can see the backtrace of the thread in question:

(lldb) setthread 22

(lldb) clrstack
OS Thread Id: 0x1b (22)
        Child SP               IP Call Site
00007F71C8784930 00007f71cda917b2 [InlinedCallFrame: 00007f71c8784930] 
00007F71C8784930 00007f714dfa6957 [InlinedCallFrame: 00007f71c8784930] 
00007F71C8784920 00007F714DFA6957 System.RuntimeMethodHandle.Destroy(System.RuntimeMethodHandleInternal)
00007F71C87849C0 00007F715748F237 System.Reflection.Emit.DynamicResolver+DestroyScout.Finalize() [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicILGenerator.cs @ 669]
00007F71C8784D30 00007f71cd254af6 [DebuggerU2MCatchHandlerFrame: 00007f71c8784d30] 

(lldb) thread backtrace
* thread #22, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38cf3f libcoreclr.so`EEJitManager::Unload(this=0x00007f2ec80d9340, pAllocator=0x00007f71cd473eb8) at codeman.cpp:3600:21
    frame #9: 0x00007f71ccf9e22b libcoreclr.so`LCGMethodResolver::Destroy(this=0x00007f7168112010) at dynamicmethod.cpp:1022:31
    frame #10: 0x00007f71ccf9dff7 libcoreclr.so`DynamicMethodDesc::Destroy(this=0x00007f71680ff738) at dynamicmethod.cpp:893:29
    frame #11: 0x00007f71cd154206 libcoreclr.so`RuntimeMethodHandle::GetMethodBody(pMethodUNSAFE=<unavailable>, pDeclaringTypeUNSAFE=<unavailable>) at runtimehandles.cpp:2370:9
    frame #12: 0x00007f714dfa6964
    frame #13: 0x00007f715748f237
    frame #14: 0x00007f71cd254af6 libcoreclr.so`VarargPInvokeGenILStub at unixasmmacrosamd64.inc:871
    frame #15: 0x00007f71cd00163c libcoreclr.so`MethodTable::CallFinalizer(Object*) at methodtable.cpp:4050:5
    frame #16: 0x00007f71cd0015dc libcoreclr.so`MethodTable::CallFinalizer(obj=0x00007f715748f200) at methodtable.cpp:4168
    frame #17: 0x00007f71cd0c9f32 libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) [inlined] GCRefMapDecoder::GetBit(this=<unavailable>) at gcrefmap.h:170:30
    frame #18: 0x00007f71cd0c9f1e libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) [inlined] GCRefMapDecoder::GetInt() at gcrefmap.h:190
    frame #19: 0x00007f71cd0c9ecb libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) at gcrefmap.h:231
    frame #20: 0x00007f71cd0c9e74 libcoreclr.so`ScanTailCallArgBufferRoots(pThread=<unavailable>, fn=(libcoreclr.so`g_szFailFastBuffer + 440), sc=0x00007f71cd475a10)(Object**, ScanContext*, unsigned int), ScanContext*) at gcenv.ee.cpp:210
    frame #21: 0x00007f71cd0ca155 libcoreclr.so`GCToEEInterface::SyncBlockCacheDemote(max_gen=-850937008) at gcenv.ee.cpp:371:42
    frame #22: 0x00007f71cd04c39a libcoreclr.so at threads.cpp:0
    frame #23: 0x00007f71cd04ca3d libcoreclr.so`NativeExceptionHolder<ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)>::InvokeFilter(PAL_SEHException&) [inlined] ManagedThreadBase_DispatchOuter(this=0x0000000000000001, ex=0x00007f71cd47b960)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)::operator()(PAL_SEHException&) const at threads.cpp:7502:9
    frame #24: 0x00007f71cd04c9f2 libcoreclr.so`NativeExceptionHolder<ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)>::InvokeFilter(this=<unavailable>, ex=0x00007f71cd47b960) at pal.h:4902
    frame #25: 0x00007f71cd0ca3f8 libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() at profilepriv.h:193:37
    frame #26: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] void ProfControlBlock::IterateProfilers<void (*)(ProfilerInfo*, int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*), int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*>(this=0x00007f71c8784e00, callbackType=ActiveOrInitializing)(ProfilerInfo*, int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*), int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*) at profilepriv.h:207
    frame #27: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] int ProfControlBlock::DoProfilerCallback<int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*>(this=0x00007f71c8784e00, callbackType=ActiveOrInitializing)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*) at profilepriv.h:295
    frame #28: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] int AnyProfilerPassesCondition<int (*)(ProfilerInfo*)>(int (*)(ProfilerInfo*)) at profilepriv.inl:101
    frame #29: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] ProfControlBlock::IsCallback5Supported(this=0x00007f71c8784e00) at profilepriv.inl:263
    frame #30: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] CORProfilerTrackConditionalWeakTableElements() at profilepriv.inl:1980
    frame #31: 0x00007f71cd0ca3cc libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() at gcenv.ee.cpp:548
    frame #32: 0x00007f71cd3e1cee libcoreclr.so`::ExitThread(dwExitCode=<unavailable>) at thread.cpp:826:5
    frame #33: 0x00007f71cda8aea7 libpthread.so.0`start_thread + 215
    frame #34: 0x00007f71cd679a2f libc.so.6`__clone + 63

Unfortunately I only have this dump. I will force the situation again in the next few days and take two of them.

From the CLR thread stacks, I see 108 threads whose top stack frames are:

System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
System.Reflection.Emit.DynamicMethod.CreateDelegate(System.Type, System.Object) [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicMethod.cs @ 370]
System.Linq.Expressions.Compiler.LambdaCompiler.Compile(System.Linq.Expressions.LambdaExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/LambdaCompiler.cs @ 190]
00007F3A4EFFA620 00007F7D2FC45A31 System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)

When comparing with a healthy application dump, only one thread was using CreateDelegate. If that CompileMethod is slow for some reason, that may explain why the whole application becomes slow.

Checking some of those threads' backtraces, I see they are also blocked on pthread_cond_wait:

(lldb) thread backtrace
* thread #76, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38dcc3 libcoreclr.so`EEJitManager::LazyGetFunctionEntry(this=<unavailable>, pCodeInfo=<unavailable>) at codeman.cpp:4279:82
(lldb) thread backtrace
* thread #77, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38dcc3 libcoreclr.so`EEJitManager::LazyGetFunctionEntry(this=<unavailable>, pCodeInfo=<unavailable>) at codeman.cpp:4279:82
(lldb) thread backtrace
* thread #78, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38c70c libcoreclr.so`EEJitManager::allocEHInfoRaw(_hpCodeHdr*, unsigned int, unsigned long*) [inlined] MethodDesc::GetClassification(this=0x00007f715d8e7f90) const at method.hpp:1726:17
    frame #9: 0x00007f71cd38c70b libcoreclr.so`EEJitManager::allocEHInfoRaw(_hpCodeHdr*, unsigned int, unsigned long*) [inlined] MethodDesc::IsLCGMethod(this=0x00007f715d8e7f90) at method.inl:99
    frame #10: 0x00007f71cd38c70b libcoreclr.so`EEJitManager::allocEHInfoRaw(this=<unavailable>, pCodeHeader=<unavailable>, blockSize=2977402920, pAllocationSize=0x00007f2f967f6518) at codeman.cpp:3265
[...]
jkotas commented 1 year ago

I see 108 threads whose top stack frames are:

System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)

This is the problem. EF Core is generating a dynamic method per request, which is overwhelming the system.

It should be fixed in the current EF version by https://github.com/dotnet/efcore/pull/29815. cc @roji
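
As an aside, on affected versions one commonly suggested way to avoid per-query compilation for hot queries is EF Core's compiled query API; a hypothetical sketch (context and entity names are made up, not taken from this issue):

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

static class Queries
{
    // Compiled once; executing the delegate does not go back through query compilation per call.
    private static readonly Func<MyDbContext, int, Task<Order?>> OrderById =
        EF.CompileAsyncQuery((MyDbContext ctx, int id) =>
            ctx.Orders.FirstOrDefault(o => o.Id == id));

    public static Task<Order?> GetOrderAsync(MyDbContext ctx, int id) => OrderById(ctx, id);
}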

vtortola commented 1 year ago

Thank you @jkotas

For the record, we use Entity Framework 7.0.10 on .NET 7. It was also happening with Entity Framework 6.0.6 on .NET 7.

Our application usually reads a single record from the database using FindAsync, and loads related entities explicitly depending on the usage.

We load related entities both with Where and without, roughly as in the sketch below (our actual entity names differ):
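
A rough sketch of those two shapes (hypothetical context, entity, and navigation names):

using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

static async Task LoadLinesAsync(MyDbContext context, Order order, bool onlyActive)
{
    if (!onlyActive)
    {
        // Without Where: load the whole navigation collection for the tracked entity.
        await context.Entry(order).Collection(o => o.Lines).LoadAsync();
    }
    else
    {
        // With Where: filter the navigation query and load only the matching rows.
        await context.Entry(order)
            .Collection(o => o.Lines)
            .Query()
            .Where(l => l.IsActive)
            .LoadAsync();
    }
}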

That said, I can also see stack traces using CreateDelegate when calling FindAsync:

00007F3B1EFF9408 00007f7da80147b2 [InlinedCallFrame: 00007f3b1eff9408] System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
00007F3B1EFF9408 00007f7d2fc3b8c3 [InlinedCallFrame: 00007f3b1eff9408] System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
00007F3B1EFF9400 00007F7D2FC3B8C3 System.Reflection.Emit.DynamicMethod.CreateDelegate(System.Type, System.Object) [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicMethod.cs @ 370]
00007F3B1EFF94A0 00007F7D2FC45B8B System.Linq.Expressions.Compiler.LambdaCompiler.Compile(System.Linq.Expressions.LambdaExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/LambdaCompiler.cs @ 190]
00007F3B1EFF94D0 00007F7D2FC45A31 System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
00007F3B1EFF94F0 00007F7D3688F156 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)
00007F3B1EFF9560 00007F7D36899ED5 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Evaluate(System.Linq.Expressions.Expression, Boolean) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 288]
00007F3B1EFF95A0 00007F7D35678B83 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 105]
00007F3B1EFF95D0 00007F7D2FC4AFE6 System.Linq.Expressions.ExpressionVisitor.VisitBinary(System.Linq.Expressions.BinaryExpression)
00007F3B1EFF9610 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF9640 00007F7D35679673 System.Linq.Expressions.ExpressionVisitor.VisitLambda[[System.__Canon, System.Private.CoreLib]](System.Linq.Expressions.Expression`1<System.__Canon>) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 346]
00007F3B1EFF9670 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF96A0 00007F7D3566CC99 System.Linq.Expressions.ExpressionVisitor.VisitUnary(System.Linq.Expressions.UnaryExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 540]
00007F3B1EFF96C0 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF96F0 00007F7D319DFF4E System.Dynamic.Utils.ExpressionVisitorUtils.VisitArguments(System.Linq.Expressions.ExpressionVisitor, System.Linq.Expressions.IArgumentProvider) [/_/src/libraries/System.Linq.Expressions/src/System/Dynamic/Utils/ExpressionVisitorUtils.cs @ 66]
00007F3B1EFF9740 00007F7D319E1303 System.Linq.Expressions.ExpressionVisitor.VisitMethodCall(System.Linq.Expressions.MethodCallExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 406]
00007F3B1EFF9780 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF97B0 00007F7D368AB1C6 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.ExtractParameters(System.Linq.Expressions.Expression, Boolean) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 75]
00007F3B1EFF9800 00007F7D368AAE98 Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.ExecuteAsync[[System.__Canon, System.Private.CoreLib]](System.Linq.Expressions.Expression, System.Threading.CancellationToken) [/_/src/EFCore/Query/Internal/QueryCompiler.cs @ 109]
00007F3B1EFF9860 00007F7D369381CA Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ExecuteAsync[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Reflection.MethodInfo, System.Linq.IQueryable`1<System.__Canon>, System.Linq.Expressions.Expression, System.Threading.CancellationToken) [/_/src/EFCore/Extensions/EntityFrameworkQueryableExtensions.cs @ 3081]
00007F3B1EFF98E0 00007F7D37A49E21 Microsoft.EntityFrameworkCore.Internal.EntityFinder`1[[System.__Canon, System.Private.CoreLib]].FindAsync(System.Object[], System.Threading.CancellationToken) [/_/src/EFCore/Internal/EntityFinder.cs @ 88]
00007F3B1EFF9960 00007F7D37A43D30 Microsoft.EntityFrameworkCore.Internal.InternalDbSet`1[[System.__Canon, System.Private.CoreLib]].FindAsync(System.Object[]) [/_/src/EFCore/Internal/InternalDbSet.cs @ 170]
[...]

The application where these stack frames were taken had been running for days without problems until a sudden traffic spike made the CPU hit 85%; some servers, including this one, then started to have memory issues and sluggishness.

roji commented 1 year ago

@jkotas thanks for digging into this... We should backport that specific fix to 7.0 and 6.0.

vtortola commented 1 year ago

@roji in which version is this fixed?

roji commented 1 year ago

@vtortola 8.0.0-rc.1 no longer has this problem - can you please give that a try and confirm that it resolves the problem for you? Note that 8.0 (non-preview/rc) will be out in November.

vtortola commented 1 year ago

@roji I am sorry, but I cannot deploy our application on .NET 8 in our production system, which is where the problem happens. I am afraid I will need to wait until November. If you backport it to EF 7 on .NET 7, let me know and I can give it a try.

Thanks.

roji commented 1 year ago

@vtortola thanks. I'll submit a PR for patching 6 and 7; keep an eye on this issue for updates on progress.

vtortola commented 1 year ago

@jkotas although it may be true that Entity Framework is calling System.Reflection.Emit.DynamicMethod.CreateDelegate too much and @roji's fix will help, it still does not explain why the application does not recover from the traffic spike even when traffic goes back to normal levels, and instead gets worse as time goes by. It is as if whatever System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod (called from request processing) has in common with System.RuntimeMethodHandle.Destroy (called by the finalizer thread) somehow deteriorates and becomes more and more sluggish.

jkotas commented 1 year ago

Yes, I agree that we do not have a full explanation of the behavior. System.RuntimeMethodHandle.Destroy can be slower when there are a lot of active DynamicMethods.

roji commented 1 year ago

FYI #31784 has been approved for patching for 6.0 and 7.0.

vtortola commented 1 year ago

Any idea in which 7.0 version it will be released?

ajcvickers commented 1 year ago

Looks like it should be in 7.0.12.

stevendarby commented 11 months ago

Looks like it should be in 7.0.12.

Is this still the case? I could only find the 6.0 PR.

ajcvickers commented 11 months ago

@stevendarby It will get merged from 6 into 7.

vtortola commented 11 months ago

@ajcvickers @roji I see 7.0.12 is out; can you please confirm whether this fix is in it? Thanks!

ajcvickers commented 11 months ago

@vtortola All non-security fixes got pulled from 7.0.12, so it will be in 7.0.13 instead.

vtortola commented 11 months ago

Alright, we will wait for 7.0.13, many thanks!

vtortola commented 10 months ago

@ajcvickers @roji hi again! I see 7.0.13 is out; can you please confirm whether this fix is in it? Thanks!

vtortola commented 10 months ago

I can see the commit 28c0abe0ea0c0444fad2eec526029bfc48542810 in https://github.com/dotnet/efcore/compare/v7.0.13...main 🎊 I will let you know how it goes