Closed: vtortola closed this issue 11 months ago.
If I take a dump 30 minutes later, it still shows this frame with this same data in this thread.
This fragment does not show the handle of the method being finalized. It is not proof that the finalizer is stuck finalizing the same dynamic method. What is the full stack trace from the `k` command? Can you find the handle of the method being finalized from it?
The most likely explanation of the symptoms is that each request produces one or more DynamicMethods and the finalizer thread is not able to clean them up fast enough. Try to find out what is producing these dynamic methods. Dynamic methods are typically meant to be cached, and it is not unusual to have a bug where that is not happening.
The problem can be magnified by enabling tracing. Tracing makes the cleanup of DynamicMethods more expensive, so it makes it more likely that the finalizer thread won't be able to keep up with high DynamicMethod churn.
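As an illustration of the caching bug described above, a minimal sketch (hypothetical names, not the code from this application): compiling an expression tree on every call produces a new DynamicMethod each time, while caching the compiled delegate compiles it only once.

using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

static class PropertyGetterCache
{
    // BAD: every call compiles a new expression, producing a new DynamicMethod
    // that the finalizer thread later has to clean up.
    public static Func<T, object> BuildGetterUncached<T>(string propertyName)
    {
        var parameter = Expression.Parameter(typeof(T));
        var body = Expression.Convert(Expression.Property(parameter, propertyName), typeof(object));
        return Expression.Lambda<Func<T, object>>(body, parameter).Compile();
    }

    // BETTER: compile once per (type, property) pair and reuse the delegate.
    private static readonly ConcurrentDictionary<(Type, string), Delegate> _cache = new();

    public static Func<T, object> BuildGetterCached<T>(string propertyName) =>
        (Func<T, object>)_cache.GetOrAdd((typeof(T), propertyName),
            _ => BuildGetterUncached<T>(propertyName));
}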
Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.
| Author: | vtortola |
|---|---|
| Assignees: | - |
| Labels: | `tenet-performance`, `area-GC-coreclr`, `untriaged`, `needs-area-label` |
| Milestone: | - |
What is the full stack trace from the `k` command? Can you find the handle of the method being finalized from it?
I am using dotnet-dump to analyze the memory dump, can that be found out with it?
The most likely explanation of the symptoms is that each request produces one or more DynamicMethods and the finalizer thread is not able to clean them up fast enough.
Note this happens due to a traffic spike that lasts just a few minutes. After that, memory levels on most servers go back to normal once traffic volume returns to normal; it is just a few of them where memory keeps growing and latency keeps getting worse despite receiving the same traffic as the rest of the servers.
Try to find out what is producing these dynamic methods. Dynamic methods are typically meant to be cached, and it is not unusual to have a bug where that is not happening.
Tracking those references using `gcroot`, it seems to be mostly Entity Framework and System.Text.Json. We do not do much reflection ourselves.
The problem can be magnified by enabling tracing. Tracing makes the cleanup of DynamicMethods more expensive, so it makes it more likely that the finalizer thread won't be able to keep up with high DynamicMethod churn.
We do not have tracing enabled.
I am using dotnet-dump to analyze the memory dump, can that be found out with it?
dotnet-dump is not able to print native stack traces. The dump has to be opened in a native debugger for that. https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-linux-dumps or https://learn.microsoft.com/en-us/troubleshoot/developer/webapps/aspnetcore/practice-troubleshoot-linux/lab-1-2-analyze-core-dumps-lldb-debugger have instructions for how to do that.
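For reference, the workflow in those docs looks roughly like this (paths are placeholders, and SOS is installed beforehand via the dotnet-sos tool as the links describe):

lldb --core ./coredump.1234 /usr/share/dotnet/dotnet
(lldb) thread backtrace all    # native stacks of all threads
(lldb) clrthreads              # list managed threads (SOS)
(lldb) setthread <n>
(lldb) clrstack                # managed stack of the selected thread (SOS)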
it seems to be mostly Entity Framework and System.Text.Json. We do not do much reflection by ourselves.
The way you are calling into Entity Framework and System.Text.Json may be creating too many dynamic methods, which leads to the problem. For example, are you reusing JsonSerializerOptions instances - https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/configure-options#reuse-jsonserializeroptions-instances ?
Ideally, you should not see any dynamic methods created and destroyed per request in steady state.
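To illustrate the pattern that linked doc warns about (a generic sketch, not your code): creating a new JsonSerializerOptions per call makes the serializer regenerate its cached metadata, including dynamic methods, while a single reused instance builds it once.

using System.Text.Json;

static class JsonHelper
{
    // BAD: a fresh options instance per call discards the serializer's cached
    // metadata, so it is regenerated (dynamic methods included) on every call.
    public static string SerializeUncached<T>(T value) =>
        JsonSerializer.Serialize(value, new JsonSerializerOptions { WriteIndented = false });

    // BETTER: one shared instance; metadata is generated once and reused.
    private static readonly JsonSerializerOptions s_options = new() { WriteIndented = false };

    public static string SerializeCached<T>(T value) =>
        JsonSerializer.Serialize(value, s_options);
}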
We do not have tracing enabled.
The monitoring software may be enabling tracing for you. How are you producing the graph with GC utilization you have shared above? It is most likely done via tracing.
This issue has been marked `needs-author-action` and may be missing some important information.
Thank you. I will try getting that information from the dump.
Yes, we cache the JsonSerializerOptions.
We get that GC metric from EventCounters.
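(For context, the System.Runtime EventCounters expose this as the "time-in-gc" counter. Our actual collection is done by the monitoring agent, but a minimal in-process listener reading the same counter would look roughly like this sketch:)

using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

sealed class GcTimeListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Runtime")
        {
            // Ask the runtime to publish counter values every second.
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventName != "EventCounters" || e.Payload?[0] is not IDictionary<string, object> counter)
            return;
        if ((string)counter["Name"] == "time-in-gc")
            Console.WriteLine($"% time in GC: {counter["Mean"]}");
    }
}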
I have LLDB installed now. I can see the backtrace of the thread in question:
(lldb) setthread 22
(lldb) clrstack
OS Thread Id: 0x1b (22)
Child SP IP Call Site
00007F71C8784930 00007f71cda917b2 [InlinedCallFrame: 00007f71c8784930]
00007F71C8784930 00007f714dfa6957 [InlinedCallFrame: 00007f71c8784930]
00007F71C8784920 00007F714DFA6957 System.RuntimeMethodHandle.Destroy(System.RuntimeMethodHandleInternal)
00007F71C87849C0 00007F715748F237 System.Reflection.Emit.DynamicResolver+DestroyScout.Finalize() [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicILGenerator.cs @ 669]
00007F71C8784D30 00007f71cd254af6 [DebuggerU2MCatchHandlerFrame: 00007f71c8784d30]
(lldb) thread backtrace
* thread #22, stop reason = signal 0
* frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
frame #8: 0x00007f71cd38cf3f libcoreclr.so`EEJitManager::Unload(this=0x00007f2ec80d9340, pAllocator=0x00007f71cd473eb8) at codeman.cpp:3600:21
frame #9: 0x00007f71ccf9e22b libcoreclr.so`LCGMethodResolver::Destroy(this=0x00007f7168112010) at dynamicmethod.cpp:1022:31
frame #10: 0x00007f71ccf9dff7 libcoreclr.so`DynamicMethodDesc::Destroy(this=0x00007f71680ff738) at dynamicmethod.cpp:893:29
frame #11: 0x00007f71cd154206 libcoreclr.so`RuntimeMethodHandle::GetMethodBody(pMethodUNSAFE=<unavailable>, pDeclaringTypeUNSAFE=<unavailable>) at runtimehandles.cpp:2370:9
frame #12: 0x00007f714dfa6964
frame #13: 0x00007f715748f237
frame #14: 0x00007f71cd254af6 libcoreclr.so`VarargPInvokeGenILStub at unixasmmacrosamd64.inc:871
frame #15: 0x00007f71cd00163c libcoreclr.so`MethodTable::CallFinalizer(Object*) at methodtable.cpp:4050:5
frame #16: 0x00007f71cd0015dc libcoreclr.so`MethodTable::CallFinalizer(obj=0x00007f715748f200) at methodtable.cpp:4168
frame #17: 0x00007f71cd0c9f32 libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) [inlined] GCRefMapDecoder::GetBit(this=<unavailable>) at gcrefmap.h:170:30
frame #18: 0x00007f71cd0c9f1e libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) [inlined] GCRefMapDecoder::GetInt() at gcrefmap.h:190
frame #19: 0x00007f71cd0c9ecb libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) at gcrefmap.h:231
frame #20: 0x00007f71cd0c9e74 libcoreclr.so`ScanTailCallArgBufferRoots(pThread=<unavailable>, fn=(libcoreclr.so`g_szFailFastBuffer + 440), sc=0x00007f71cd475a10)(Object**, ScanContext*, unsigned int), ScanContext*) at gcenv.ee.cpp:210
frame #21: 0x00007f71cd0ca155 libcoreclr.so`GCToEEInterface::SyncBlockCacheDemote(max_gen=-850937008) at gcenv.ee.cpp:371:42
frame #22: 0x00007f71cd04c39a libcoreclr.so at threads.cpp:0
frame #23: 0x00007f71cd04ca3d libcoreclr.so`NativeExceptionHolder<ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)>::InvokeFilter(PAL_SEHException&) [inlined] ManagedThreadBase_DispatchOuter(this=0x0000000000000001, ex=0x00007f71cd47b960)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)::operator()(PAL_SEHException&) const at threads.cpp:7502:9
frame #24: 0x00007f71cd04c9f2 libcoreclr.so`NativeExceptionHolder<ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)>::InvokeFilter(this=<unavailable>, ex=0x00007f71cd47b960) at pal.h:4902
frame #25: 0x00007f71cd0ca3f8 libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() at profilepriv.h:193:37
frame #26: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] void ProfControlBlock::IterateProfilers<void (*)(ProfilerInfo*, int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*), int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*>(this=0x00007f71c8784e00, callbackType=ActiveOrInitializing)(ProfilerInfo*, int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*), int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*) at profilepriv.h:207
frame #27: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] int ProfControlBlock::DoProfilerCallback<int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*>(this=0x00007f71c8784e00, callbackType=ActiveOrInitializing)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*) at profilepriv.h:295
frame #28: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] int AnyProfilerPassesCondition<int (*)(ProfilerInfo*)>(int (*)(ProfilerInfo*)) at profilepriv.inl:101
frame #29: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] ProfControlBlock::IsCallback5Supported(this=0x00007f71c8784e00) at profilepriv.inl:263
frame #30: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] CORProfilerTrackConditionalWeakTableElements() at profilepriv.inl:1980
frame #31: 0x00007f71cd0ca3cc libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() at gcenv.ee.cpp:548
frame #32: 0x00007f71cd3e1cee libcoreclr.so`::ExitThread(dwExitCode=<unavailable>) at thread.cpp:826:5
frame #33: 0x00007f71cda8aea7 libpthread.so.0`start_thread + 215
frame #34: 0x00007f71cd679a2f libc.so.6`__clone + 63
Unfortunately I only have this dump. I will force the situation again in the next few days and take two of them.
From the CLR thread stacks, I see 108 threads whose top stack frames are:
System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
System.Reflection.Emit.DynamicMethod.CreateDelegate(System.Type, System.Object) [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicMethod.cs @ 370]
System.Linq.Expressions.Compiler.LambdaCompiler.Compile(System.Linq.Expressions.LambdaExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/LambdaCompiler.cs @ 190]
System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)
When comparing with a healthy application dump, only one thread was using CreateDelegate.
If that CompileMethod is being slow for some reason, that may explain why the whole application becomes slow.
Checking some of those threads' backtraces, I see they are also blocked on pthread_cond_wait:
(lldb) thread backtrace
* thread #76, stop reason = signal 0
* frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
frame #8: 0x00007f71cd38dcc3 libcoreclr.so`EEJitManager::LazyGetFunctionEntry(this=<unavailable>, pCodeInfo=<unavailable>) at codeman.cpp:4279:82
(lldb) thread backtrace
* thread #77, stop reason = signal 0
* frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
frame #8: 0x00007f71cd38dcc3 libcoreclr.so`EEJitManager::LazyGetFunctionEntry(this=<unavailable>, pCodeInfo=<unavailable>) at codeman.cpp:4279:82
(lldb) thread backtrace
* thread #78, stop reason = signal 0
* frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
frame #8: 0x00007f71cd38c70c libcoreclr.so`EEJitManager::allocEHInfoRaw(_hpCodeHdr*, unsigned int, unsigned long*) [inlined] MethodDesc::GetClassification(this=0x00007f715d8e7f90) const at method.hpp:1726:17
frame #9: 0x00007f71cd38c70b libcoreclr.so`EEJitManager::allocEHInfoRaw(_hpCodeHdr*, unsigned int, unsigned long*) [inlined] MethodDesc::IsLCGMethod(this=0x00007f715d8e7f90) at method.inl:99
frame #10: 0x00007f71cd38c70b libcoreclr.so`EEJitManager::allocEHInfoRaw(this=<unavailable>, pCodeHeader=<unavailable>, blockSize=2977402920, pAllocationSize=0x00007f2f967f6518) at codeman.cpp:3265
[...]
I see 108 threads whose top stack frames are:
System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221] Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)
This is the problem. EF Core is generating a dynamic method per request, which is overwhelming the system.
It should be fixed in the current EF version by https://github.com/dotnet/efcore/pull/29815 . cc @roji
Thank you @jkotas
For the record, we use Entity Framework 7.0.10 on .NET 7. It was also happening with Entity Framework 6.0.6 on .NET 7.
Our application usually reads a single record from the database using FindAsync, and loads related entities explicitly depending on the usage. We load related entities both with Where and without:
await Entry(entity).Collection(u => u.Related).Query().LoadAsync(cancel);
await Entry(entity).Collection(u => u.Related).Query().Where(x => x.Member == "member").LoadAsync(cancel);
Although I can see stack traces using CreateDelegate also when calling FindAsync:
00007F3B1EFF9408 00007f7da80147b2 [InlinedCallFrame: 00007f3b1eff9408] System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
00007F3B1EFF9408 00007f7d2fc3b8c3 [InlinedCallFrame: 00007f3b1eff9408] System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
00007F3B1EFF9400 00007F7D2FC3B8C3 System.Reflection.Emit.DynamicMethod.CreateDelegate(System.Type, System.Object) [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicMethod.cs @ 370]
00007F3B1EFF94A0 00007F7D2FC45B8B System.Linq.Expressions.Compiler.LambdaCompiler.Compile(System.Linq.Expressions.LambdaExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/LambdaCompiler.cs @ 190]
00007F3B1EFF94D0 00007F7D2FC45A31 System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
00007F3B1EFF94F0 00007F7D3688F156 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)
00007F3B1EFF9560 00007F7D36899ED5 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Evaluate(System.Linq.Expressions.Expression, Boolean) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 288]
00007F3B1EFF95A0 00007F7D35678B83 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 105]
00007F3B1EFF95D0 00007F7D2FC4AFE6 System.Linq.Expressions.ExpressionVisitor.VisitBinary(System.Linq.Expressions.BinaryExpression)
00007F3B1EFF9610 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF9640 00007F7D35679673 System.Linq.Expressions.ExpressionVisitor.VisitLambda[[System.__Canon, System.Private.CoreLib]](System.Linq.Expressions.Expression`1<System.__Canon>) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 346]
00007F3B1EFF9670 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF96A0 00007F7D3566CC99 System.Linq.Expressions.ExpressionVisitor.VisitUnary(System.Linq.Expressions.UnaryExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 540]
00007F3B1EFF96C0 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF96F0 00007F7D319DFF4E System.Dynamic.Utils.ExpressionVisitorUtils.VisitArguments(System.Linq.Expressions.ExpressionVisitor, System.Linq.Expressions.IArgumentProvider) [/_/src/libraries/System.Linq.Expressions/src/System/Dynamic/Utils/ExpressionVisitorUtils.cs @ 66]
00007F3B1EFF9740 00007F7D319E1303 System.Linq.Expressions.ExpressionVisitor.VisitMethodCall(System.Linq.Expressions.MethodCallExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 406]
00007F3B1EFF9780 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF97B0 00007F7D368AB1C6 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.ExtractParameters(System.Linq.Expressions.Expression, Boolean) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 75]
00007F3B1EFF9800 00007F7D368AAE98 Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.ExecuteAsync[[System.__Canon, System.Private.CoreLib]](System.Linq.Expressions.Expression, System.Threading.CancellationToken) [/_/src/EFCore/Query/Internal/QueryCompiler.cs @ 109]
00007F3B1EFF9860 00007F7D369381CA Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ExecuteAsync[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Reflection.MethodInfo, System.Linq.IQueryable`1<System.__Canon>, System.Linq.Expressions.Expression, System.Threading.CancellationToken) [/_/src/EFCore/Extensions/EntityFrameworkQueryableExtensions.cs @ 3081]
00007F3B1EFF98E0 00007F7D37A49E21 Microsoft.EntityFrameworkCore.Internal.EntityFinder`1[[System.__Canon, System.Private.CoreLib]].FindAsync(System.Object[], System.Threading.CancellationToken) [/_/src/EFCore/Internal/EntityFinder.cs @ 88]
00007F3B1EFF9960 00007F7D37A43D30 Microsoft.EntityFrameworkCore.Internal.InternalDbSet`1[[System.__Canon, System.Private.CoreLib]].FindAsync(System.Object[]) [/_/src/EFCore/Internal/InternalDbSet.cs @ 170]
[...]
The application where these stack frames were taken was running for days without problems until a sudden traffic spike made the CPU hit 85%; some servers, including this one, started to have memory issues and sluggishness.
@jkotas thanks for digging into this... We should backport that specific fix to 7.0 and 6.0.
@roji in which version is this fixed?
@vtortola 8.0.0-rc.1 no longer has this problem - can you please give that a try and confirm that it resolves the problem for you? Note that 8.0 (non-preview/rc) will be out in November.
@roji I am sorry I cannot deploy our application as .NET8 in our production system, that is where the problem happens. I am afraid I will need to wait till November. If you backport it to EF7 in .NET7 let me know and I can give it a try.
Thanks.
@vtortola thanks. I'll submit a PR for patching 6 and 7; keep your eyes on this issue to be updated on progress.
@jkotas although it may be true that Entity Framework is calling System.Reflection.Emit.DynamicMethod.CreateDelegate too much and @roji's fix will help, it still does not explain why the application does not recover from the traffic spike even when traffic goes back to normal levels, and why it gets worse as time goes by. It is as if whatever System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod (called from request processing) has in common with System.RuntimeMethodHandle.Destroy (called by the finalizer thread) somehow deteriorates and becomes more and more sluggish.
Yes, I agree that we do not have a full explanation of the behavior. System.RuntimeMethodHandle.Destroy can be slower when there are a lot of active DynamicMethods.
FYI #31784 has been approved for patching for 6.0 and 7.0.
Any idea in which 7.0 version it will be released?
Looks like it should be in 7.0.12.
Looks like it should be in 7.0.12.
Is this still the case? I could only find the 6.0 PR.
@stevendarby It will get merged from 6 into 7.
@ajcvickers @roji I see 7.0.12 is out, can you please confirm this fix is in it? Thanks!
@vtortola All non-security fixes got pulled from 7.0.12, so it will be in 7.0.13 instead.
Alright, we will wait for 7.0.13, many thanks!
@ajcvickers @roji hi again! I see 7.0.13 is out, can you please confirm this fix is in it? Thanks!
I can see the commit 28c0abe0ea0c0444fad2eec526029bfc48542810 in https://github.com/dotnet/efcore/compare/v7.0.13...main 🎊 I will let you know how it goes
Description
When a single application instance reaches ~1700 RPS and ~85% CPU usage during a short period of time due to a traffic spike, around 25% of our servers experience what seems to be a blockage in the finalizer thread, and the application starts hoarding memory and increasing latency. Eventually we have to kill the server when memory is at 10x what it normally uses and latency is not acceptable.
Exploring with dotnet-dump, we see that after the request spike the finalizer queue starts to accumulate objects. Taking multiple dumps shows that the "Ready for finalization" count keeps growing in each heap.
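(For reference, this kind of information can be obtained with the standard dotnet-dump/SOS commands; the session looks roughly like this, with the PID and dump name as placeholders:)

dotnet-dump collect -p <pid>
dotnet-dump analyze ./core_<timestamp>
> finalizequeue
> dumpheap -stat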
When exploring the threads, there is always a thread 0x001B that is stuck in this frame:
If I take a dump 30 minutes later, it still shows this frame with this same data in this thread.
Those servers keep having a higher % of time in GC
Configuration
Server is an Azure Standard_F16s_v2 (16 cores, 32 GiB RAM).
Docker image mcr.microsoft.com/dotnet/aspnet:7.0.11.
Regression?
We are not completely sure, but we have the feeling it started happening when we moved from .NET 6 to .NET 7. When we were on .NET 6 we did not see this kind of situation. After we moved to .NET 7 we started seeing some machines using an unexpectedly large amount of memory, starting with a traffic spike situation.
Data