dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.51k stars 4.78k forks source link

Deadlock in System.Diagnostics.Tracing.FrameworkEventSource on application startup #105682

Closed masloboev closed 2 months ago

masloboev commented 4 months ago

Description

Deadlock in System.Diagnostics.Tracing.FrameworkEventSource on “System.Runtime” event pipe provider initialization. Deadlock occurs when counters are requested on application startup between these two objects: EventListener.EventListenersLock monitor lock and System.Diagnostics.Tracing.FrameworkEventSource type access critical section.

Deadlock description

Thread 1:

This can come from any point that checks FrameworkEventSource.Log.IsEnabled. In the attached dump it is checked for ThreadTransfer event.

Thread 2:

But the type FrameworkEventSource is still not initialized.

Full stack traces

Thread 1:

    [Deadlocked, double-click or press enter to view all thread stacks] 
    [Waiting on lock owned by Thread 0x6778, double-click or press enter to switch to thread]   
ntdll.dll!NtWaitForMultipleObjects()
KERNELBASE.dll!WaitForMultipleObjectsEx()
    [Managed to Native Transition]  
System.Private.CoreLib.dll!System.Diagnostics.Tracing.EventSource.Initialize(System.Guid eventSourceGuid, string eventSourceName, string[] traits)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.FrameworkEventSource.FrameworkEventSource()
System.Private.CoreLib.dll!System.Diagnostics.Tracing.FrameworkEventSource.FrameworkEventSource()
    [Native to Managed Transition]  
    [Managed to Native Transition]  
System.Private.CoreLib.dll!System.Threading.TimerQueueTimer.Change(uint dueTime, uint period)
System.Private.CoreLib.dll!System.Threading.Tasks.Task.DelayPromise.DelayPromise(uint millisecondsDelay, System.TimeProvider timeProvider)
System.Private.CoreLib.dll!System.Threading.Tasks.Task.Delay(uint millisecondsDelay, System.TimeProvider timeProvider, System.Threading.CancellationToken cancellationToken)
System.Private.CoreLib.dll!System.Threading.Tasks.Task.Delay(int millisecondsDelay)
DFA.dll!DFA.Program.Main()

Thread 2:

    [Deadlocked, double-click or press enter to view all thread stacks] 
    [Waiting on lock owned by Thread 0x300C, double-click or press enter to switch to thread]   
ntdll.dll!NtWaitForAlertByThreadId()
ntdll.dll!RtlpWaitOnAddressWithTimeout()
ntdll.dll!RtlpWaitOnAddress()
ntdll.dll!RtlpWaitOnCriticalSection()
ntdll.dll!RtlpEnterCriticalSectionContended()
ntdll.dll!RtlEnterCriticalSection()
    [Managed to Native Transition]  
System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.ThreadPoolWorkQueue()
System.Private.CoreLib.dll!System.Threading.ThreadPool.ThreadPool()
    [Native to Managed Transition]  
    [Managed to Native Transition]  
System.Private.CoreLib.dll!System.Threading.ThreadPool.CompletedWorkItemCount.get()  
System.Private.CoreLib.dll!System.Diagnostics.Tracing.RuntimeEventSource.OnEventCommand.AnonymousMethod__35_9()
System.Private.CoreLib.dll!System.Diagnostics.Tracing.IncrementingPollingCounter.UpdateMetric()
System.Private.CoreLib.dll!System.Diagnostics.Tracing.CounterGroup.ResetCounters()
System.Private.CoreLib.dll!System.Diagnostics.Tracing.CounterGroup.EnableTimer(float pollingIntervalInSeconds)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.CounterGroup.OnEventSourceCommand(object sender, System.Diagnostics.Tracing.EventCommandEventArgs e)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.EventSource.DoCommand(System.Diagnostics.Tracing.EventCommandEventArgs commandArgs)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.EventSource.SendCommand(System.Diagnostics.Tracing.EventListener listener, System.Diagnostics.Tracing.EventProviderType eventProviderType, int perEventSourceSessionId, System.Diagnostics.Tracing.EventCommand command, bool enable, System.Diagnostics.Tracing.EventLevel level, System.Diagnostics.Tracing.EventKeywords matchAnyKeyword, System.Collections.Generic.IDictionary<string, string> commandArguments)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.EventSource.OverrideEventProvider.OnControllerCommand(System.Diagnostics.Tracing.ControllerCommand command, System.Collections.Generic.IDictionary<string, string> arguments, int perEventSourceSessionId)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.EventPipeEventProvider.HandleEnableNotification(System.Diagnostics.Tracing.EventProvider target, byte* additionalData, byte level, long matchAnyKeywords, long matchAllKeywords, Interop.Advapi32.EVENT_FILTER_DESCRIPTOR* filterData)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.EventProviderImpl.ProviderCallback(System.Diagnostics.Tracing.EventProvider target, byte* additionalData, int controlCode, byte level, long matchAnyKeywords, long matchAllKeywords, Interop.Advapi32.EVENT_FILTER_DESCRIPTOR* filterData)
System.Private.CoreLib.dll!System.Diagnostics.Tracing.EventPipeEventProvider.Callback(byte* sourceId, int isEnabled, byte level, long matchAnyKeywords, long matchAllKeywords, Interop.Advapi32.EVENT_FILTER_DESCRIPTOR* filterData, void* callbackContext)

Reproduction Steps

Connect EventPipe with enabled "System.Runtime" provider to application on application startup.

Deadlock dump reference: dump

Expected behavior

Event pipe is connected, EventCounters events are sending

Actual behavior

Application hangs, deadlock occured

Regression?

No response

Known Workarounds

No response

Configuration

.NET 8.0.524.21615 (dotnet-sdk-8.0.300-win-x64-d1497a0e) OS Windows 10 x64 no, it is not specific to this configuration

Other information

This issue prevents using “System.Runtime” events provider for .NET Core applications moniotring

Solutions: Maybe postpone or pull out from under the lock counters initialization (System.Diagnostics.Tracing.CounterGroup.ResetCounters) is good solution. Another solution is warm up FrameworkEventSource before counters initialization.

Similar issue: ThreadPoolWorkQueue..ctor randomly will never return

tommcdon commented 4 months ago

@mdh1418

HaraGabi commented 4 months ago

I've run into this problem as well. Is there an acceptable workaround?

So far I've tried manually disposing of the System.Runtime trace source in the Main method like this: var sources = EventSource.GetSources(); var runtime = sources.FirstOrDefault(x => x.Name == "System.Runtime"); runtime?.Dispose();

This worked but I don't know if it will have any unintended consequences.

mdh1418 commented 2 months ago

I believe this is fixed by https://github.com/dotnet/runtime/pull/105548. I was able to repro the issue on .NET Sdk 8.0.300, and using a local build of runtime with the PR I could not trigger the deadlock.