Open colejohnson66 opened 1 month ago
Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.
I'm not sure if I'm doing this right, but dumping the memory at the IP of the "dynamic helper frame" shows repeated 32 byte blocks of memory:
> db 7ff913173b94
00007ff913173b94: c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 ............L...
00007ff913173ba4: e1 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 ......%.....u...
00007ff913173bb4: c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 ............L...
00007ff913173bc4: e2 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 ......%.....u...
00007ff913173bd4: c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 ............L...
00007ff913173be4: e3 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 ......%.....u...
00007ff913173bf4: c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 ............L...
00007ff913173c04: e4 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 ......%.....u...
C3 is RET, yet it never returns? It looks like the IP is inside ntdll.dll?
> lm
... trimmed ...
00007FF9130D0000 00217000 C:\Windows\System32\ntdll.dll
Total image size: 72019968
> sos u 7ff913173b94
Unmanaged code
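For what it's worth, the repeating 32-byte blocks in the dump above look like the system-call stubs that ntdll.dll contains on x64 Windows: mov r10, rcx (4c 8b d1), mov eax, imm32 (b8 ...), test byte ptr [7FFE0308h], 1, jne, syscall (0f 05), ret (c3), int 2Eh (cd 2e), ret (c3), then padding. If that reading is right, the C3 at the IP is the RET that follows a syscall instruction in the preceding stub. Those bytes sit just before the start of the dump, so this is an inference from the repeating pattern. It would mean the thread has not returned because it is still blocked inside the kernel call, which is what a wait on a lock would look like. Below is a minimal sketch that pulls the service numbers out of the dumped bytes; SyscallStubDecoder and its members are hypothetical names made up for illustration.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class SyscallStubDecoder
{
    // Scans for "mov r10, rcx" (4c 8b d1) followed by the opcode of
    // "mov eax, imm32" (b8) and returns each 32-bit immediate that follows.
    public static List<int> FindSyscallNumbers(byte[] code)
    {
        var numbers = new List<int>();
        for (int i = 0; i + 8 <= code.Length; i++)
        {
            if (code[i] == 0x4c && code[i + 1] == 0x8b &&
                code[i + 2] == 0xd1 && code[i + 3] == 0xb8)
            {
                // x86-64 immediates are little-endian regardless of the host.
                numbers.Add(BinaryPrimitives.ReadInt32LittleEndian(code.AsSpan(i + 4, 4)));
            }
        }
        return numbers;
    }

    // The 128 bytes shown in the "db 7ff913173b94" output above.
    public static byte[] DumpBytes()
    {
        const string hex =
            "c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 " +
            "e1 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 " +
            "c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 " +
            "e2 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 " +
            "c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 " +
            "e3 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 " +
            "c3 cd 2e c3 0f 1f 84 00 00 00 00 00 4c 8b d1 b8 " +
            "e4 01 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05";
        return Convert.FromHexString(hex.Replace(" ", ""));
    }
}
```

Calling SyscallStubDecoder.FindSyscallNumbers(SyscallStubDecoder.DumpBytes()) yields 0x1E1, 0x1E2, 0x1E3 and 0x1E4, one per stub. Which service each number maps to depends on the Windows build, so the sketch does not try to name them.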
Hey @colejohnson66, are you able to share a repro for this issue?
We unfortunately do not have anything we could share, and I haven’t narrowed anything down. I still have the dump if you’d like anything from it, and I can do some more debugging if necessary.
I wonder if #104457 is related? It being intermittent would be an indication of some race condition.
OS thread 0x6600 has locked EventListener.EventListenersLock, and I think it is waiting for the static constructor of FrameworkEventSource.
OS thread 0xbd4 has entered the static constructor of FrameworkEventSource, and is waiting to lock EventListener.EventListenersLock.
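The two observations above describe a lock-order inversion between a monitor and a static constructor. Here is a minimal sketch of that shape, not the runtime's actual code: LazySource stands in for FrameworkEventSource, ListenersLock for EventListener.EventListenersLock, and every name is hypothetical. The static constructor uses Monitor.TryEnter with a timeout only so the demo terminates; with a plain lock it would hang in the same way as the report.

```csharp
using System;
using System.Threading;

// Stand-in for FrameworkEventSource: its static constructor needs the lock.
public static class LazySource
{
    public static readonly int Value;

    static LazySource()
    {
        LockOrderDemo.CctorEntered.Set();
        LockOrderDemo.LockHeld.Wait();
        // A plain lock would block forever here; the timeout keeps the demo finite.
        bool taken = Monitor.TryEnter(LockOrderDemo.ListenersLock, TimeSpan.FromMilliseconds(200));
        LockOrderDemo.CctorGotLock = taken;
        if (taken) Monitor.Exit(LockOrderDemo.ListenersLock);
        Value = 42;
    }
}

public static class LockOrderDemo
{
    public static readonly object ListenersLock = new object();
    public static readonly ManualResetEventSlim LockHeld = new ManualResetEventSlim();
    public static readonly ManualResetEventSlim CctorEntered = new ManualResetEventSlim();
    public static volatile bool CctorGotLock;
    public static int HolderSaw;
    public static int InitializerSaw;

    // Returns true when the static constructor could not take the lock,
    // i.e. when the real code would have deadlocked.
    public static bool Run()
    {
        // Plays the role of OS thread 0x6600: holds the lock, then needs the type.
        var holder = new Thread(() =>
        {
            lock (ListenersLock)
            {
                LockHeld.Set();
                CctorEntered.Wait();
                HolderSaw = LazySource.Value; // blocks until the static constructor returns
            }
        });
        // Plays the role of OS thread 0xbd4: runs the static constructor, then needs the lock.
        var initializer = new Thread(() => { InitializerSaw = LazySource.Value; });

        initializer.Start();
        holder.Start();
        initializer.Join();
        holder.Join();
        return !CctorGotLock;
    }
}
```

The runtime can detect cycles made purely of static constructors waiting on each other, but in this sketch one side of the cycle is an ordinary monitor, so nothing breaks it without the timeout.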
I wonder what caused EventPipeEventProvider.Callback to be called. Had you connected some diagnostic tools to the process before the deadlock occurred?
EventSource.SendCommand has code to enqueue commands if the EventSource hasn't finished initializing yet. Could the deadlock be fixed by making RuntimeEventSource also enqueue commands until FrameworkEventSource finishes initializing?
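To make the enqueue idea concrete, here is a minimal sketch of deferring commands until initialization has finished. It is not how EventSource or RuntimeEventSource is implemented; DeferredCommandSource, SendCommand's string payload, and CompleteInitialization are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class DeferredCommandSource
{
    private readonly object _gate = new object();
    private readonly Queue<string> _pending = new Queue<string>();
    private readonly Action<string> _handler;
    private bool _ready;

    public DeferredCommandSource(Action<string> handler) => _handler = handler;

    // Before initialization completes, commands are only recorded.
    public void SendCommand(string command)
    {
        lock (_gate)
        {
            if (!_ready)
            {
                _pending.Enqueue(command);
                return;
            }
        }
        _handler(command); // invoked outside the lock
    }

    // Called once, after everything the handler depends on has been initialized.
    public void CompleteInitialization()
    {
        string[] replay;
        lock (_gate)
        {
            replay = _pending.ToArray(); // oldest command first
            _pending.Clear();
            _ready = true;
        }
        // Replayed outside the lock; a command sent concurrently by another
        // thread may be handled before the replay finishes.
        foreach (string command in replay)
        {
            _handler(command);
        }
    }
}
```

The point of the design is that the handler never runs while the gate is held, so a handler that triggers further initialization cannot block a thread that is only trying to send a command.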
Alternatively, could the ThreadPool counters implemented in RuntimeEventSource.OnEventCommand return 0 if ThreadPool hasn't been initialized yet, rather than waiting for ThreadPool and FrameworkEventSource to be initialized?
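A minimal sketch of the "return 0 until initialized" idea, assuming the initialized flag can live in a separate type so that reading it does not itself run the pool's static constructor. FakePool, FakePoolState and Counters are hypothetical stand-ins, not runtime types.

```csharp
using System;
using System.Threading;

// Holds only the flag, so reading it never runs FakePool's static constructor.
public static class FakePoolState
{
    public static volatile bool Initialized;
}

// Stand-in for ThreadPool: the static constructor is the step that must not
// be triggered from the counter callback.
public static class FakePool
{
    private static long s_completed;

    static FakePool()
    {
        // ... expensive or lock-taking initialization would go here ...
        FakePoolState.Initialized = true; // published as the last step
    }

    public static void NotifyCompleted() => Interlocked.Increment(ref s_completed);

    public static long CompletedCount => Interlocked.Read(ref s_completed);
}

public static class Counters
{
    // FakePool is only touched on the branch taken after initialization,
    // so polling the counter early reports 0 instead of forcing (or waiting
    // on) the static constructor.
    public static long CompletedItemsOrZero() =>
        FakePoolState.Initialized ? FakePool.CompletedCount : 0;
}
```

Because FakePool declares an explicit static constructor, it is initialized on first access to one of its members rather than earlier, which is what lets the guard in CompletedItemsOrZero work.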
I wonder if https://github.com/dotnet/runtime/issues/104457 is related?
No, that's for TraceSource, and this deadlock is with EventSource.
This only started recently, so maybe Rider changed something in the backend in the 2024.2 EAPs? I'm not ruling that possibility out.
My reproduction steps are: click the "debug" button, observe the debugger start but not the GUI, click "pause" to investigate, then see the deadlock. Every time, it's the same stack trace. After a few times of this happening, I used dotnet-dump instead of pausing, and I got the same stack trace - the one above.
As for program execution, we have an async Task Main() (giving an async state machine), and the very first thing it does is start our SQLite-based logger. This calls into Microsoft.Data.SQLite to open the database. That stack trace on OS thread 0xBD4 is the first thing our program executes:
internal static class Program
{
    [STAThread]
    public static async Task Main(string[] args)
    {
        Logger.Instance.Start("Logs.db");
        // ... rest of Main omitted in the original post
    }
}
I'll see if the callback parameters have any information next time this happens.
I can't access the stack variables in the debugger:
An IL variable is not available at the current native IP. (0x80131304). The error code is CORDBG_E_IL_VAR_NOT_AVAILABLE, or 0x80131304.
What I can see is that the event counter being incremented is threadpool-completed-items-count. This then runs the static constructor of ThreadPool, which constructs the ThreadPoolWorkQueue.
Also, if the Timer used by SqliteConnectionFactory manages to win the race to initialize the thread pool, the program starts fine. It has this stack trace:
new ThreadPoolWorkQueue()
static ThreadPool()
TimerQueue.SetTimer()
TimerQueue.EnsureTimerFiresBy()
TimerQueueTimer.Change()
new TimerQueueTimer()
Timer.TimerSetup()
new Timer()
new SqliteConnectionFactory()
static SqliteConnectionFactory()
SqliteConnection.set_ConnectionString()
new SqliteConnection()
Logger.OpenDatabaseCore()
Logger.OpenDatabaseReadWrite()
Logger.Start()
async Program.Main()
AsyncMethodBuilderCore.Start<{removed}.Program.<Main>d__0>()
Program.Main()
Program.<Main>()
This may be a duplicate of https://github.com/dotnet/runtime/issues/93175.
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti See info in area-owners.md if you want to be subscribed.
Description
Periodically, I've experienced a deadlock on program start through ThreadPoolWorkQueue..ctor. According to the debugger, that constructor is stuck at RefreshLoggingEnabled, and it never continues. dotnet-dump provided this information, implying it's stuck in a "dynamic helper frame".
Reproduction Steps
Unsure, but when it happens, restarting the program/debug session will usually fix it.
Expected behavior
Program starts the thread pool queue just fine and no deadlock occurs.
Actual behavior
Deadlock due to the lock never being released.
Regression?
No response
Known Workarounds
No response
Configuration
Microsoft.Data.SQLite
Other information
No response