dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

GC deadlock/livelock? #66759

Closed filipnavara closed 2 years ago

filipnavara commented 2 years ago

Description

Our process routinely seems to freeze. There are some dispatcher threads which may be interfering with the thread suspension logic but in this particular case the only one seems to be running CoreCLR code.

Reproduction Steps

No idea so far. It happens randomly but consistently within the first few minutes of the process run.

Expected behavior

No process freeze.

Actual behavior

Process becomes locked and non-responsive.

Regression?

No response

Known Workarounds

No response

Configuration

.NET 6.0.3, Xamarin.Mac Preview 14 (happened on Preview 13 too)

Other information

Thread sample: https://gist.github.com/filipnavara/afdd426069dfa00f18efa5b8508dd34c

filipnavara commented 2 years ago

I'll be away from keyboard for the next couple of hours but I will fix the stress logging when I get back. Not quite sure if it will help though. Anything else that is worth logging/pursuing?

janvorli commented 2 years ago

I think I have enough data for now. No matter what the other threads at unsafe places are, this one is enough to cause the deadlock.

janvorli commented 2 years ago

Since the breakpoint was set in the prolog, it cannot be a GC safe place. The clru with the -gcinfo option agrees with me:

(lldb) clru -gcinfo 0x00000002888f1358
Normal JIT generated code
System.Threading.Thread.StartCallback()
ilAddr is 0000000105854FF4 pImport is 000000010FC18110
Begin 00000002888F1358, size e4
Prolog size: 0
Security object: <none>
GS cookie: <none>
PSPSym: <none>
Generics inst context: <none>
PSP slot: <none>
GenericInst slot: <none>
Varargs: 0
Frame pointer: Fp
Has tailcalls: 1
Size of parameter area: 0
Return Kind: Scalar
Code size: e4
>>> 00000002888f1358 fd7bbea9             stp     x29, x30, [sp, #-0x20]!
00000002888f135c f35301a9             stp     x19, x20, [sp, #0x10]
00000002888f1360 fd030091             mov     x29, sp
0000000c interruptible
0000000c +X0
00000002888f1364 131040f9             ldr     x19, [x0, #0x20]
00000010 +X19
00000002888f1368 1f1000f9             str     xzr, [x0, #0x20]
00000002888f136c 601640f9             ldr     x0, [x19, #0x28]
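The listing above can be checked with a little arithmetic. This is only a sketch in Python, using the addresses and the `0000000c interruptible` offset copied from the output; the helper name `is_before_interruptible` is made up for illustration, and it assumes that the code before the first offset marked interruptible is not interruptible.

```python
# Values copied from the clru -gcinfo output above.
METHOD_BEGIN = 0x00000002888F1358        # "Begin 00000002888F1358"
BREAKPOINT_ADDR = 0x00000002888F1358     # address passed to clru (the ">>>" line)
FIRST_INTERRUPTIBLE_OFFSET = 0x0C        # "0000000c interruptible"


def is_before_interruptible(addr, begin, first_interruptible):
    """Hypothetical helper: True if addr precedes the first interruptible offset."""
    return (addr - begin) < first_interruptible


breakpoint_offset = BREAKPOINT_ADDR - METHOD_BEGIN
print(hex(breakpoint_offset),
      is_before_interruptible(BREAKPOINT_ADDR, METHOD_BEGIN, FIRST_INTERRUPTIBLE_OFFSET))
```

The breakpoint sits at offset 0, three instructions before the first offset the GC info marks as interruptible.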
janvorli commented 2 years ago

@dotnet/dotnet-diag folks, is it expected that the debugger sets a breakpoint at the first instruction of the System.Threading.Thread.StartCallback() method?

jkotas commented 2 years ago

StartCallback is called via the CALL_MANAGED_METHOD macro (aka "reverse FCall"). All reverse FCalls give the debugger a chance to place a breakpoint at the managed target here: https://github.com/dotnet/runtime/blob/8e571cd488abf8578c8b3a8463adf99316cb129d/src/coreclr/vm/callhelpers.cpp#L182 . I do not think that the exact reason why the debugger placed the breakpoint in this case is interesting. It depends on a number of factors, like whether just-my-code is enabled or not. The runtime is expected to allow it and handle it gracefully.

filipnavara commented 2 years ago

Just-my-code is disabled btw. It was one of the options I changed before it started happening.

jkotas commented 2 years ago

> Just-my-code is disabled btw. It was one of the options I changed before it started happening.

The debugger does not place these tracing breakpoints with just-my-code enabled (or at least does not place them frequently). These tracing breakpoints are placed frequently with just-my-code disabled. That explains why disabling just-my-code would make this deadlock show up.

janvorli commented 2 years ago

@jkotas I am trying to reason about what's not expected in this case:

jkotas commented 2 years ago

I think it is the first one (Is it unexpected that the DebuggerController::DispatchPatchOrSingleStep calls Thread::RareDisablePreemptiveGC).

The first method instruction is not a place where the GC can run, so it sounds right to me that the thread is marked as not being at a safe place.

janvorli commented 2 years ago

@VSadov you modified the code in ThreadSuspend::SuspendEE a year ago. I can see that you put a comment there saying: https://github.com/dotnet/runtime/blob/798d52beb74db0faff27306d7ed97186333037ca/src/coreclr/vm/threadsuspend.cpp#L5734-L5736

It seems that's what we are hitting here, but not in a rare case. The debugger puts a breakpoint at the first instruction of System.Threading.Thread.StartCallback(), probably to get notified on a new thread. That location is not a GC safe location, so if sending the notification to the debugger on the breakpoint in DebuggerController::DispatchPatchOrSingleStep races with the ThreadSuspend::SuspendEE, we get this deadlock. @filipnavara is now hitting it consistently when debugging his application that uses a lot of threads. Was the original code before your cleanup resilient to this issue?

janvorli commented 2 years ago

@Vsadov ah, the comment was there before, it just got moved, I am sorry for the confusion.

VSadov commented 2 years ago

I do not recall adding this comment. It looks like it may have been there before.

VSadov commented 2 years ago

> frozen // a thread at a gc-unsafe place,

Note that this is not about setting a breakpoint. I think it refers to freeze/thaw functionality, so it should not be common.

janvorli commented 2 years ago

The strange thing is that nothing seems to have changed since .NET Core 1.0 in DebuggerController::DispatchPatchOrSingleStep and related code. The SENDIPCEVENT_END that triggers the Thread::RareDisablePreemptiveGC is the same, so I have a hard time understanding why we haven't heard of this issue before.

@filipnavara did it start happening for you recently and was it working ok say with an older .NET 6 version?

Or maybe the debugger started to hook System.Threading.Thread.StartCallback() recently? And maybe only on macOS? I am just wildly guessing...

filipnavara commented 2 years ago

> did it start happening for you recently and was it working ok say with an older .NET 6 version?

I first noticed it during work to upgrade to the Xamarin.Mac Preview 14 workload. I don't think the workload is at fault though, since the runtime version didn't change. I suspect that disabling "just my code" could have been a trigger. It is something that I had to change to debug an unrelated issue and I just left it on.

The particular data that caused this heavy GC and high thread count is something I got from our test team about a week ago. It could simply be that I never had data that stresses the GC/threading/debugger so heavily.

VSadov commented 2 years ago

In my understanding the logic is that the debugger has priority here. We revert the suspension, since we can't walk stacks anyway, and let the debugger do something. The debugger either proceeds with full debugger suspension or releases the threads.

Only in rare cases does the debugger not make progress, and then everything else ends up waiting for that. Maybe something changed in that logic.

VSadov commented 2 years ago

When we say "freeze" here, does the app lock up completely or are there pauses?

janvorli commented 2 years ago

@VSadov Complete lock up. We are actually not letting the debugger do anything, as the thread that wants to send the breakpoint event to the debugger is waiting on the suspension event and it has incremented the count of threads at unsafe places. So the loop in ThreadSuspend::SuspendEE can never succeed due to that count, and the thread that was attempting to send the event to the debugger can never continue because it is waiting in Thread::WaitSuspendEvents called by Thread::RareEnablePreemptiveGC().
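The circular wait described above can be pictured with a toy model. This is a simplified sketch in Python, not the CoreCLR code: the function `simulate` and its two variables are hypothetical stand-ins for the unsafe-place count and the suspension event, and it assumes the event is signalled only once the suspension loop succeeds.

```python
def simulate(unsafe_count=1, rounds=5):
    """Toy model of the two waiters; returns which outcome is reached."""
    suspend_event_set = False  # assumed to be signalled only after suspension succeeds
    for _ in range(rounds):
        # Suspending thread: its loop succeeds only when no thread is
        # counted as being at an unsafe place.
        if unsafe_count == 0:
            suspend_event_set = True
            return "suspension completed"
        # Breakpoint thread: it stops being counted as unsafe only after
        # the suspension event it waits on has been signalled.
        if suspend_event_set:
            unsafe_count -= 1
    return "no progress"


print(simulate())                # the thread is already counted as unsafe
print(simulate(unsafe_count=0))  # no thread at an unsafe place
```

In the first call each side waits for a state change that only the other side could make, so no number of rounds helps. The second call shows the loop finishing when the count is zero.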

janvorli commented 2 years ago

But looking at Thread::WaitSuspendEventsHelper made me remember one thing that I mentioned somewhere earlier in this issue. All threads have the TS_DebugSuspendPending state, but we are attempting to do a GC suspension and there is no debugger runtime suspension. I wonder if that's the cause of the issue. But I don't know when that state is supposed to be cleared.

janvorli commented 2 years ago

See the clrthreads output, the lowest nibble of the state is 8, which corresponds to the TS_DebugSuspendPending:

(lldb) clrthreads
ThreadCount:      117
UnstartedThread:  0
BackgroundThread: 68
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                            Lock
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   1    1   11224f 000000012502DC00    20028 Preemptive  00000001432E5790:00000001432E57A0 0000000125022400 -00001 Ukn
   8    2   112269 000000012502E400    a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Finalizer)
   9    3   11226a 0000000125032A00    21228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn System.Security.SecurityException 00000001424c9dc8
  10    4   1122f3 000000012504FE00  10a0228 Preemptive  0000000143147F80:0000000143149C48 0000000125022400 -00001 Ukn (Threadpool Worker)
  11    5   1122f6 000000010780A200  2021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  14    6   11231f 0000000126876000    b1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  15    7   112326 0000000126890E00  10a1228 Preemptive  0000000143243098:00000001432444A8 0000000125022400 -00001 Ukn (Threadpool Worker)
  16    8   112327 0000000127817400  10a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
  17    9   112329 0000000127828E00  10a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
  18   10   11232b 0000000127012800    a1028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  19   11   112348 0000000127075400  10a1228 Preemptive  000000014314A050:000000014314BC48 0000000125022400 -00001 Ukn (Threadpool Worker)
  20   12   112364 0000000124D65000  1021228 Preemptive  000000014314CE40:000000014314DC48 0000000125022400 -00001 Ukn (Threadpool Worker)
  21   13   112365 0000000126038800  1121228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
  22   14   112366 000000012659CC00  10a1228 Preemptive  0000000143331C68:0000000143333160 0000000125022400 -00001 Ukn (Threadpool Worker)
  23   15   112367 0000000126521800  1121228 Preemptive  00000001432BA408:00000001432BBB90 0000000125022400 -00001 Ukn (Threadpool Worker)
  24   16   112368 00000001265C7C00  1021228 Preemptive  00000001432E37A0:00000001432E37A0 0000000125022400 -00001 Ukn (Threadpool Worker)
  25   17   112370 000000012662A400  2021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  26   18   11237b 0000000126F42400    21228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  27   19   112390 000000012577D600    a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
XXXX   20   112315 00000001049CA400 80030228 Preemptive  000000014329A878:000000014329C540 0000000125022400 -00001 Ukn
XXXX   21   112252 00000001049CC400 80030228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  28   22   112471 000000033C221E00    20228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
XXXX   23   11246c 0000000124D57200 80030228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  29   24   112476 000000033C1C0000  2021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  30   25   112477 000000033C278200  2021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  31   26   11247a 0000000127187E00  10a1228 Preemptive  0000000143154EE8:0000000143155C48 0000000125022400 -00001 Ukn (Threadpool Worker) System.Threading.Tasks.TaskCanceledException 0000000143154cc0
  32   27   11247b 000000033D9D9E00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  33   28   11247c 0000000127970400  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  34   29   11247d 00000001271A5C00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  35   30   11247e 0000000107980A00  2021028 Preemptive  0000000143157060:0000000143157C48 0000000125022400 -00001 Ukn
  38   31   112481 0000000107983C00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  36   32   11247f 0000000127970C00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  37   33   112480 00000001271A6400    a1028 Preemptive  0000000143308E58:0000000143309820 0000000125022400 -00001 Ukn
  39   34   112482 00000001271A6C00    a1028 Preemptive  00000001432BC058:00000001432BDB90 0000000125022400 -00001 Ukn
  40   35   112483 000000010798BC00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  41   36   112486 000000033D9D3A00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  42   37   112487 000000033C0BAA00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  43   38   112488 000000012796DE00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  46   39   11248b 000000033C0BBA00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  45   40   11248a 000000033C24CA00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  44   41   112489 000000033C24E400  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  47   42   11248c 000000033C24EC00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  48   43   11248d 000000033D9DE400  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  49   44   11248e 000000033D95C800  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  50   45   11248f 0000000127969E00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  51   46   112490 00000001049E5800  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  52   48   112491 00000001049E6800  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  53   47   112492 000000033D200000    a1028 Preemptive  0000000143158048:0000000143159C48 0000000125022400 -00001 Ukn
  54   49   112493 0000000127958A00    21028 Preemptive  00000001432B5AC0:00000001432B5B90 0000000125022400 -00001 Ukn
  55   50   112494 0000000107981200    21028 Preemptive  0000000143339158:0000000143339160 0000000125022400 -00001 Ukn
  56   51   112495 00000001049CF200    21028 Preemptive  0000000143341188:0000000143341190 0000000125022400 -00001 Ukn
  57   52   112497 0000000126E72400  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  58   53   11249a 0000000127180800  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  59   54   11249b 000000033D9DEC00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  60   55   11249c 000000012717E400  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  61   56   11249d 0000000107992000  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  62   57   11249e 000000033D971000  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  63   58   11249f 000000012717FE00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  64   59   1124a0 0000000126069C00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  65   60   1124a1 0000000107993000  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  66   61   1124a4 000000033C277A00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  67   62   1124a5 000000033C09CC00    21028 Preemptive  0000000143337110:0000000143337160 0000000125022400 -00001 Ukn
  68   63   1124af 0000000124F8EC00  1121228 Preemptive  0000000143334C90:0000000143335160 0000000125022400 -00001 Ukn (Threadpool Worker)
  69   64   1124b8 00000001271A9E00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  70   65   1124b9 000000033C0BB200  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  71   66   1124ba 000000033C0A2000  2021028 Preemptive  00000001431D9DB0:00000001431DBD58 0000000125022400 -00001 Ukn
  72   67   1124bb 0000000127962200  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  73   68   1124bc 000000033C0BA200  2021028 Preemptive  0000000143177D18:0000000143179C48 0000000125022400 -00001 Ukn
  74   69   1124bd 0000000125793A00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  77   70   1124c0 00000001049E9A00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  75   71   1124be 00000001271ADE00  2021028 Preemptive  0000000143201AD8:0000000143203A80 0000000125022400 -00001 Ukn
  76   72   1124bf 00000001049EE200  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  78   73   1124c1 000000012606E000  2021028 Preemptive  0000000143179D90:000000014317BC48 0000000125022400 -00001 Ukn
  79   74   1124c2 00000001049ECA00  2021028 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  80   75   1124c3 00000001265F9000  1021228 Preemptive  000000014333D178:000000014333D190 0000000125022400 -00001 Ukn (Threadpool Worker)
  81   76   1124c8 00000001271A9600    21228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  82   77   1124c9 00000001271AF000  2021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  83   78   1124e9 000000033C250A00  1021228 Preemptive  00000001432A5A50:00000001432A6540 0000000125022400 -00001 Ukn (Threadpool Worker)
XXXX   79   1122ce 00000001271B0800 80030228 Preemptive  00000001431DD828:00000001431DDD58 0000000125022400 -00001 Ukn
XXXX   80   112318 000000033C2A9000 80030228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
XXXX   81   11246a 00000001271B7800 80030228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  85   82   112539 000000033E9DD600    21228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  86   83   11253a 000000033E9E2E00  2021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
  88   84   11253f 00000001271C4000  10a1228 Preemptive  00000001432A93A0:00000001432AA540 0000000125022400 -00001 Ukn (Threadpool Worker) System.Threading.Tasks.TaskCanceledException 00000001432a92c8
  89   85   112540 00000001049F7A00  10a1228 Preemptive  000000014316BC48:000000014316BC48 0000000125022400 -00001 Ukn (Threadpool Worker)
  90   86   112541 00000001049F9000  10a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
  91   87   112542 000000033DAB3A00  10a1228 Preemptive  000000014324A658:000000014324C4A8 0000000125022400 -00001 Ukn (Threadpool Worker)
  92   88   112543 000000033DA5C800  1021228 Preemptive  0000000143346D38:0000000143346D50 0000000125022400 -00001 Ukn (GC) (Threadpool Worker)
  93   89   112547 00000001271C5200  10a1228 Preemptive  00000001432B1E28:00000001432B3B90 0000000125022400 -00001 Ukn (Threadpool Worker)
  94   90   112549 000000033DA60A00  10a1228 Preemptive  0000000143189FC8:000000014318BD28 0000000125022400 -00001 Ukn (Threadpool Worker) System.Threading.Tasks.TaskCanceledException 0000000143189d40 (nested exceptions)
  96   91   11254b 000000033C1E3800  1021228 Preemptive  0000000143331148:0000000143331160 0000000125022400 -00001 Ukn (Threadpool Worker)
  97   92   11254c 000000033DAAEC00  1021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
  95   94   11254a 000000033DA9C600  1021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
  99   93   11254e 000000033DA9BE00  10a1228 Preemptive  0000000143252988:0000000143252C68 0000000125022400 -00001 Ukn (Threadpool Worker)
  98   95   11254d 00000001271C5A00  10a1228 Preemptive  000000014325FEB0:0000000143260400 0000000125022400 -00001 Ukn (Threadpool Worker)
 100   97   11254f 00000001271CEA00  10a1228 Preemptive  0000000143190320:0000000143191D28 0000000125022400 -00001 Ukn (Threadpool Worker)
 102   98   112551 000000033EACE200  10a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
 101   96   112550 000000033EA0BC00  10a1228 Preemptive  0000000143196658:0000000143197D58 0000000125022400 -00001 Ukn (Threadpool Worker)
 105   99   112555 000000010799E000  10a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
 103  100   112553 000000033D2A8E00  10a1228 Preemptive  000000014317E520:000000014317FC48 0000000125022400 -00001 Ukn (Threadpool Worker)
 104  101   112554 000000033DAC8200  10a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
 106  102   112556 000000033E94F200  1021228 Preemptive  0000000143305020:0000000143305820 0000000125022400 -00001 Ukn (Threadpool Worker)
XXXX  103   112458 000000033C1E0400 80030228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
 108  104   112649 000000012799E400  10a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
 109  105   11264a 00000001271DA800  10a1228 Preemptive  0000000143208078:0000000143209E10 0000000125022400 -00001 Ukn (Threadpool Worker)
 111  106   112651 000000012485B400  1021228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn (Threadpool Worker)
 110  107   11264f 00000001079C1E00  10a1228 Preemptive  00000001432FABD0:00000001432FB7A0 0000000125022400 -00001 Ukn (Threadpool Worker)
 112  108   112666 00000001079DAE00    21228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
 113  109   112667 00000001271EB800    21228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
 115  110   112669 000000033C282200  2021228 Preemptive  00000001431FFE78:00000001432009C8 0000000125022400 -00001 Ukn
 114  111   112668 00000001279B7200  2021228 Preemptive  000000014326A640:000000014326C400 0000000125022400 -00001 Ukn
 116  112   11266a 000000033EA91C00    21228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
 117  113   11266b 00000001079D5800  2021228 Preemptive  00000001431A51B8:00000001431A5D58 0000000125022400 -00001 Ukn
XXXX  114   11260d 0000000126078400 80030228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
XXXX  115   11260f 000000012486BE00 80030228 Preemptive  00000001431B14A8:00000001431B1D58 0000000125022400 -00001 Ukn
 118  116   112677 000000033E9DBE00    a1228 Preemptive  0000000143315B18:0000000143317850 0000000125022400 -00001 Ukn
 119  117   112683 000000033DB1AC00    a1228 Preemptive  0000000000000000:0000000000000000 0000000125022400 -00001 Ukn
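The nibble check mentioned before the listing can be reproduced mechanically. This is a small sketch in Python, assuming (as stated above) that the value 8 in the lowest nibble corresponds to TS_DebugSuspendPending; the constant and helper names are chosen here for illustration, and the sample values are copied from the State column.

```python
TS_DEBUG_SUSPEND_PENDING = 0x8  # bit value as described in the comment above

# A few State values copied from the clrthreads listing above.
sample_states = ["20028", "a1228", "10a0228", "2021028", "80030228"]


def has_debug_suspend_pending(state_hex):
    """Hypothetical helper: True if the debug-suspend-pending bit is set."""
    return (int(state_hex, 16) & TS_DEBUG_SUSPEND_PENDING) != 0


for state in sample_states:
    print(state, has_debug_suspend_pending(state))
```

Every sampled value ends in the hex digit 8, so the bit is set in each of them.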
janvorli commented 2 years ago

I've spent some time investigating it with great help from @hoyosjs. It turns out it is an issue that @kouvel is already looking into, related to tiered compilation.

filipnavara commented 2 years ago

I really appreciate all the effort and prompt responses. Thanks!

mangod9 commented 2 years ago

Closing since @kouvel's PR has merged.

filipnavara commented 2 years ago

Do we want to backport to .NET 6 (where I originally hit the issue and keep hitting it)?

noahfalk commented 2 years ago

@kouvel's fix was quite large and we felt it was too risky to apply it as a servicing patch for 6.0. Instead we made a much smaller change in 6.0 that we hope will avoid the majority of the problematic cases, but it isn't a total fix: https://github.com/dotnet/runtime/pull/69121

The original issue that discussed debugger deadlock is here if it helps fill out the story.

@davmason - do you know which servicing release your mitigation fix went out in (or will go out in)?

filipnavara commented 2 years ago

Thanks, should be 6.0.6. I spent most of the time with 6.0.6/6.0.7 on Windows so I didn't get to try the scenario again and I missed the targeted fix.