Closed: jeff-simeon closed this issue 2 years ago.
That's... an interesting observation. If you have the locals window open maybe that could cause a funceval. And no, somehow I still don't see them :( The last comment I see is from 9/29.
Gotcha - @hoyosjs, since it seems like the developercommunity site doesn't function, is there another way you'd suggest I transmit them? I can put them on OneDrive and email a link if you'd like
DevCommunity is preferred because of GDPR compliance and such. If you feel that there's no sensitive data and the back-and-forth is getting painful, you can get my email from my profile. I'll update this thread with the results for the visibility of the community, unless there's private information in them.
@hoyosjs - It does look like my comment and upload posted
Ah, I see. You uploaded them to the old issue. You opened a first thread, then a second thread. I closed the older thread and redirected it to the newer one as it had more files.
For future reference, the one that is still open that I was looking at was this.
I've downloaded them and see they have logs. I'll check them in the next couple of days and get back to you.
thanks @hoyosjs
@jeff-simeon We continued looking at this and have a new theory of what's causing the deadlock. I also saw that the suggested workaround, even though it was properly applied, didn't help in the way I thought it would; I'm sorry about that.
All good @hoyosjs
if you have any other workarounds you can suggest we would greatly appreciate it
@jeff-simeon
Also, I might be crazy, but I think I just noticed a pattern. It seems like the problem occurs when I context switch and bring a new window to the foreground in front of Visual Studio while the program or tests are running.
I must have run our acceptance tests in the debugger 10+ times today with no issue and then I brought up a browser while running and the issue suddenly reproduced. I ran again, brought up a browser, and the issue reproduced again. Then I left VS focused while running and the issue did not occur. I repeated this pattern 4 or 5 more times with the same result.
You're not crazy, I've noticed the same pattern. Sent some new dumps and I think there's some progress being made on this issue, let's hope for a resolution soon.
hi @hoyosjs - any update on a workaround/resolution?
(Sorry - this seems to have gotten stuck in my outbox limbo. That's what I get for trying to reply to GitHub from my email.)
Hey @jeff-simeon. I think we might have an idea of what is causing this issue. While it might take a while for me to make sure I am on the right trail, there's something that might help as a workaround, and it will definitely be easier for you to confirm whether it helps than anything I can do on my side.
I was talking to @davmason and he realized that my suggestion to disable tiering was not complete. There's a feature in the profiler that uses that same mechanism, and I believe it is a player in the issue you're seeing. So in addition to `DOTNET_TieredCompilation=0` / `COMPlus_TieredCompilation=0`, you should set `COMPlus_ProfApi_RejitOnAttach=0` and see if that helps.
@hoyosjs I had already tried this (setting both `COMPlus_TieredCompilation` and `COMPlus_ProfApi_RejitOnAttach` to `0` in the system environment variables) after seeing this comment, but it did not solve the issue. :(
Ok! I was fumbling around and I think I fixed the issue, but I cannot confirm what exactly fixed it. Maybe someone else can try what I did and confirm whether it works. Note that I am using Visual Studio 2022 17.0.1.

After the fix: I can stop debugging without VS freezing up.

What I did:
Here is `dotnet --info` after the cleanup. Sorry, I don't have a "before":

```
.NET SDK (reflecting any global.json):
 Version:   6.0.100
 Commit:    9e8b04bbff

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.22000
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\6.0.100\

Host (useful for support):
  Version: 6.0.0
  Commit:  4822e3c3aa

.NET SDKs installed:
  3.1.415 [C:\Program Files\dotnet\sdk]
  5.0.403 [C:\Program Files\dotnet\sdk]
  6.0.100 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 3.1.21 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 5.0.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 3.1.21 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 5.0.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.1.21 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 5.0.12 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

To install additional .NET runtimes or SDKs:
  https://aka.ms/dotnet-download
```
In the Diagnostic Tools tab, click Select Tools -> Settings -> Enable resource usage limits, then Apply -> Ok.
Tagging the other open issue here for reference
@hoyosjs - confirmed this is not resolving the issue for us....what is the status here please?
Hi @jeff-simeon, I am a coworker of @hoyosjs. He has been out for the Christmas holidays, but now that I am back from my own vacation I'm going to fill in for him and help get this moving. I assisted with some of the earlier investigation, so I think I am already mostly up to speed on this. My understanding so far is that:
Dump 1 (thread 17):

```
02 0000006f`0467f570 00007ffc`dffba656 coreclr!ThreadSuspend::SuspendEE+0x228 [D:\workspace\_work\1\s\src\coreclr\src\vm\threadsuspend.cpp @ 6097]
03 0000006f`0467f710 00007ffc`dfe5f3d9 coreclr!CallCountingManager::StopAndDeleteAllCallCountingStubs+0xa9182 [D:\workspace\_work\1\s\src\coreclr\src\vm\callcounting.cpp @ 960]
```

Dump 2 (thread 30):

```
07 00000006`6b49f600 00007ffc`cd57a656 coreclr!ThreadSuspend::SuspendEE+0x449 [D:\workspace\_work\1\s\src\coreclr\src\vm\threadsuspend.cpp @ 6236]
08 00000006`6b49f7a0 00007ffc`cd41f3d9 coreclr!CallCountingManager::StopAndDeleteAllCallCountingStubs+0xa9182 [D:\workspace\_work\1\s\src\coreclr\src\vm\callcounting.cpp @ 960]
```

Dump 3 (thread 17):

```
03 000000ed`b157f670 00007ffe`57baa656 coreclr!ThreadSuspend::SuspendEE+0x283 [D:\workspace\_work\1\s\src\coreclr\src\vm\threadsuspend.cpp @ 6144]
04 000000ed`b157f810 00007ffe`57a4f3d9 coreclr!CallCountingManager::StopAndDeleteAllCallCountingStubs+0xa9182 [D:\workspace\_work\1\s\src\coreclr\src\vm\callcounting.cpp @ 960]
```

Dump 4 (thread 27):

```
06 00000004`f1778030 00007ffc`85992b0a coreclr!CrstBase::Enter+0x5a [D:\workspace\_work\1\s\src\coreclr\src\vm\crst.cpp @ 330]
07 (Inline Function) --------`-------- coreclr!CrstBase::AcquireLock+0x5 [D:\workspace\_work\1\s\src\coreclr\src\vm\crst.h @ 187]
08 (Inline Function) --------`-------- coreclr!CrstBase::CrstAndForbidSuspendForDebuggerHolder::{ctor}+0x5db [D:\workspace\_work\1\s\src\coreclr\src\vm\crst.cpp @ 819]
09 (Inline Function) --------`-------- coreclr!MethodDescBackpatchInfoTracker::ConditionalLockHolderForGCCoop::{ctor}+0x5db [D:\workspace\_work\1\s\src\coreclr\src\vm\methoddescbackpatchinfo.h @ 134]
0a 00000004`f1778060 00007ffc`85991f6c coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary+0x8ba [D:\workspace\_work\1\s\src\coreclr\src\vm\codeversion.cpp @ 1762]
```
the `ForbidSuspendForDebugger` region, to completely eliminate it. We think disabling tiered compilation and rejit together should have been sufficient to accomplish that, but we have reports from you and @amandal1810 that it still didn't work, and we don't yet know why. It is possible our analysis missed something, that there is yet another variation of the problem we are still unaware of, or that there was some mistake in how we had you set up the most recent experiment. I'm sorry to keep asking for dumps, but if you can capture one where the app is deadlocked and both tiered compilation and RejitOnAttach are disabled, that will help us resolve this part of the puzzle.

In the meantime I am working on a fix for the portions of the bug we do understand from the dumps you already provided. However, the fact that disabling both tiered compilation and rejit didn't help suggests our understanding of the issue is incomplete, and anything I do to fix the part we do understand isn't going to be sufficient to fully solve this for you.
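For anyone who wants to capture such a dump, a minimal sketch using the `dotnet-dump` global tool follows; `<PID>` is a placeholder for the ID of the deadlocked process, and the commands assume the .NET SDK is on `PATH`:

```shell
# One-time install of the diagnostic tool.
dotnet tool install --global dotnet-dump

# List the .NET processes dotnet-dump can see, to find the hung one.
dotnet-dump ps

# Capture a full dump of the deadlocked process.
dotnet-dump collect -p <PID> --type Full
```

A full dump is larger than the default but preserves the thread stacks and lock state that this kind of deadlock analysis needs.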
Next steps:
Sorry for the delayed reply @noahfalk. Ultimately, we decided to move to Rider on MacOS for dev along with AVD VMs where Windows is strictly required. While expensive, the cost of the hardware is nominal in comparison to the productivity lost or the effort in downgrading to an earlier version of dotnet.
I still would like to help get you the information you need, but it will take some time to get a new dev environment set up where I can reproduce.
No worries on the timing at all @jeff-simeon, and sorry that it came to a new hardware purchase just to avoid this issue :( I certainly appreciate any time you choose to spend helping diagnose the issue, whenever that is.
Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.
| Author | Assignees | Labels | Milestone |
|---|---|---|---|
| jeff-simeon | - | `area-Diagnostics-coreclr` | 7.0.0 |
Thanks to @kouvel, https://github.com/dotnet/runtime/pull/67160 should have fixed the issue in 7.0. @noahfalk is working on a 6.0-servicing version of the fix.
The fix @kouvel made thus far addresses the issues that were caused by TieredCompilation and RejitOnAttach. Some of the folks on this thread said that was sufficient to resolve the issue for them, but others said they could still reproduce deadlocks after those two features were disabled. We did identify a likely third culprit which is theorized to produce a similar-looking deadlock, but it hasn't yet been fixed.
I've looked over a couple of options for that theorized issue (the one that remains after TieredCompilation and RejitOnAttach are disabled), though it's not clear yet what is actually causing that deadlock. There is a promising option, but more to look at.
Closing via https://github.com/dotnet/runtime/pull/69121
Description
This is a duplicate of https://github.com/dotnet/runtime/issues/42375 as far as symptoms and behavior go but I am still encountering the exact same symptoms on 5.0.400. I can reproduce this on Mac OS and Windows.
When debugging, our dev team encounters sporadic hangs (about 30% of the time). There does not seem to be any specific reproducible pattern of when in the program execution the hang occurs. When it happens, the diagnostics logger stops updating and I cannot break or terminate the program.
If I try to `dotnet trace collect` on a hung process, `dotnet trace` hangs as well.

I have tried taking and analyzing a memory dump using wpr as described here, but I have not been able to find anything meaningful.
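For context, the attach flow I was attempting looks roughly like this (a sketch assuming the `dotnet-trace` global tool is installed; `<PID>` is a placeholder for the hung process's ID):

```shell
# List the .NET processes that dotnet-trace can attach to.
dotnet-trace ps

# Attach and start collecting a trace. On a healthy process this writes
# a .nettrace file; on a deadlocked process this command hangs too.
dotnet-trace collect -p <PID>
```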
Configuration
Reproduced on 5.0.400 on Mac OS and Windows. In Visual Studio and Rider IDE.
Regression?
This issue seems to have started when we upgraded from netcoreapp3.1 to net5.0.
Other information
The amount of logging and the number of asynchronous operations seem to affect how prevalent the issue is. For example, turning down the log level makes the issue happen about 20% of the time instead of 30% of the time.