dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

App is experiencing a freeze, probably due to GC suspending threads #105463

Open gs-niteesh opened 1 month ago

gs-niteesh commented 1 month ago

Description

Our application is experiencing a freeze; upon analyzing the minidump, it appears as if the threads are deadlocked during thread suspension for GC.

Our application is an agent that runs on a remote computer and connects to our server, transferring metrics and logs. The agent and the server communicate through HTTP and RPC.

We are uncertain about the conclusions drawn from our analysis. Therefore, we have raised this issue to seek confirmation of our findings.

Version

0:011> !eeversion
4.7.4095.0 free
Workstation mode
SOS Version: 4.7.4095.0 retail build

Analysis

We took a minidump of the application process; a summary of our analysis follows.

The GC thread (IDX: 12, TID: 0x1470) is in the mark phase and is trying to suspend the runtime while holding the ThreadStoreLock (0x000001b685b1dc10). Meanwhile, an RPC call arrives from the server, so the RPC system creates a new unmanaged thread and invokes the RPC callback, in which we invoke a C# function. We are not sure which function is invoked on our side: the stack trace of the RPC thread (IDX: 13, TID: 0x408) mentions the function REM_SendMsgToAdapServer, but we are sure that is not the function being called. Based on the stack trace, it seems that a new managed thread is created in the invoked C# function, and while registering the newly created thread with the ThreadStore we try to acquire the ThreadStoreLock (0x000001b685b1dc10). That lock has already been acquired by the background GC thread, thus causing a deadlock.

I have attached the DebugDiag diagnosis file. It contains stack traces of all the threads, information on critical sections, etc. I am happy to post additional information if required. [ADAPAgent_MultipleRules.zip]()

Below is the output of the ThreadState command for all the threads, which is missing in the diagnosis file.

0:011> !ThreadState 1470
    GC On Transitions
    Legal to Join
    Yield Requested
    Unstarted
    CLR Owns
0:011> !ThreadState 11cc
    User Suspend Pending
    Debug Suspend Pending
    Yield Requested
    Hijacked by the GC
    Blocking GC for Stack Overflow
    CLR Owns
0:011> !ThreadState 1278
    Debug Suspend Pending
    GC On Transitions
    Legal to Join
    Yield Requested
    Background
    CLR Owns
0:011> !ThreadState 11ac
    User Suspend Pending
    Debug Suspend Pending
    Legal to Join
    Hijacked by the GC
    Blocking GC for Stack Overflow
    CLR Owns
0:011> !ThreadState a44
    User Suspend Pending
    Yield Requested
    Background
    Dead
0:011> !ThreadState 19c
    User Suspend Pending
    Debug Suspend Pending
    GC On Transitions
    Hijacked by the GC
    Blocking GC for Stack Overflow
0:011> !ThreadState 3d0
    GC On Transitions
    Yield Requested
    Hijacked by the GC
    Blocking GC for Stack Overflow
    Background
0:011> !ThreadState d4c
    User Suspend Pending
    Debug Suspend Pending
    Yield Requested
    Blocking GC for Stack Overflow
    Unstarted
    Dead
0:011> !ThreadState c5c
    User Suspend Pending
    Debug Suspend Pending
    GC On Transitions
    Yield Requested
    Unstarted
    Dead
0:011> !ThreadState f80
    Hijacked by the GC
    Blocking GC for Stack Overflow
    Background
    Unstarted
    Dead
0:011> !ThreadState dac
    User Suspend Pending
    Debug Suspend Pending
    Legal to Join
    Hijacked by the GC
    Blocking GC for Stack Overflow
    Unstarted
    Dead
0:011> !ThreadState da8
    Debug Suspend Pending
    Legal to Join
    Hijacked by the GC
    Blocking GC for Stack Overflow
    Unstarted
    Dead
0:011> !ThreadState a24
    User Suspend Pending
    Legal to Join
    Background
    Dead
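
One observation on the output above: every flag set is exactly what you get by treating the hexadecimal argument itself as a bitmask. For example, 0x1470 is 0x1000 + 0x400 + 0x40 + 0x20 + 0x10, which corresponds to the five flags printed for `1470`. This suggests that `!ThreadState` decodes the value it is given as a state bit field rather than looking up a thread by ID. If that reading is right, the flags listed here reflect the bit pattern of each thread ID rather than each thread's actual state, so entries such as "Blocking GC for Stack Overflow" may not indicate a real stack overflow. The following is a minimal Python sketch of that decoding. The bit values are an assumption inferred only from the output in this issue, and `decode_state` is a hypothetical helper, not part of SOS.

```python
# Bit-to-name mapping inferred from the !ThreadState output in this issue.
# This is an assumption derived from the data above, not from SOS documentation.
# Bits 0x1 and 0x2 never appear in that output, so they are left unnamed.
FLAGS = [
    (0x0004, "User Suspend Pending"),
    (0x0008, "Debug Suspend Pending"),
    (0x0010, "GC On Transitions"),
    (0x0020, "Legal to Join"),
    (0x0040, "Yield Requested"),
    (0x0080, "Hijacked by the GC"),
    (0x0100, "Blocking GC for Stack Overflow"),
    (0x0200, "Background"),
    (0x0400, "Unstarted"),
    (0x0800, "Dead"),
    (0x1000, "CLR Owns"),
]


def decode_state(value):
    """Return the names of all inferred flags whose bit is set in value."""
    return [name for bit, name in FLAGS if value & bit]


if __name__ == "__main__":
    # Feed in the same arguments that were passed to !ThreadState above.
    for arg in ("1470", "11cc", "a44"):
        print(arg, decode_state(int(arg, 16)))
```

Decoding `1470`, `11cc`, and `a44` this way reproduces the flag lists shown for those arguments above. To inspect real thread state, the value to decode would presumably be the State column reported by `!Threads` for each thread.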

Please confirm whether our analysis is correct. If it is, kindly suggest a solution or workaround to address the issue.

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.

tommcdon commented 1 month ago

@mangod9 I just put this in the VM area path. I noticed from the thread state information in the repro steps that a stack overflow is in progress. Perhaps that is related to the deadlock?

mangod9 commented 1 month ago

Hey @gs-niteesh, are you able to share a dump of when the deadlock happens? Also is this on .NET 4.7?

gs-niteesh commented 1 month ago

Hey @mangod9

I apologize for the delay in my response.

Unfortunately, we are unable to share the data dump as it contains sensitive information. However, if you could provide a list of the specific analyses you require, I would be happy to perform them and share the results with you.

Regarding the version, it is .NET Framework runtime 4.7.4095, but the application itself was compiled using .NET v4.5.

I have attached the details of the loaded CLR module in the dump below.

    Loaded symbol image file: clr.dll
    Image path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
    Image name: clr.dll
    Timestamp:        Fri Apr  5 03:20:56 2024 (660F20C0)
    CheckSum:         00A34DCF
    ImageSize:        00A3A000
    File version:     4.7.4095.0
    Product version:  4.0.30319.0
    File flags:       8 (Mask 3F) Private
    File OS:          4 Unknown Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    Information from resource tables:
        CompanyName:      Microsoft Corporation
        ProductName:      Microsoft® .NET Framework
        InternalName:     clr.dll
        OriginalFilename: clr.dll
        ProductVersion:   4.7.4095.0
        FileVersion:      4.7.4095.0 built by: NET472REL1LAST_B
        PrivateBuild:     DDBLD299D
        FileDescription:  Microsoft .NET Runtime Common Language Runtime - WorkStation
        LegalCopyright:   © Microsoft Corporation.  All rights reserved.
        Comments:         Flavor=Retail

mangod9 commented 1 month ago

Hey @gs-niteesh,

For .NET Framework-related issues, you will need to file them through Developer Community feedback: https://developercommunity.visualstudio.com/dotnet. Thanks.

gs-niteesh commented 1 month ago

Hey @mangod9,

I have raised the issue there; you can close this issue now if you like. Thank you.