Unwind info not published in growable function tables

dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

https://docs.microsoft.com/dotnet/core/

MIT License

15.13k stars 4.7k forks source link

Unwind info not published in growable function tables #57962

Open dmex opened 3 years ago

dmex commented 3 years ago

Hello,

Not sure where to post this one. There seems to be some sort of codegen(?) issue when adding the win-x64 runtime identifier to the csproj file that breaks the StackWalk64 and StackWalkEx functions on Windows 10.

For example this csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugType>full</DebugType>
    <DebugSymbols>true</DebugSymbols>
    <RuntimeIdentifier>win-x64</RuntimeIdentifier>
  </PropertyGroup>

</Project>

This source code:

using System;

namespace ConsoleApp6
{
    class Program
    {
        static void Main(string[] args)
        {  
            Console.WriteLine("Hello World!");
            Console.ReadLine();
        }
    }
}

Will break both of those functions until RuntimeIdentifier is removed. It doesn't seem to break windbg or SOS since neither of them use StackWalk64/StackWalkEx but rather breaks everyone else that does use those functions. For example:

Microsoft DbgShell shows an invalid stack trace for the main thread:

Process Hacker also shows an invalid stack:

It also breaks other software but these are the only two needed for the bug report. After removing the RuntimeIdentifier from the csproj (delete: <RuntimeIdentifier>win-x64</RuntimeIdentifier>) and recompiling the binary, the stack enumeration using StackWalk64/StackWalkEx works perfectly fine.

Microsoft DbgShell now shows the correct stack (without DAC):

Process Hacker also now shows the correct stack (with DAC):

You can see from the PH screenshots that frame 2 is ILStubClass.IL_STUB_PInvoke but for the binaries compiled with RuntimeIdentifier the return address for that frame was 0x1 which seems to break the stack?

I rewrote the Process Hacker stack enumeration to use StackWalkEx thinking it was an issue on our end but the problem remained... I've debugged the PH code and searched the donet repo for any relevant code but couldn't find anything obvious for the behavior so I'm inclined to believe this is some sort of codegen issue?

Reproduction steps:

1) Download DbgShell: https://github.com/microsoft/DbgShell 2) Compile and execute binary with win-x64 RuntimeIdentifier using above code. 3) DbgShell execute: Connect-Process xxxx replacing x with the process id. 4) DbgShell switch to thread 0: cd ..\0 5) DbgShell execute: k for thread stack. 6) Invalid stack. 7) Remove the win-x64 RuntimeIdentifier and recompile. 8) Valid stack.

Configuration

Which version of .NET is the code running on? net5.0
What OS and version, and what distro if applicable? Windows 10, 21H1, 19043.1165, x64,
What is the architecture (x64, x86, ARM, ARM64)? x64
Do you know whether it is specific to that configuration? No

ghost commented 3 years ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

Issue Details

Hello, Not sure where to post this one. There seems to be some sort of codegen(?) issue when adding the `win-x64` runtime identifier to the csproj file that breaks the [StackWalk64](https://docs.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-stackwalk64) and [StackWalkEx](https://docs.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-stackwalkex) functions on Windows 10. For example this csproj: ``` Exe net5.0 AnyCPU full true win-x64 ``` This source code: ``` using System; namespace ConsoleApp6 { class Program { static void Main(string[] args) { Console.WriteLine("Hello World!"); Console.ReadLine(); } } } ``` Will break both of those functions until RuntimeIdentifier is removed. It doesn't seem to break windbg or SOS since neither of them use StackWalk64/StackWalkEx but rather breaks everyone else that does use those functions. For example: [Microsoft DbgShell](https://github.com/microsoft/DbgShell) shows an invalid stack trace for the main thread: ![image](https://user-images.githubusercontent.com/1306177/130494154-afaed87f-d5e2-417a-82a6-e88239aba270.png) [Process Hacker](https://processhacker.sourceforge.io/) also shows an invalid stack: ![image](https://user-images.githubusercontent.com/1306177/130494301-dc64e1c4-1262-405d-b615-e7dcdc65403d.png) It also breaks other software but these are the only two needed for the bug report. After removing the RuntimeIdentifier from the csproj (delete: `win-x64`) and recompiling the binary, the stack enumeration using StackWalk64/StackWalkEx works perfectly fine. Microsoft DbgShell now shows the correct stack (without DAC): ![image](https://user-images.githubusercontent.com/1306177/130497261-4f4329f1-8b31-4098-a628-c19d27f239be.png) Process Hacker also now shows the correct stack (with DAC): ![image](https://user-images.githubusercontent.com/1306177/130496899-f05dc730-2a40-423a-b319-43c2926169c5.png) You can see from the PH screenshots that frame 2 is `ILStubClass.IL_STUB_PInvoke` but for the binaries compiled with RuntimeIdentifier the return address for that frame was 0x1 which seems to break the stack? I rewrote the Process Hacker stack enumeration to use StackWalkEx thinking it was an issue on our end but the problem remained... I've debugged the PH code and searched the donet repo for any relevant code but couldn't find anything obvious for the behavior so I'm inclined to believe this is some sort of codegen issue? Reproduction steps: 1) Download DbgShell: https://github.com/microsoft/DbgShell 2) Compile and execute binary with `win-x64` RuntimeIdentifier using above code. 3) DbgShell execute: `Connect-Process xxxx` replacing x with the process id. 4) DbgShell switch to thread 0: `cd ..\0` 5) DbgShell execute: `k` for thread stack. 6) Invalid stack. 7) Remove the `win-x64` RuntimeIdentifier and recompile. 8) Valid stack. ### Configuration * Which version of .NET is the code running on? net5.0 * What OS and version, and what distro if applicable? Windows 10, 21H1, 19043.1165, x64, * What is the architecture (x64, x86, ARM, ARM64)? x64 * Do you know whether it is specific to that configuration? No

Author:	dmex
Assignees:	-
Labels:	`area-Diagnostics-coreclr`, `untriaged`
Milestone:	-

josalem commented 3 years ago

CC @agocke for insight on what might be different when specifying the runtime identifier. If I recall, that would be packaging the app in a single-file bundle under .net5, correct?

CC @BruceForstall in case there's some difference in how codegen operates under this configuration, though I am not aware of one.

@dmex what version of the SDK did you use to build the target app? (run dotnet --info to get all the details) Do you see this same behavior under 3.1 or 6.0 with the same configuration?

BruceForstall commented 3 years ago

CC @BruceForstall in case there's some difference in how codegen operates under this configuration

I guess it depends on exactly how the configurations are different, from the JIT perspective. Is this x64 code in all cases, or is one x86 and one x64? Are there different optimization switches? Stack walking presumably works fine when done from the CLR itself, so the question is what does StackWalk require that isn't being generated? Maybe unwind info isn't reported to the OS? I think the CLR does that lazily.

dmex commented 3 years ago

Do you see this same behavior under 3.1 or 6.0 with the same configuration?

I was able to reproduce the same behavior with .NET Core 2.0, 3.0, 3.1, 5.0 and 6.0. @josalem

Versions prior to 5.0 looked completely valid just with unresolved frames but a more careful examination showed all addresses were actually invalid, so I didn't notice all versions had the same issue and thought it was a regression.

Maybe unwind info isn't reported to the OS?

Thanks for that tip.

I took a closer look at the unwind table and traced the problem to the SymGetModuleBase64 function and the SymFunctionTableAccess64 function. These two functions internally lookup the unwind information from the process and return 0 when hitting a .NET core CLR frame - dbghelp.dll!DbhStackServices::GetFunctionEntry doesn't seem to validate the return result of 0 indicating an error and continues walking the stack with an invalid state.

That's one problem but I found the actual issue... An almost identical bug was reported and fixed with #7675 that affected the MiniDumpWriteDump function and the MiniDumpAuxiliaryDlls registry key.

The KnownFunctionTableDlls key is missing the required version of mscordaccore.dll (distributed with the application) and without that it prevents SymGetModuleBase64/SymFunctionTableAccess64 from unwinding stack information.

I manually added a registry key to KnownFunctionTableDlls with the path to that version of mscordaccore.dll with the application (e.g. D:\Projects\ConsoleApp6\bin\Debug\net5.0\win-x64\mscordaccore.dll):

This fixed StackWalk64/StackWalkEx and also fixed DbgShell which uses the dbgeng.dll COM debugger interface:

Tracing some of the windbg calls shows they're not calling either of those dbghelp functions and providing their own versions to the stack walk functions and they walk the modules and manually load mscordaccore.dll from the application' directory which is why they don't have the same problem as dbgeng.dll/DbgShell and dbghelp.dll/Process Hacker when walking thread stacks.

Should #7675 be re-opened since this issue is the exact same problem reported in that issue? It seems someone forgot to implement the same fix for MiniDumpWriteDump in the SymGetModuleBase64/SymFunctionTableAccess64 functions?

We could write our own implementation of SymFunctionTableAccess64 that does the same as windbg and look for the binary containing the MINIDUMP_AUXILIARY_PROVIDER resource and use that for loading the function tables (we're already testing this and it works) but I don't think dbgeng clients like DbgShell (not to mention everyone) can provide those callbacks (or do the required signature checks for security), so they'll still be stuck with broken stack enumeration until these functions are updated?

BruceForstall commented 3 years ago

Maybe @josalem @noahfalk @tommcdon know whether Core needs a new design to support the SymGetModuleBase64/SymFunctionTableAccess64 functions similar to the fix in #7675 -- presumably Core can't use the registry key KnownFunctionTableDlls.

noahfalk commented 2 years ago

Sorry it looks like that original ping back in September was while I was out on sabbatical and I missed it. Thanks @dmex for the great investigation! This will probably take some further investigation on how best to resolve the issue, but my initial thoughts on solutions in order of preference:

The runtime would eliminate usage of RtlInstallFunctionTableCallback and solely rely on RtlAddGrowableFunctionTable. The OS knows where these function tables are in memory and knows the prescribed memory layout so lookups in these tables could work without any app-specific extensibility mechanism (like mscordaccore). For example ETW walks user mode callstacks within the kernel, it is prohibited from loading any app code to assist because of the security boundary, and that scenario works correctly today. However I suspect the debugger scenario is missing some key pieces that would need to be implemented:
- OS APIs that would allow a 2nd (debugger) process to access the growable function tables registered with the OS by the debuggee process.
- Make SymFunctionTableAccess64 invoke the new OS API to discover the dynamic function tables by default
- Integration with minidump creation code so that dynamic function tables are preserved in the captured memory, and the dump records that those regions of memory are function tables (in a similar way that minidump module lists record that regions of memory are loaded modules)

This path ideally completes debugger/OS support for dynamic function tables in a way that lets any dynamic code generator participate without needing to jump through hoops of distributing custom dlls like mscordaccore. It also avoids substantial annoyances for debugging tools trying to properly acquire these dlls and validate that it would be safe to run code provided by a 3rd party.

Add a new resource, similar to the existing DUMP_HELPER resource whose purpose is to provide an alternative lookup mechanism for KnownFunctionTableDlls. The advantage of using a new resource is that nothing guarantees that the dll implementing dump generation logic needs to be the same dll as the one implementing function table inspection logic. For CLR we happened to use the same dll for both because it was convenient, but its a fairly arbitrary choice. Some runtimes might also want to support just one of these extensibility points without any implication that they must support the other. Once the resource was present, all the places that currently do KnownFunctionTableDlls lookup would need to be updated to support the new resource.
Same as (2) except reuse the existing DUMP_HELPER resource rather than create a new one.

We aren't going to have time to get to this in .NET 7 so I've put it on the backlog for .NET 8 for now. For anyone needing to resolve this more quickly @dmex had some good suggestions on workarounds above.

mikelle-rogers commented 1 year ago

Moving to future as there are workarounds and this doesn't fit into our priorities for .NET 8. We will reconsider in future releases as we receive feedback.

hoyosjs commented 1 year ago

A few things from this investigation:

We have never really published data in the default path as we once thought we did. This goes back to at least .NET Framework 4.x. However, framework dependent apps with a runtime installed via the MSI and .NET Framework based apps won't notice any issues from this. Self-contained publishes (which the RuntimeIdentifiers property affects) and apps running with a shared runtime installed using the dotnet-install script will hit this.
We only publish into the growable table if there's been an event active that gets emitted through the ETW callback. Once that happens, methods that get published after will get published into the growable tables. See https://github.com/dotnet/runtime/blob/83bf4b66ed3de5905b2e410a194a0f5f3d924c3d/src/coreclr/vm/eventtrace.cpp#L2678-L2682
We should expand the support to ARM64 - right now it's limited to windows x64.

Adding a call to UnwindInfoTable::PublishUnwindInfo in CEEJitInfo::WriteCode before the call to PublishUnwindInfoForMethod is enough to initialize the infrastructure once as in the patch below. However, since this means that the CodeManager now publishes to this table per compilation, and there's kernel locks in this path, we probably want to perf test if it regresses startup/heavy loading perf.


diff --git a/src/coreclr/vm/jitinterface.cpp b/src/coreclr/vm/jitinterface.cpp
index e813d6f2d27..b8535c9fd0b 100644
--- a/src/coreclr/vm/jitinterface.cpp
+++ b/src/coreclr/vm/jitinterface.cpp
@@ -10783,6 +10783,10 @@ void CEEJitInfo::WriteCode(EEJitManager * jitMgr)
#if defined(TARGET_AMD64)
 // Publish the new unwind information in a way that the ETW stack crawler can find
 _ASSERTE(m_usedUnwindInfos == m_totalUnwindInfos);

// This initializes the infrastructure and emits all backlogged entries.
// It will only initialize things once.
UnwindInfoTable::PublishUnwindInfo(g_fEEStarted != FALSE);
UnwindInfoTable::PublishUnwindInfoForMethod(m_moduleBase, m_CodeHeader->GetUnwindInfo(0), m_totalUnwindInfos);

endif // defined(TARGET_AMD64
There are a couple calls into InstallEEFunctionTable which registers callback based unwind info. One in creation of code heaps, which perhaps could be replaced with the growable table initializations in lieu of doing it in the CEEJitInfo path. The other place is when emitting unwind sequences in the stublinker. We'll need to keep a path the DAC/debugger can query.
RtlAddFunctionTable is not recommended according to docs. It's currently used to publish exception unwind info in ConvertedImageLayout file-backed R2R images on windows, which doesn't seem to be used - we don't use a flat layout of r2r images on Windows. However - we probably want to turn this one to a growable table as well.

@jkotas do you know if there's any prior concerns with growable function tables?

noahfalk commented 1 year ago

Thanks for all the info @hoyosjs! A little confusion on one part: "We only publish into the growable table if there's been an event active that gets emitted through the ETW callback". Looking at the code I don't think this requires sending an event, it just requires that the EventSource has been turned on with relevant keywords. Is there some dependency on an event getting sent that I missed?

jkotas commented 1 year ago

do you know if there's any prior concerns with growable function tables?

Publishing of the growable function tables can add very significant overhead, in particular when DynamicMethods are used heavily by the app. It is one of the reasons why we are not doing that by default. We have done multiple rounds of fixes in this area, so it is not as bad as it used to be. The root cause of this performance problem is baked into the growable function table API design, so it is hard to fix.
Publishing of the growable function tables has race conditions. There are small windows where the unwind is not published correctly. These race conditions are ok for diagnostic (there are number of other corner cases where the diagnostic stackwalk does not work), but they would not be acceptable for functionality (it would not be ok for you app to crash intermittently). If we were to switch to growable function tables as the only source of unwind info, we would need to do something about this.

We should expand the support to ARM64 - right now it's limited to windows x64.

Why would we want to do that? Arm64 uses different stackwalking strategy. The unwindinfo is not necessary to get good diagnostic stacktrace on Arm64.

RtlAddFunctionTable is not recommended according to docs.

Can you share the link to the docs that says this? I think it is perfectly fine to use RtlAddFunctionTable when the usage pattern matches the API design.

hoyosjs commented 1 year ago

Thanks for all the info @hoyosjs! A little confusion on one part: "We only publish into the growable table if there's been an event active that gets emitted through the ETW callback". Looking at the code I don't think this requires sending an event, it just requires that the EventSource has been turned on with relevant keywords. Is there some dependency on an event getting sent that I missed?

Sorry - yes. You need to enable jit rundown rather than necessarily send an event per se.

Publishing of the growable function tables has race conditions. There are small windows where the unwind is not published correctly. These race conditions are ok for diagnostic (there are number of other corner cases where the diagnostic stackwalk does not work), but they would not be acceptable for functionality (it would not be ok for you app to crash intermittently). If we were to switch to growable function tables as the only source of unwind info, we would need to do something about this.

Agree that switching. At some point, getting the information off the lists where it's at might me interesting. However, at first I'd focus on being able to emit the . Do you fear that would be too big an overhead?

Why would we want to do that? Arm64 uses different stackwalking strategy. The unwindinfo is not necessary to get good diagnostic stacktrace on Arm64.

In general WinDBG et al use these as streams to be able to find the exception unwind sequence. If other methods are implemented is something I am not familiar with at the moment, I'd have to reach out.

Can you share the link to the docs that says this? I think it is perfectly fine to use RtlAddFunctionTable when the usage pattern matches the API design.

Looks like the push away from it is mostly related to ARM64EC which we don't support: https://learn.microsoft.com/en-us/windows/arm/arm64ec-abi#dynamically-generating-jit-compiling-arm64ec-code. I can't find another documented reason to prefer growable over non-growable. And since the exception table in r2r is constant, I don't see why we should go the growable path.

That being said, diagnostic tools for any released version of windows won't be able to analyze non-FD-msi runtimes. The only way to do it right now is via a full dump with the DAC. This was never an issue with framework. At the very least we should document the workaround for self-contained apps since most tools will end up relying on stack unwinding heuristics otherwise.

dotnet / runtime

Unwind info not published in growable function tables #57962

Configuration

endif // defined(TARGET_AMD64