dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.

dotnet-dump "mini" dumps do not include enough data to debug managed process dumps #3065

Open leculver opened 2 years ago

leculver commented 2 years ago

Description

In order to properly debug a crash dump generated by single-file exes, we need more data from the executable's PE image. Specifically we need to read the export table to know if any given module is a single file module, e.g.:

https://github.com/leculver/clrmd/blob/25dea09efb3756ec46f4676fee340067ab85b228/src/Microsoft.Diagnostics.Runtime/src/DataTargets/DataTarget.cs#L190

When the PE image data is not present in the dump, we cannot tell whether a particular image is single-file or not. dotnet-dump will need to detect the single-file case and include enough of the PE image header that we can walk the export table and get the RuntimeInfo.

Additionally, the section of memory containing the RuntimeInfo needs to be in the dump as well.
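The detection described above can be sketched as a small walk of the PE export directory. This is an illustrative Python sketch, not ClrMD's actual implementation; it assumes a PE32+ image in virtual (loaded) layout, so RVAs can be used directly as offsets, and it assumes the runtime-info export name is `DotNetRuntimeInfo`:

```python
import struct

def find_export(image: bytes, name: str) -> bool:
    """Check whether a loaded (virtual-layout) PE32+ image exports `name`.
    In a loaded image, RVAs can be used as offsets directly, which is the
    situation when reading module memory out of a crash dump."""
    if image[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return False
    opt = e_lfanew + 4 + 20                  # skip PE signature + COFF header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic != 0x20B:                       # PE32+ only, for brevity
        return False
    # DataDirectory[0] (export table) starts 112 bytes into a PE32+ optional header
    exp_rva, exp_size = struct.unpack_from("<II", image, opt + 112)
    if exp_rva == 0:
        return False
    (num_names,) = struct.unpack_from("<I", image, exp_rva + 0x18)
    (names_rva,) = struct.unpack_from("<I", image, exp_rva + 0x20)
    target = name.encode() + b"\0"
    for i in range(num_names):
        (name_rva,) = struct.unpack_from("<I", image, names_rva + 4 * i)
        if image[name_rva:name_rva + len(target)] == target:
            return True
    return False
```

Every offset this sketch touches (DOS header, COFF header, optional-header magic, export data directory, name table) would have to be present in the dump for the walk to succeed, which is exactly the data this issue is asking dotnet-dump to capture.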

Repro

Build this project: https://github.com/microsoft/clrmd/tree/master/src/TestTargets/single-file

With this command line: dotnet publish -c Release -o publish -p:PublishReadyToRun=true -p:PublishSingleFile=true -p:PublishTrimmed=true --self-contained true -p:IncludeNativeLibrariesForSelfExtract=true

Then use dotnet-dump collect --type Mini to create a crash dump.

Then load the crash dump with this code: DataTarget.LoadDump(@"C:\path\to\crash.dmp").ClrVersions[0].CreateRuntime();.

This will throw an index-out-of-range exception because ClrVersions.Length == 0, which in turn is caused by failing to create a Microsoft.Diagnostics.Runtime.PEImage: the dump contains no data for that image. Enough data needs to be present in the dump for the export-table read to succeed.

leculver commented 2 years ago

FULL dump:

0:000> lmm single_file
Browse full module list
start end module name
00007ff7`3ed90000 00007ff7`3f6ba000 single_file C (export symbols) single-file.exe

0:000> .dumpdebug
[snip]
101 027C999B 00007ff7`3ed90000 00000000`00001000

MINI dump:

0:000> lmm single_file
Browse full module list
start end module name
00007ff7`3ed90000 00007ff7`3f6ba000 single_file (deferred)

0:000> .dumpdebug
[snip]

2537 0004C349 0000007c`6e0ff248 00000000`00000db8
2538 0004D101 00007ffb`cc32d344 00000000`00000100 <-- Nothing in 7ff7* range
2539 0004D201 00007ffb`cc32d814 00000000`00000100
2540 0004D301 0000007c`6df7f968 00000000`00000698
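The difference between the two dumps comes down to simple interval arithmetic over the memory descriptors that `.dumpdebug` lists: the MINI dump has no region covering the module base. A hedged sketch of that check, with the region values copied from the MINI dump output above:

```python
def is_mapped(regions, address, size=1):
    """regions: iterable of (start, length) pairs, as listed by .dumpdebug.
    Returns True if [address, address+size) falls entirely inside one region."""
    return any(start <= address and address + size <= start + length
               for start, length in regions)

# Regions copied from the MINI dump's .dumpdebug output above
mini = [
    (0x0000007C6E0FF248, 0xDB8),
    (0x00007FFBCC32D344, 0x100),
    (0x00007FFBCC32D814, 0x100),
    (0x0000007C6DF7F968, 0x698),
]
print(is_mapped(mini, 0x00007FF73ED90000))  # False: module image absent
```

The FULL dump, by contrast, has a descriptor starting at 00007ff7`3ed90000, so the same check succeeds there.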
hoyosjs commented 2 years ago

If the single-file app doesn't have a DAC next to it, it needs to be extracted from the host. We'd need to make sure it's there and getting extracted, and we'd need to hit a breakpoint here: https://github.com/dotnet/runtime/blob/9d972eda4da654dfafae101e9850b1191a4dcbf9/src/coreclr/debug/daccess/enummem.cpp#L1535. This needs a recent enough version of dbghelp to work.

mikem8361 commented 2 years ago

Yes, this feels like the DAC isn't being loaded when the dump is generated (it shouldn't matter whether dotnet-dump/createdump or windbg is used to generate it).

leculver commented 2 years ago

This was generated with:

c:\git\clrmd>dotnet-dump --version 5.0.236902+5366510b270d0f0ebffc9e36e9496688b20b96c2

It's possible the dac wasn't loaded, but it feels strange that this is the case. The dac is embedded in the single_file.exe, I did double check that. I would have imagined that dotnet-dump (or the part of CLR that does the dumping) would do the right thing here. We definitely need to dig deeper to see what's going on, something is busted with the "out of the box" single file minidump scenario, I'm just not sure where the issue is.

hoyosjs commented 2 years ago

dbgcore comes from the machine and needs to have support for that embedded DAC. We don't redistribute it, and it does lead to pits of failure like this one - it could be interesting debating redistributing this, as long as we have a mechanism to service it in a timely manner as needed. We'd need to ask Tim and folks what their thoughts are.

leculver commented 2 years ago

At Juan's suggestion I added dbgeng.dll and dbgcore.dll next to dotnet-dump, which changed the behavior, but it still fails.

.dumpdebug shows that we now put a few extra bytes into the dump:

2710 0019A418    00007ff7`3ed90000   00000000`00000040
2711 0019A458    00007ff7`3ed900f8   00000000`00000108

Full reads are below, but at its core we fail to read memory in this callstack:

    Microsoft.Diagnostics.Runtime.dll!Microsoft.Diagnostics.Runtime.Windows.UncachedMemoryReader.Read(ulong address, System.Span<byte> buffer) Line 59  C#
    Microsoft.Diagnostics.Runtime.dll!Microsoft.Diagnostics.Runtime.MinidumpReader.Read(ulong address, System.Span<byte> buffer) Line 132   C#
    Microsoft.Diagnostics.Runtime.dll!Microsoft.Diagnostics.Runtime.ReadVirtualStream.Read(byte[] buffer, int offset, int count) Line 47    C#
    System.Private.CoreLib.dll!System.IO.BinaryReader.ReadBytes(int count)  Unknown
    System.Reflection.Metadata.dll!System.Reflection.PortableExecutable.PEBinaryReader.ReadNullPaddedUTF8(int byteCount)    Unknown
    System.Reflection.Metadata.dll!System.Reflection.PortableExecutable.SectionHeader.SectionHeader(ref System.Reflection.PortableExecutable.PEBinaryReader reader) Unknown
    System.Reflection.Metadata.dll!System.Reflection.PortableExecutable.PEHeaders.ReadSectionHeaders(ref System.Reflection.PortableExecutable.PEBinaryReader reader)    Unknown
    System.Reflection.Metadata.dll!System.Reflection.PortableExecutable.PEHeaders.PEHeaders(System.IO.Stream peStream, int size, bool isLoadedImage)    Unknown
    System.Reflection.Metadata.dll!System.Reflection.PortableExecutable.PEReader.InitializePEHeaders()  Unknown
    System.Reflection.Metadata.dll!System.Reflection.PortableExecutable.PEReader.PEHeaders.get()    Unknown
>   Microsoft.Diagnostics.Runtime.dll!Microsoft.Diagnostics.Runtime.Utilities.PEImage.PEImage(System.IO.Stream stream, bool leaveOpen, bool isVirtual) Line 106 C#

Not enough of the PE header was written into the dump, and presumably not the exports or the resource table. I haven't put dotnet-dump under a debugger yet, but I have to leave for the day.

Read - 7ff73ed90000 bytes:2 - 2
Read - 7ff73ed9003c bytes:4 - 4
Read - 7ff73ed900f8 bytes:4 - 4
Read - 7ff73ed90000 bytes:2 - 2
Read - 7ff73ed9003c bytes:4 - 4
Read - 7ff73ed900f8 bytes:4 - 4
Read - 7ff73ed900fc bytes:2 - 2
Read - 7ff73ed900fe bytes:2 - 2
Read - 7ff73ed90100 bytes:4 - 4
Read - 7ff73ed90104 bytes:4 - 4
Read - 7ff73ed90108 bytes:4 - 4
Read - 7ff73ed9010c bytes:2 - 2
Read - 7ff73ed9010e bytes:2 - 2
Read - 7ff73ed90110 bytes:2 - 2
Read - 7ff73ed90112 bytes:1 - 1
Read - 7ff73ed90113 bytes:1 - 1
Read - 7ff73ed90114 bytes:4 - 4
Read - 7ff73ed90118 bytes:4 - 4
Read - 7ff73ed9011c bytes:4 - 4
Read - 7ff73ed90120 bytes:4 - 4
Read - 7ff73ed90124 bytes:4 - 4
Read - 7ff73ed90128 bytes:8 - 8
Read - 7ff73ed90130 bytes:4 - 4
Read - 7ff73ed90134 bytes:4 - 4
Read - 7ff73ed90138 bytes:2 - 2
Read - 7ff73ed9013a bytes:2 - 2
Read - 7ff73ed9013c bytes:2 - 2
Read - 7ff73ed9013e bytes:2 - 2
Read - 7ff73ed90140 bytes:2 - 2
Read - 7ff73ed90142 bytes:2 - 2
Read - 7ff73ed90144 bytes:4 - 4
Read - 7ff73ed90148 bytes:4 - 4
Read - 7ff73ed9014c bytes:4 - 4
Read - 7ff73ed90150 bytes:4 - 4
Read - 7ff73ed90154 bytes:2 - 2
Read - 7ff73ed90156 bytes:2 - 2
Read - 7ff73ed90158 bytes:8 - 8
Read - 7ff73ed90160 bytes:8 - 8
Read - 7ff73ed90168 bytes:8 - 8
Read - 7ff73ed90170 bytes:8 - 8
Read - 7ff73ed90178 bytes:4 - 4
Read - 7ff73ed9017c bytes:4 - 4
Read - 7ff73ed90180 bytes:4 - 4
Read - 7ff73ed90184 bytes:4 - 4
Read - 7ff73ed90188 bytes:4 - 4
Read - 7ff73ed9018c bytes:4 - 4
Read - 7ff73ed90190 bytes:4 - 4
Read - 7ff73ed90194 bytes:4 - 4
Read - 7ff73ed90198 bytes:4 - 4
Read - 7ff73ed9019c bytes:4 - 4
Read - 7ff73ed901a0 bytes:4 - 4
Read - 7ff73ed901a4 bytes:4 - 4
Read - 7ff73ed901a8 bytes:4 - 4
Read - 7ff73ed901ac bytes:4 - 4
Read - 7ff73ed901b0 bytes:4 - 4
Read - 7ff73ed901b4 bytes:4 - 4
Read - 7ff73ed901b8 bytes:4 - 4
Read - 7ff73ed901bc bytes:4 - 4
Read - 7ff73ed901c0 bytes:4 - 4
Read - 7ff73ed901c4 bytes:4 - 4
Read - 7ff73ed901c8 bytes:4 - 4
Read - 7ff73ed901cc bytes:4 - 4
Read - 7ff73ed901d0 bytes:4 - 4
Read - 7ff73ed901d4 bytes:4 - 4
Read - 7ff73ed901d8 bytes:4 - 4
Read - 7ff73ed901dc bytes:4 - 4
Read - 7ff73ed901e0 bytes:4 - 4
Read - 7ff73ed901e4 bytes:4 - 4
Read - 7ff73ed901e8 bytes:4 - 4
Read - 7ff73ed901ec bytes:4 - 4
Read - 7ff73ed901f0 bytes:4 - 4
Read - 7ff73ed901f4 bytes:4 - 4
Read - 7ff73ed901f8 bytes:4 - 4
Read - 7ff73ed901fc bytes:4 - 4
Read - 7ff73ed90200 bytes:8 - 0
Read - 7ff73ed90200 bytes:4 - 0
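The two captured regions above explain exactly where the read log stops: 00007ff7`3ed900f8 + 0x108 = 00007ff7`3ed90200, so every read below +0x200 into the image succeeds, and the first 8-byte read at +0x200 (the section headers, past the captured slice) returns 0 bytes. The arithmetic as a quick sketch:

```python
# The two regions captured for the module, from the .dumpdebug output above
regions = [(0x7FF73ED90000, 0x40), (0x7FF73ED900F8, 0x108)]

def readable(addr, size):
    """True if [addr, addr+size) is fully inside one captured region."""
    return any(b <= addr and addr + size <= b + l for b, l in regions)

print(readable(0x7FF73ED901FC, 4))  # last successful read in the log
print(readable(0x7FF73ED90200, 8))  # first failing read: 0xF8 + 0x108 == 0x200
```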
leculver commented 2 years ago

So I think there may be two issues here:

  1. There's definitely a dbgcore/dbgeng dependency problem. That will need to be solved since the dac isn't being loaded or called.
  2. When we have dbgeng/dbgcore, ClrMD still chokes on the PE header. We can't parse it using System.Reflection.Metadata. If we actually do have the bare minimum needed to get the right data out of this file, I may have to bring back the hand-parser for PE images, which is annoying but not too bad. I'm not convinced all the right data is there, though.

I will leave 1) to you all.

I will dig further into 2) above tomorrow and see exactly what we put into the dump when those dlls exist and if it's the right data I'll see if we can re-write ClrMD to parse it successfully.

Also please note that in the original bug the user reported that "WinDbg and SOS work fine"; this is likely because they ran on the same PC as the dump, and WinDbg smartly memory-maps the local image from disk. ClrMD should do that too (and we'll fix it), but there's still a bug here. Moving the dump to a different PC (or deleting the .exe from disk that WinDbg is memory-mapping) would, I think, reproduce the problem too.
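The fallback described here (consulting the local image when the dump lacks its bytes) could look roughly like the following sketch. `dump_read` is a hypothetical callback that returns None for absent ranges, and the sketch only covers the PE header region, where file offset equals RVA; full section data would need the section table's RVA-to-offset mapping, which is what a real mapper like WinDbg's does:

```python
def read_header_bytes(dump_read, module_path, base, addr, size, size_of_headers=0x400):
    """Fallback for header reads only: within the PE header region the file
    offset equals the RVA, so the on-disk image can substitute for missing
    dump memory. `dump_read(addr, size)` returns None when the range is
    absent from the dump; `module_path` must be the same binary that was
    loaded, which only holds on the machine that produced the dump."""
    data = dump_read(addr, size)
    if data is not None:
        return data
    rva = addr - base
    if rva + size > size_of_headers:
        raise IOError("beyond the header region; needs section-table mapping")
    with open(module_path, "rb") as f:
        f.seek(rva)
        return f.read(size)
```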

hoyosjs commented 2 years ago

WinDBG does know how to get a very thin slice of the export for this purpose.

mikem8361 commented 2 years ago

Juan, doesn't dbghelp.dll need to be side-by-side with the dotnet-dump exe? It does a pinvoke to MiniDumpWriteDump in dbghelp.dll. dbgeng/dbgcore are needed too.

Lee, I have also had similar problems using System.Reflection.Metadata. SOS only uses it on an actual module file. Otherwise, SOS uses the symstore PEFile reader, which doesn't read the whole header and all the data directories up front. It only reads exactly what is necessary. It is very lazy.
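The laziness being described can be illustrated with a minimal sketch (not SOS's actual symstore PEFile reader): each accessor seeks and reads only the handful of bytes it needs, so a partially captured image can still answer any question whose bytes happen to be present, instead of failing up front the way an eager parser does:

```python
import struct

class LazyPEReader:
    """Minimal sketch of a lazy PE reader over a seekable stream. Each
    property reads only the few bytes it needs, so a truncated capture
    fails only when the specific bytes requested are missing."""
    def __init__(self, stream):
        self._s = stream

    def _read(self, offset, fmt):
        self._s.seek(offset)
        size = struct.calcsize(fmt)
        data = self._s.read(size)
        if len(data) != size:
            raise IOError(f"bytes at offset {offset:#x} not present")
        return struct.unpack(fmt, data)

    @property
    def e_lfanew(self):
        return self._read(0x3C, "<I")[0]

    @property
    def timestamp(self):
        # COFF header: PE signature(4) + Machine(2) + NumberOfSections(2),
        # then TimeDateStamp at e_lfanew + 8
        return self._read(self.e_lfanew + 8, "<I")[0]
```

With a capture that contains only the DOS header, `e_lfanew` still works while `timestamp` raises; an eager reader would have failed on both.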

leculver commented 2 years ago

Ok I've eliminated 2) as an issue. Replacing System.Reflection.Metadata with my own PEImage reader resulted in correctly loading the image.

However, once I got further, it's clear that not enough data is put into the dump; presumably the issue with 1) is that the DAC wasn't loaded/called. Happy to help further here if there are other issues suspected with ClrMD, but I go back to the original issue: not enough data is put into the dump when you run dotnet-dump collect --type Mini on a single-file exe.

Whether that's simply because dbg*.dll isn't present or other issues I leave up to you all. =)

leculver commented 2 years ago

The change to the title of this bug does not reflect reality, I've changed it back.

Dotnet-dump creates crash dumps. If those crash dumps don't contain enough data to debug the underlying crash, then this is a bug: dotnet-dump isn't doing what it claims to do (generate debuggable crash dumps).

It's possible that you could solve this by shipping dbgeng/dbgcore along with dotnet-dump. That's a possible solution, but it's not the issue that I'm reporting.

mikem8361 commented 2 years ago

It is against Windows policy to ship/redistribute the fixed dbghelp/dbgcore/dbgeng with dotnet-dump. The fix will ship in a future Windows service release.

mikem8361 commented 2 years ago

DbgHelp Versions - Win32 apps | Microsoft Learn