Support for process dumping of native and managed code (C++ and C#)

dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.

Support for process dumping of native and managed code (C++ and C#) #151

Open tapika opened 5 years ago

tapika commented 5 years ago

When an exception occurs in an application, it should be possible to dump its state, including both the native and the managed parts (C++ and C#), and to restore it later for debugging.

noahfalk commented 5 years ago

Are you referring to creating a dump or core file? If so, you have a few options:

On Windows: call the Win32 API MiniDumpWriteDump() or use any of the Windows tools that capture dumps, such as ProcDump, the Visual Studio debugger, WinDbg, or the OS Process Manager.
On Linux: use the OS mechanisms for your distro, the CreateDump utility, or the soon-to-be-previewed dotnet-dump tool.

Is that what you were looking for?

[Edit]: Specifically for exceptions, Watson or a debugger like Visual Studio can help on Windows. On Linux, the CreateDump utility has environment variables that will automatically collect crash dumps when an unhandled exception occurs.

tapika commented 5 years ago

I would like to use APIs like this: https://github.com/Microsoft/clrmd

But the main problem is that they do not support native C++ code. It would be perfect if the API could be made native/managed transparent, so that it does not matter whether the code is native or managed: from the client side you use the same APIs, which in turn can call MiniDumpWriteDump() or whatever.

tapika commented 5 years ago

I guess at the moment I'm only looking for a solution for Windows on 64-bit architecture; support for other OSes could wait, but eventually the APIs could become portable.

tapika commented 5 years ago

MiniDumpWriteDump - guess something similar also needs to be supported for managed side as well.

Basically the use case is that an unknown exception (a C# exception, any C++ exception, or a page fault) occurs in your application. You call a DumpWrite function. It saves the native C++ and managed C# state completely. You then receive that dump file later for analysis and debugging, load it in a debugger, and analyze what went wrong.

noahfalk commented 5 years ago

MiniDumpWriteDump - guess something similar also needs to be supported for managed side as well.

I'm not sure what you mean here. You can call MiniDumpWriteDump on a process that is running .NET Core. Unless you are trying to capture a triage dump on a Windows OS older than TH3, I'd expect this to work well.

However, there would be a different issue in the scenario you described: MiniDumpWriteDump is not designed to take a dump of the same process it is executing within. Self-dumping is inherently problematic, and developers have historically resolved that problem by using a second process. For example, when an application crashes and triggers Watson, the Watson service launches a process called WerFault.exe to collect the crash dump. If the exceptions you want to capture are crashing your application, you could always use Watson's local dump feature. If you are trying to create dumps of exceptions that do not go unhandled, then you probably want to call CreateProcess() to launch a helper process and have that helper process call MiniDumpWriteDump on its parent.
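
The helper-process pattern described above could look roughly like the following. This is only a minimal sketch, not an existing tool: the function names `parse_pid` and `write_dump_of` are hypothetical, the choice of `MiniDumpWithFullMemory` is an assumption (it produces large dumps), and the Windows-specific part is compiled only on Windows, where it needs to be linked with Dbghelp.lib.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>   // MiniDumpWriteDump; link with Dbghelp.lib
#endif

// Hypothetical helper: parses a decimal process ID passed on the command line.
// Returns 0 if the text is empty, too long, or contains a non-digit.
unsigned long parse_pid(const std::string& text) {
    if (text.empty() || text.size() > 9) return 0;
    unsigned long pid = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return 0;
        pid = pid * 10 + static_cast<unsigned long>(c - '0');
    }
    return pid;
}

// Hypothetical helper: writes a dump of another process (for example the
// parent that launched this helper) to dump_path. Returns true on success.
// On non-Windows builds this is a stub that always returns false.
bool write_dump_of(unsigned long pid, const std::string& dump_path) {
    assert(!dump_path.empty());
#ifdef _WIN32
    HANDLE process = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                                 FALSE, pid);
    if (process == nullptr) return false;

    HANDLE file = CreateFileA(dump_path.c_str(), GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) {
        CloseHandle(process);
        return false;
    }

    // Assumption: capture full memory so managed state is certainly included.
    BOOL ok = MiniDumpWriteDump(process, pid, file, MiniDumpWithFullMemory,
                                nullptr, nullptr, nullptr);
    CloseHandle(file);
    CloseHandle(process);
    return ok != FALSE;
#else
    (void)pid;
    return false;
#endif
}
```

In this sketch the application would start the helper with CreateProcess(), pass its own process ID (from GetCurrentProcessId()) as an argument, and wait for the helper to exit; the helper's entry point would call `parse_pid` on that argument and then `write_dump_of`.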

I would like to use API's like this: https://github.com/Microsoft/clrmd But main problem is that they do not support native C++ code.

You could either use an interop layer (COM or reverse P/Invoke) to call CLRMD from C++ code, or there is another API called ICorDebug which is written in C++. The APIs aren't identical, but you can accomplish similar things with them. ICorDebug is unfortunately not as well documented and requires quite a bit more effort to use, so many people prefer CLRMD, but it is a tested and working option if using a C++ library is a critical factor for you.

It would be perfect if API could be made in native/managed transparent mode

Thanks for the feedback! Practically speaking we don't have anything like this in the works right now so hopefully one of the options above will still suffice.

tapika commented 5 years ago

However there would be a different issue in the scenario you described, MiniDumpWriteDump is not designed to take a dump of the same process it is executing within.

I think it makes sense to have WerFault.exe as a special service or host that can be used to collect process state and information about it. The API must be usable from native C++ and from the managed C# side, and must be well designed in both cases (simple to use, but with the technical mechanisms to cover all execution scenarios).

If you are trying to create dumps of exceptions that do not go unhandled then you probably want to call CreateProcess() to launch a helper process and get that helper process to call MiniDumpWriteDump on its parent.

I'll try to come back to exceptions and their handling later on - I have some ideas about them.

But here I would like to halt the program and freeze its execution at that very moment. All threads and their data must be stopped as well, and the memory and execution state must be dumped. It should be possible to return to that application state later by loading the dump file on a developer's machine.

You could either use an interop layer (COM or reverse p/invoke) to call CLRMD using C++ code, or there is another API called ICorDebug which is written in C++. The APIs aren't identical, but you can accomplish similar things with them.

Basically I don't want to touch any API or design that is specific to either native or managed code. We have an application that consists of code from both worlds, and I think both worlds need to go hand in hand in design, API, ease of use, and cooperation with each other. I do understand that the two worlds are quite far apart, but one approach is to improve both simultaneously, bringing in the best design and API practices.

ICorDebug is unfortunately not as well documented

This is the first indication of an API that will vanish the next time you come to update its functionality - I would say it's a bit risky to use it.

We could also arrange this so that I start writing code according to your instructions, using the APIs you propose. I'll try to simplify the solution and the APIs and make the implementation native/managed neutral, but the newly written .dll's could be transferred to Microsoft's responsibility.

Also, if there are heavier problems with the design, the implementation, or the Windows API itself, you could drive the necessary changes into Windows as well.

noahfalk commented 5 years ago

API must be usable from native C++ and from managed C# side

Perhaps I misunderstood what your goals were. I thought you were trying to get information about APIs and tools that already existed, but now I think you are trying to design a new API. Is that correct?

This is first indication of API, which will vanish next time ...

Despite having limited documentation, the ICorDebug API has existed since .NET Framework 1.0, nearly 20 years. We've added to it over time, but once the APIs exist they are stable and supported. This API is also used extensively by the Visual Studio debugger. There may be other reasons you choose not to use this API, but I don't think you need to worry about stability or lack of support.

We could make this ... but newly written .dll's could be transferred under Microsoft responsibility

If you are looking for the folks at Microsoft to collaboratively develop a new feature or to take responsibility for its maintenance, the first step in that process would be defining clearly what the use case is and why the existing solutions aren't suitable. Above you described some properties of the API you are looking for, but not so much the use case that would require it to be that way. For example, lots of users can already use Watson to get a crash dump for unhandled exceptions, and it's not clear to me that they would want to use any API even if we gave them an amazing one. I think there are probably other scenarios where this could be useful, but it's good to have everyone on the same page about exactly what the goal is and what problems are being solved.

Just to be transparent, I doubt this is a project our team would take on right now unless we saw a large community interest, but we're glad to explore, to get feedback, and to provide guidance that might help you make forward progress even if we don't take ownership of it.

tapika commented 5 years ago

Perhaps I misunderstood what your goals were. I thought you were trying to get information about APIs and tools that already existed, but now I think you are trying to design a new API. Is that correct?

I suspect it should be a new API. It's indeed possible that you can use two or more existing APIs, but from my perspective that should be hidden from the end user.

Despite having limited documentation, ICorDebug API has existed since .NET Framework 1.0, nearly 20 years.

I have tried to make a mixed-mode call stack walker using more than 3 APIs altogether; that solution lasted for 3 years before it broke. I think if the complexity goes beyond 3 APIs, it is better to create one new API (which can combine 3 or more existing ones, invisibly to the end user). Also, ICorDebug is a native-only API. I'm interested in supporting both managed and native code.

Above you described some properties of the API you are looking for, but not so much the use-case that would require it to be that way. For example lots of users already can use Watson to get a crash dump for unhandled exceptions and its not clear to me that they would want to use any API even if we gave them an amazing one.

You're right. I have raised another issue here: https://github.com/dotnet/diagnostics/issues/152

Let's start from that one. I suspect that in order to dump a mixed-mode application exception, you need to catch it first; after that you can either attach a debugger to it (and wait until debugging starts) or dump the whole application.

For dumping the whole application, Watson could theoretically be used, but:

where is the Watson API itself - not yet available for public use?
does it support managed code?

noahfalk commented 5 years ago

where is watson API itself - not yet available for public use ?

The primary use-case for Windows Error Reporting doesn't utilize any API: your .NET app throws an unhandled exception, the OS unhandled exception filter is called, and Windows Error Reporting is automatically invoked to capture a dump. There are also some APIs such as WerReportCreate, and here is a random web page I found which showed invoking it from managed code.

does it supports managed code ?

The primary use-case, where the app crashes, definitely works with managed code; we see lots of crash reports that come back to Microsoft with .NET Core unhandled exceptions in them. Support for .NET Core minidumps was added in Windows 10 TH3, I believe, so virtually any Windows 10 user should have it at this point. I haven't personally tested the use cases that explicitly call the WerReportCreate API, and it's much less common, but I don't know of a reason it wouldn't work.

While thinking about this last night I did come up with a couple dump related use-cases that might be interesting for .Net runtime to support in the future. I don't know if any of these sounds relevant to your personal goals but throwing it out there:

1) The dotnet-dump tool we are about to preview currently requires sudo permission to use it on Linux. We could create a cooperative option where the tool sends a command to the app asking it to dump itself, so that it would no longer require sudo. This makes it easier to use in locked-down environments such as cloud services or containers.
2) The CreateDump tool is currently Linux-only, and it might be useful to have a fully portable version of the same experience. In particular, you would set an environment variable when running the app on any platform, and then when it crashed you would get a dump. This isn't huge value, but it's a little nice not to have to tell people how to configure each OS's dump mechanism separately.
3) For non-fatal exceptions or asserts where the user wants to capture the state of the process to debug offline, they could have a managed API to call that produces a dump. On Linux there is no OS capability to do this that I am aware of, and on Windows WER might be an option. The value here would be that the user can write a single platform-neutral managed API call in their code and not have to worry about coding different platform-specific solutions.

tapika commented 5 years ago

For 1 & 2 - we are looking for windows support at the moment, can this be ported to windows ?

3 - needs to be analyzed.

Found also this one:

https://chromium.googlesource.com/crashpad/crashpad/+/master/doc/overview_design.md

Need to analyze if that one supports managed code.

tapika commented 5 years ago

Here is another alternative as well:

https://github.com/backtrace-labs/crashpad/tree/backtrace

noahfalk commented 5 years ago

For 1 & 2 - we are looking for windows support at the moment, can this be ported to windows ?

I was proposing possible future use-cases that might justify the runtime team maintaining an API similar to what you were proposing. None of these are work that we currently have scheduled.

I think you are asking slightly differently if dotnet-dump / CreateDump can be supported on Windows. For dotnet-dump yes we plan to support it on Windows. For CreateDump there is no technical limitation, but it is not a priority for us right now.

tapika commented 5 years ago

Can you check Chromium Crashpad? I think it looks good as well. I've managed to build it relatively easily on Windows; besides Linux support there is also macOS and (experimental) Android support. I'm wondering if it makes any sense to upgrade that one to support managed code as well:

https://chromium.googlesource.com/crashpad/crashpad/+/HEAD/doc/developing.md

Download depot_tools via the link, extract it, and add it to PATH.

C:\PrototypingQuick\CrashPad\crashpad3\crashpad>set PATH=%PATH%;C:\PrototypingQuick\CrashPad\depot_tools

Fetch crashpad & dependencies from git:

C:\PrototypingQuick\CrashPad\crashpad3> fetch crashpad

C:\PrototypingQuick\CrashPad\crashpad3> cd crashpad

C:\PrototypingQuick\CrashPad\crashpad3\crashpad>gn gen out/Default
Done. Made 80 targets from 27 files in 12467ms

Generate visual studio projects:

C:\PrototypingQuick\CrashPad\crashpad3\crashpad>gn gen out\mybuild --ide=vs
Generating Visual Studio projects took 246ms
Done. Made 80 targets from 27 files in 100169ms

Build everything:

C:\PrototypingQuick\CrashPad\crashpad3\crashpad>ninja -C out/Default
ninja: Entering directory `out/Default'
[481/481] LINK crashpad_client_test.exe

tapika commented 5 years ago

It can also capture dump files, and they can be opened in Visual Studio.

tapika commented 4 years ago

Self-dumping is inherently problematic and developers have historically resolved that problem by using a 2nd process. For example when an application crashes and triggers Watson, the Watson service launches a process called WerFault.exe to collect the crash dump.

I'm not sure why self-dumping is considered problematic. I guess the main problem is catching the exception in the first place; after that, process dumping can be performed. I would guess that the main problem is SetUnhandledExceptionFilter - it works only for native C++ calls. For mixed mode no such API is available. Luckily I've now managed to code a mechanism that catches exceptions regardless of whether they come from native C++ or mixed-mode C++; the implementation is located here:

https://github.com/tapika/stacktrace/blob/develop/src/exception_handler.cpp

SetUnhandledExceptionFilter indeed cannot be used as such; I've used MinHook to intercept a kernelbase function. (It would maybe be better to have this as a separate Windows API function.)

But once we have managed to catch an exception, self-dumping is also possible; something like this is done, for example, by dmc4_hook:

https://github.com/muhopensores/dmc4_hook/blob/master/src/dmc4_hook/utils/crash_handler.cpp

In a similar manner to my boost/stacktrace code, it uses MinHook to intercept exception callbacks.

So the main problem is (as described in my own implementation): MH_EnableHook is also used instead of SetUnhandledExceptionFilter(&UnhandledExceptionFilter_Detour); to make it possible to debug the same function.

But process dumping might be more difficult, as MiniDumpWriteDump will work only for native C++; for mixed mode, some different kind of process dumping is needed.

Do you see any problems other than those mentioned above?

One more problem I have not yet tackled is the stack overflow exception for mixed-mode C++: the process simply dies after that exception occurs. I suspect I would even need to hook AddVectoredExceptionHandler to catch that one, but that adds more complexity, so I won't go any deeper there for now.

I still have the mixed-mode C++ call stack to resolve, and process dumping indeed looks more difficult.

bruno-garcia commented 3 years ago

I created a .NET library that bundles crashpad to create a minidump of the .NET process on a native crash.

Works on macOS, Linux and Windows.

So far it doesn't show managed frames though, so I'm looking here how I could use sos in symbolic to include the managed frames in the stack trace. Maybe something folks want to collaborate on? Or at least give me some pointers?

https://github.com/getsentry/sentry-dotnet-minidump

noahfalk commented 3 years ago

So far it doesn't show managed frames though, so I'm looking here how I could use sos in symbolic to include the managed frames in the stack trace

In order to convert instruction pointers into symbolic method names you need to do two translations:

instruction pointer -> MethodDef metadata token
MethodDef metadata token -> method name

SOS, CLRMD, WinDbg, and VS can all do it, but the right portions of process memory have to be present in the dump file. If you are using an off-the-shelf dump generation tool, there is a good chance that tool has no understanding of how to locate the relevant memory needed, because the logic is specific to the exact version of the .NET runtime being used. You can sidestep this problem by capturing all virtual memory, but this gives you very large dumps.
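
As a rough illustration of the two translations listed above, here is a toy model. It is not how SOS or CLRMD are implemented: the class `ToyMethodResolver`, its in-memory tables, and the addresses in the usage example are all hypothetical, and a real runtime resolves tokens per module from metadata that has to be present in the dump.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical record for one compiled method body.
struct JittedCode {
    std::uint64_t size;              // length of the code range in bytes
    std::uint32_t method_def_token;  // MethodDef tokens have the form 0x06xxxxxx
};

// Toy stand-in for the data a real tool reads out of the dump.
class ToyMethodResolver {
public:
    void add(std::uint64_t start, std::uint64_t size,
             std::uint32_t token, const std::string& name) {
        assert(size > 0 && token != 0);
        code_by_start_[start] = JittedCode{size, token};
        name_by_token_[token] = name;
    }

    // Step 1: instruction pointer -> MethodDef token (0 if no range matches).
    std::uint32_t token_for_ip(std::uint64_t ip) const {
        auto it = code_by_start_.upper_bound(ip);  // first range starting after ip
        if (it == code_by_start_.begin()) return 0;
        --it;                                      // last range starting at or before ip
        return (ip - it->first < it->second.size) ? it->second.method_def_token : 0;
    }

    // Step 2: MethodDef token -> method name (empty if the token is unknown).
    std::string name_for_token(std::uint32_t token) const {
        auto it = name_by_token_.find(token);
        return it == name_by_token_.end() ? std::string() : it->second;
    }

    // Both steps chained, as needed when symbolizing one stack frame.
    std::string name_for_ip(std::uint64_t ip) const {
        return name_for_token(token_for_ip(ip));
    }

private:
    std::map<std::uint64_t, JittedCode> code_by_start_;
    std::map<std::uint32_t, std::string> name_by_token_;
};
```

With a made-up entry such as `add(0x1000, 0x40, 0x06000001, "Program.Main")`, any instruction pointer from 0x1000 through 0x103F resolves to "Program.Main", and an address outside every range yields an empty name. Both tables correspond to memory that must be captured in the dump, which is why a dump that lacks them cannot show managed frames.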

All of the simple ways I know of to capture the right set of memory in your dump involve using the tools that the .NET team built:

You could set the environment variable COMPlus_DbgEnableMiniDump=1 More Info
You could invoke dotnet-dump to capture the dump for you. dotnet-dump knows how to parse the specific details of .NET runtime data structures.
You could invoke the WriteDump() API in Microsoft.Diagnostics.NetCore.Client. This is the API that dotnet-dump uses to do all its work. This is a public API in a Microsoft supported NuGet library.
On Windows only you can invoke MiniDumpWriteDump which has an extensibility mechanism that CLR uses
[A not recommended and labor-intensive option] You could reverse engineer how WriteDump() works (it's all OSS after all), allowing you to extract just the logic that tells you which memory you need to capture for this purpose, and then integrate it with an alternate dump generation tool.

If you already have a dump that has the right memory captured, converting an IP to a string name for the function is fairly easy. In CLRMD, invoke ClrRuntime.GetMethodByInstructionPointer().

In terms of collaboration I'm glad to try answering questions and getting you pointed in the right direction : ) More contribution beyond that largely depends what the goals for the project are. At this point the library sounds quite similar to support we already have in the WriteDump() API so I'd want to understand if there are advantages to the new library relative to what already exists and works.

HTH! -Noah

tapika commented 3 years ago

Quite a few things have happened that I haven't mentioned here.

Basically the exception handling I've coded does work, but there are exceptions to the rule: namely, my own C++ exception handler catches a C# exception when a re-throw occurs, for example:

try
{
    throw new Exception("test");
}
catch (Exception ex)
{
    throw ex;
}

This is caught by my C++ exception handling, not by the .NET Framework. (Other exception types should be OK - see https://github.com/tapika/stacktrace/blob/develop/example/csharp_crashy_window.xaml.cs#L51 )

32-bit exception handling also does not work; different functions need to be hooked.

(Copied from: https://github.com/dotnet/runtime/issues/12405#issuecomment-646647202)

I've left the code commented out, but my goal is to support 64-bit Windows, so I don't care so much about 32-bit support. If I manage to make it work, good; if not, that's OK for me too.

See https://github.com/tapika/stacktrace/blob/develop/src/exception_handler.cpp#L206

I did not finish the managed call stack determination support, but from what I have analyzed,

https://github.com/microsoft/clrmd

can do the same things as C++ can, except that C# calls Windows functions via P/Invoke.

Theoretically, call stack reconstruction could be done even from C#. What I have prototyped using boost stacktrace + coreclr is possible.

I have proposed to the boost::stacktrace developer, Antony Polukhin, to detach stacktrace from the Boost library itself and make it a standalone library, but he did not reply to this.

But one approach is to abandon boost::stacktrace completely and rewrite everything in C#.

But I haven't analyzed this any deeper. At the moment my own boost::stacktrace plus my exception handling is active in our own tests, but I'm not sure whether it's a good approach overall.

Besides handling exceptions, it can also keep the application from crashing. (I'm translating the native C++ exception into a managed .NET exception, and the application only displays a message instead of crashing.)

We had quite a long conversation with a coreclr developer on this subject, if you want to read it: https://github.com/microsoft/clrmd/issues/847

Meanwhile, in our code I've observed exceptions coming from both C++ and C# code.

To have stricter control over C# code, I've also altered xunit; see the brief discussion here:

https://github.com/xunit/xunit/discussions/2213

Please let me know if there is something I can do for you.

Uff.... I hope my message did not blow your mind. :-)