Closed jkotas closed 7 months ago
Here is what I would propose for the error handler if we want to pass all that could be useful.
There are many optionals. That is just the nature of this API, since there are many reasons to fail - FailFast, HW faults, exceptions, asserts, ... I am not sure if stashing arguments in a struct will make it much better.
// returns false if default handler is not desired.
extern "C" DLL_EXPORT bool __cdecl FatalErrorHandler(
uint32_t exitCode, // Ex: COR_E_FAILFAST, COR_E_STACKOVERFLOW,...
void* address = NULL,
#ifdef WINDOWS
PEXCEPTION_POINTERS pExceptionInfo = NULL,
#else
int sig = 0,
siginfo_t* info = NULL,
void* ucontext = NULL,
#endif
LPCWSTR argExceptionString=NULL, // stack trace
LPCWSTR message=NULL, // Ex: "SSE and SSE2 processor support required."
LPCWSTR detailMessage=NULL // Ex: "detailMessage that is passed to Assert(condition, message, detailMessage)"
LPCWSTR errorSource=NULL // Ex: "Assertion Failed"
)
{
// native implementation with signal handler restrictions
}
As for what to do with signals if there are previous handlers - our current policy is to:
Since our crash dumping happens after calling the previous helper, I think the overridden helper should run after as well. Basically - we should call the helper between steps 2 and 3, and if it returns false, we do not do 3.
I am not sure if stashing arguments in a struct will make it much better.
I think we want to have a scheme where we can add more pieces of data in future. Passing the information as individual arguments in the signature makes it difficult. Instead, I think the argument should be a pointer to a struct and the first field of the struct should indicate the size. It will allow us to add more as necessary without breaking changes.
Some more comments about the details:
address
- where does this address point to?signo
is part of siginfo_t
so it is not strictly necessary to pass it as a separate field. It can be read from siginfo_t
.argExceptionString
- if it is a stacktrace, should it be rather called stacktrace? Also, I am wondering whether it would be better to provide the stacktrace via enumerator-like callback. #ifdef WINDOWS
PEXCEPTION_POINTERS pExceptionInfo = NULL,
#else
int sig = 0,
siginfo_t* info = NULL,
void* ucontext = NULL,
#endif
The signature should be platform agnostic.
An example I would expect would be the following. Note the update return value is BOOL
as opposed to C++ or C99's bool
.
struct ExceptionData
{
size_t size;
// ...
// I would avoid Win32 data structures as fields.
// Instead users can cast as needed, that should be contained within documentation.
// ...
};
// returns false if default handler is not desired.
extern "C" DLL_EXPORT BOOL __cdecl FatalErrorHandler(int32_t error_code, struct ExceptionData* data);
I think we want to have a scheme where we can add more pieces of data in future. Passing the information as individual arguments in the signature makes it difficult.
Passing as individual arguments was an attempt to leave door open to adding more parameters. I did not think of a struct including its size, that would work too.
address - where does this address point to?
For faults it is the address of a fault. We have other cases (like FailFast) where we just call GetCurrentIP().
It is basically a location in some general sense responsible for the crash.
signo is part of siginfo_t so it is not strictly necessary to pass it as a separate field. It can be read from siginfo_t.
We have cases where we specify signo, but not the siginfo_t. https://github.com/dotnet/runtime/blob/c1f91610c53681c7069ae1c1502a88e2644f2543/src/coreclr/nativeaot/Runtime/unix/PalCreateDump.cpp#L447
Not sure if all the uses of this functionality make sense. I just tried to fit into existing patterns, thinking most are there for a reason.
argExceptionString - if it is a stacktrace, should it be rather called stacktrace? Also, I am wondering whether it would be better to provide the stacktrace via enumerator-like callback.
HandleFatalError
calls it argExceptionString
.
I think the only user is the Environment_FailFast
and it fills it with what someException.ToString()
produces.
https://github.com/dotnet/runtime/blob/c1f91610c53681c7069ae1c1502a88e2644f2543/src/coreclr/classlibnative/bcltype/system.cpp#L168
So this is typically a stack trace, but ToString()
can be overridden to make it not a stack trace.
Also that calls managed code, but at some point in the process of crashing we cannot call managed code any more. I guess we could still obtain and stash the stack trace as a bunch of native strings and give user an iterator, but if the most use cases will just put that all in one message or a binary dump anyways, maybe a string is good enough?
Note the update return value is BOOL as opposed to C++ or C99's bool.
Good catch!
So the signature could be:
// returns false if default handler is not desired.
extern "C" DLL_EXPORT BOOL __cdecl FatalErrorHandler(int32_t error_code, struct ExceptionData* data);
Now we need to settle on how ExceptionData
looks like
@AaronRobinsonMSFT Re: I would avoid Win32 data structures as fields.
Is that about PEXCEPTION_POINTERS
? I think it would be ok on Windows.
As for LPWSTR
- yes, we need something else. Perhaps just char*
on Unix, or maybe on both Windows and Unix? UTF-8?
Is that about PEXCEPTION_POINTERS? I think it would be ok on Windows.
I agree, but then the declarations become muddled and we end up requiring people to define complex structures. If we instead make the data structures simple the declarations are easy to copy/paste and users can simply cast in the implementation if needed. One could imagine something akin to the following:
struct ExceptionData
{
size_t size;
void* Context; // Cast to PEXCEPTION_POINTERS on Windows or ucontext_t* on non-Windows.
// Define other fields and how they should be used.
};
int32_t error_code
What kind of error_code
is this?
Cast to PEXCEPTION_POINTERS on Windows or ucontext_t* on non-Windows.
EXCEPTION_POINTERS is bundle of EXCEPTION_RECORD and CONTEXT. It corresponds to siginfo_t and ucontext_t. We may want to split it into two fields like this:
struct ExceptionData
{
...
void* info; // Cast to PEXCEPTION_RECORD on Windows or siginfo_t* on non-Windows.
void* ucontext; // Cast to PCONTEXT on Windows or ucontext_t* on non-Windows.
...
};
ExceptionData
We should call it FatalErrorData
(or FatalErrorInfo
) to match SetFatalErrorHandler
.
int32_t error_code
What kind of error_code is this?
The error codes are from:
C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\Include\um\CorError.h
Ex: COR_E_FAILFAST
, COR_E_STACKOVERFLOW
I would call it int32_t hresult
then (to match Exception.HResult
that is the same kind of error code).
What do we do with strings, exception strings in particular?
if hostfxr as an example, the strings should be const pal::char_t*
const pal::char_t* argExceptionString=NULL, // whatever exception.ToString() produced.
const pal::char_t* message=NULL, // Ex: "SSE and SSE2 processor support required."
const pal::char_t* detailMessage=NULL // detailMessage that is passed to Assert(condition, message, detailMessage)
const pal::char_t* errorSource=NULL // Ex: "Assertion Failed"
The most contentious is argExceptionString:
Exception.StackTrace
returns) or array of strings, or some kind of iterator? I think we can only provide the stack when this was an unhandled managed exception and only as much as we can get from new StackTrace (Exception e)
.
At one extreme we could capture a snapshot of the stack and build entire infrastructure similar to StackTrace/StackFrame
on top of that (modulo System.Reflection.MethodBase GetMethod()
).
On the other hand we can say that what StackTrace.ToString
produces is parseable enough and give that out as one string.
if hostfxr as an example, the strings should be const pal::char_t*
Yeah, char16_t
vs char_t
is pain to deal with. char_t
is better for portability, but it may require conversions. Do you have an opinion?
The most contentious is argExceptionString: is it worth including?
I think it depends on the handler. Some handlers may want to get more detailed information (even if it means that secondary crashes can occur while gathering this extra information), some may want to get minimal information, some may want to get more detailed information only for certain types of crashes. It makes me think that we may want to have some sort of callback-based pattern to optionally gather the more detailed information. If we do that, we can start with something very minimal and evolve it based on feedback.
Yeah, char16_t vs char_t is pain to deal with. char_t is better for portability, but it may require conversions. Do you have an opinion?
I'd prefer char_t
. For portability reasons and similarity with hosting APIs.
Unlike old APIs influenced by COM and Win32, hostfxr is relatively new and some thinking about portability must have been done. I am not aware of const char_t*
in its APIs causing any friction.
Crash handler is not exactly a hosting component, but in a way it is related to app's life cycle. - Another reason I looked at hosting for API shape examples.
I'd prefer char_t. For portability reasons and similarity with hosting APIs.
I don't think this is a good idea. The hosting APIs have very different UX and rules. The hosting APIs are expecting strings that are created by the user and provided. This API is entirely in the domain of runtime provided strings, so we should not introduce any additional munging in that space and provide what we have. char16_t
is far more preferable to yet another set of PAL definitions that users will need to manage.
I don't think this is a good idea. The hosting APIs have very different UX and rules. The hosting APIs are expecting strings that are created by the user and provided. This API is entirely in the domain of runtime provided strings, so we should not introduce any additional munging in that space and provide what we have. char16_t is far more preferable to yet another set of PAL definitions that users will need to manage.
Makes sense. I did notice that unlike hosting, the strings go in a different direction (from runtime to the handler).
By char16_t
- do you mean on both Windows and Unix? Any thoughts about encoding?
By char16_t - do you mean on both Windows and Unix? Any thoughts about encoding?
My preference would be keep them in UTF-16. If we wanted to marshal them to some canonical form on a per OS basis, then I think we should go with our previous conversation and use void*
and document what users should cast the pointer to. My main quibble here would be defining any PAL sounding types/APIs for such a narrow and low-level feature.
The most contentious is argExceptionString: is it worth including?
I think it depends on the handler. Some handlers may want to get more detailed information (even if it means that secondary crashes can occur while gathering this extra information), some may want to get minimal information, some may want to get more detailed information only for certain types of crashes. It makes me think that we may want to have some sort of callback-based pattern to optionally gather the more detailed information. If we do that, we can start with something very minimal and evolve it based on feedback.
I think the managed stack trace is important and hard to get in other ways, we should add it, and should add it separately from argExceptionString
even if it would often overlap, because argExceptionString
comes from ToString()
and could be truncated or overriden to not have stack in it.
I would produce the stacktrace in just one string though. Any kind of iterator or object model on top of that could be overthinking. It would still be in some textual form, just chunked up. So maybe more structural stack info could come in the future improvements, if there is a need.
I think the managed stacktrace needs to be optional and only computed if the handler wants it given that computing it is a very complex operation. I do not have a strong opinion about how it should be formatted.
hard to get in other ways
This may change as we make progress on cDAC.
I think the managed stacktrace needs to be optional and only computed if the handler wants it given that computing it is a very complex operation. I do not have a strong opinion about how it should be formatted.
Would that apply to ex.ToString()
as well?
That is what we currently do in QCALLTYPE Environment_FailFast
where we still can run managed code. After that it comes to the EEPolicy::HandleFatalError
in a form of a string for which we've already paid.
If the fail-fast is originating as Environment.FailFast, I do not have concerns about collecting the stacktrace upfront like we do that today.
I have concerns about computing the stacktrace automatically for failfasts that are e.g. crashes in the middle of GC. We do print the managed stacktrace to console for these today, but it is not clear whether it is a good idea. We have seen situations where it caused secondary crashes and made the original crash undiagnosable.
Right. A managed stack is safe to get if it was a managed unhandled exception, not a random corruption. And even that may need to exclude cases like OOM.
Since Environment_FailFast
covers the "safe" unhandled managed case anyways, if we get argExceptionString
from it, we will pass it to the user, but otherwise can avoid any additional complex calls.
The latest iteration that should capture all the suggestions so far. Let me know if I missed/misunderstood something or there are some additional thoughts.
Managed API to set up the handler
public static class ExceptionHandling
{
/// <summary>
/// .NET runtime is going to call `fatalErrorHandler` set by this method before its own
/// fatal error handling (creating .NET runtime-specific crash dump, etc.).
/// </summary>
/// <exception cref="ArgumentNullException">If fatalErrorHandler is null</exception>
/// <exception cref="InvalidOperationException">If a handler is already set</exception>
public static void SetFatalErrorHandler(delegate* unmanaged<uint, void*, int> fatalErrorHandler);
}
The shape of the FatalErrorHandler, if implemented in c++ (the default calling convention for the given platform is used)
// expected signature of the handler
FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data);
With FatalErrorHandlerResult
and FatalErrorInfo
defined in "TBD.h" as:
enum FatalErrorHandlerResult : int32_t
{
RunDefaultHandler = 0,
SkipDefaultHandler = 1,
};
struct FatalErrorInfo
{
size_t size; // size of the FatalErrorInfo instance
void* address; // code location correlated with the failure (i.e. location where FailFast was called)
// exception/signal information, if available
void* info; // Cast to PEXCEPTION_RECORD on Windows or siginfo_t* on non-Windows.
void* context; // Cast to PCONTEXT on Windows or ucontext_t* on non-Windows.
// UTF-16 strings with additional information as appropriate
char16_t* errorSource; // example: "Assertion Failed"
char16_t* message; // error message. For example the one passed to "FailFast(message)"
char16_t* detailMessage; // detailMessage that was passed to "Assert(condition, message, detailMessage)"
char16_t* exceptionString; // for unhandled managed exceptions the result of "ex.ToString()"
};
Setting up the handler for the app (C# code in the actual app):
internal unsafe class Program
{
[DllImport("myCustomCrashHandler.dll")]
public static extern delegate* unmanaged<uint, void*, int> GetFatalErrorHandler();
static void Main(string[] args)
{
ExceptionHandling.SetFatalErrorHandler(GetFatalErrorHandler());
RunMyProgram();
}
}
The handler. (c++ in myCustomCrashHandler.dll
)
#include "TBD.h"
static FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data)
{
// this is a special handler that analyzes OOM crashes
if (hresult != COR_E_OUTOFMEMORY)
{
return FatalErrorHandlerResult::RunDefaultHandler;
}
DoSomeCustomProcessingOfOOM(data);
// no need for a huge crash dump after an OOM.
return FatalErrorHandlerResult::SkipDefaultHandler;
}
extern "C" DLL_EXPORT void* GetFatalErrorHandler()
{
return &FatalErrorHandler;
}
[UnmanagedCallersOnly]
[DllImport("myCustomCrashHandler.dll")]
I do not think we allow a single method to be both UnmanagedCallersOnly and DllImport. I should be more like this:
[DllImport("myCustomCrashHandler.dll")]
public static extern delegate* unmanaged<...> GetFatalErrorHandler();
// returns false if default handler is not desired.
I think this can use clarification. One possible behavior:
Is it what you meant?
We may want to make it an enum to allow extending the possible actions in future. It would also avoid the need for x-plat definition of BOOL
.
__cdecl
The explicit __cdecl is not in the managed signature. I would avoid the explicit calling convention, and just say that it is the platform default.
struct ExceptionData
Nit: This should be FatalErrorInfo.
Yes to all the above. I will update the prototype in place.
I do not think we allow a single method to be both UnmanagedCallersOnly and DllImport.
When I try that without UnmanagedCallersOnly, I get a compile error when taking an address. (&
)
Error CS8786 Calling convention of 'Program.FatalErrorHandler(uint, void*)' is not compatible with 'Unmanaged'.
With the attribute it compiles. At least C# thinks it is needed for this scenario. (I did not check what it emits)
Try to run:
using System.Runtime.InteropServices;
unsafe
{
delegate* unmanaged<uint, void*, int> p = &FatalErrorHandler;
p(0, null);
[UnmanagedCallersOnly]
[DllImport("myCustomCrashHandler.dll")]
static extern int FatalErrorHandler(uint exitCode, void* fatalErrorHandler);
}
You will get: "Unhandled exception. System.NotSupportedException: Method's type signature is not PInvoke compatible."
You will get: "Unhandled exception. System.NotSupportedException: Method's type signature is not PInvoke compatible."
I saw that, but thought it is at the call site failure - since the caller is not unmanaged. That is just an example of use though. We can come up with something else. There must be a way to fetch a pointer to a native function from managed code. Perhaps more than one.
Although I kind of wonder if we need to encode the native method signature in the API, or whether it is a function at all.
public static void SetFatalErrorHandler(delegate* unmanaged<uint, void*, int> fatalErrorHandler);
vs.
public static void SetFatalErrorHandler(void* fatalErrorHandler);
I am thinking back and forth about this.
We can assign a specific delegate type to the handler, but not sure if it brings a lot of benefits. There is void*
in the parameters anyways, so it is already weakly typed.
There must be a way to fetch a pointer to a native function from managed code. Perhaps more than one.
NativeLibrary.Load
+ NativeLibrary.GetExport
. It does not work with static linkingAlthough I kind of wonder if we need to encode the native method signature in the API, or whether it is a function at all.
https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.objectivec.objectivecmarshal.initialize?view=net-8.0 is recently added API with similar shape. It has the function pointers in the signature.
I have applied all the suggestions. I think this is ready to be put in one proposal (together with managed handler part) and send to API review.
LGTM
Nit: The example has GetFatalErrorHandler
on the managed side, but it does not show it on the unmanaged side.
Nit: The example has GetFatalErrorHandler on the managed side, but it does not show it on the unmanaged side.
And with GetFatalErrorHandler
pattern the handler itself does not need to be DLL_EXPORT
I think we want to put FatalErrorHandlerResult
and FatalErrorInfo
in a header file. Any existing files this that can go into?
I would create a new small header file for this. It can live in https://github.com/dotnet/runtime/tree/main/src/native/public. I would not worry about shipping it - just tell people to copy it over from the repo if they need it.
(We have header files for debugger, profiler or hosting. Each of these are handled differently, there is not much consistency.)
The combined proposal is at https://github.com/dotnet/runtime/issues/101560
@jkotas @AaronRobinsonMSFT - huge thanks for helping with this!!
Superseded by https://github.com/dotnet/runtime/issues/101560
Background and Motivation
Scenarios like designers or REPLs that host user provided code are not able to handle unhandled exceptions thrown by the user provided code. Unhandled exceptions on finalizer thread, threadpool threads or user created threads will take down the whole process. This is not desirable experience for these type of scenarios.
The discussion that lead to this proposal is in https://github.com/dotnet/runtime/issues/39587
Proposed API
Allow ignoring unhandled exceptions on threads created by the runtime from managed UnhandledException handler:
Usage Examples
Alternative Designs
Unmanaged hosting API that enables this behavior. (CoreCLR has poorly documented and poorly tested configuration option for this today.)
Similar prior art:
UnobservedTaskExceptionEventArgs.Observed
+UnobservedTaskExceptionEventArgs.SetObserved
CancelEventArgs.Cancel
Risks
This API can be abused to ignore unhandled exceptions in scenarios where it is not warranted.