Open VSadov opened 6 months ago
FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data);
Defines int32_t
, but the C# function pointer in SetFatalErrorHandler()
defined uint
. I assume both should be signed.
I wonder if having the messages etc. as 16 bit strings is the right way. Our coreclr_xxx hosting APIs use 8 bit (UTF-8) strings so that users on Unix don't have to convert them using some external library. 16 bit strings are very unusual on Unix and on Windows, it is trivial to do the conversion if needed thanks to the Windows APIs.
We had to choose between char_t
and char16_t
. See: https://github.com/dotnet/runtime/issues/42275#issuecomment-2073114120
The key observations were -
In case the native handler on Unix would want to do anything sensible with these strings, even just writing them to a text log file, I think it would most likely need to convert them. In the runtime, we can easily convert them into stack allocated buffers without disturbing the process state. I still think it would be better if whatever string enters or leaves the runtime from the native side was 8 bit. Similar to the hosting APIs, it would add a little overhead for the handler on Windows, but much less than the overhead that would otherwise be needed on Unix. I am not sure I understand the comment in the other issue on a need to have some PAL definitions if we went with 8 bit strings.
I am not sure I understand the comment in the other issue on a need to have some PAL definitions if we went with 8 bit strings.
The approach in question was to have a char_t
that represented wchar_t
on Windows and char
on non-Windows. If we wanted to make it always UTF-8, I could get behind that. That seems rather annoying, but I do get it. My desire is to avoid any sort of PAL types that users would need to think about. Casting doesn't concern me, but encoding types in signatures or data structures would be a red flag to me.
How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations?
How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations?
There will be allocations for sure. The IO is nil or very limited if done from managed code if I recall. The native side trans-coding is also new and I believe avoids IO. It isn't the worst idea if it is just for logging. The tricky bit is if the callback comes back into managed though - that is my biggest concern.
How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations?
It can be ~100 - ~1000 lines depending on how optimized implementation you want. The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms, so everybody using this API on non-Windows would have to copy the conversion routine from somewhere.
As a (usability) test for this API, you can write a sample handler that produces .json that captures the failure details (similar to .json that is produced by native AOT crash handler today). You will hit the UTF16 conversion problem pretty much immediately.
The tricky bit is if the callback comes back into managed though - that is my biggest concern.
Calling managed implementation is not an option for the crash handlers.
There will be allocations for sure. The IO is nil or very limited if done from managed code if I recall. The native side trans-coding is also new and I believe avoids IO. It isn't the worst idea if it is just for logging. The tricky bit is if the callback comes back into managed though - that is my biggest concern.
If a string comes from native assert (like from GC), running managed code might not be an option. I assumed the conversion can be done completely in native code.
The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms
Right, they will likely need to use ICU or some other library.
The tricky bit is if the callback comes back into managed though - that is my biggest concern.
Calling managed implementation is not an option for the crash handlers.
Excellent
You will hit the UTF16 conversion problem pretty much immediately.
Fair. This implies UTF-8 should be the preference then.
It can be ~100 - ~1000 lines depending on how optimized implementation you want. The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms, so everybody using this API on non-Windows would have to copy the conversion routine from somewhere.
That is not too bad. Since this is a crash handling and strings are not expected to be huge, I'd optimize for reliability. (avoiding IO, allocations, syscalls, if possible - the process might have crashed, but who knows what locks are still held and such...)
Right, they will likely need to use ICU or some other library.
ICU is big, complicated library. It is not something you want to depend on in a crash handler. I think that the most appropriate solution would be to include source for the simplest possible conversion from somewhere.
Is it possible to get content of char16_t* exceptionString; // for unhandled managed exceptions the result of "ex.ToString()"
by request? I believe that when FatalErrorHandler
is called stack is still available.
I suggest to do it on demand because it will create issues for COR_E_OUTOFMEMORY
and also we had an issue that @jkotas helped us with COM when calling Exception.ToString considerably slowdown future COM calls. See #98788.
If conversion to UTF-8 is going to be needed in majority of scenarios (especially on Unix), I think we should do UTF-8 rather than expect that the end user will convert.
I'll update the proposal to have char8_t
.
Is it possible to get content of char16_t* exceptionString; // for unhandled managed exceptions the result of "ex.ToString()" by request? I believe that when FatalErrorHandler is called stack is still available.
ex.ToString()
typically happens at the time when runtime finds that the exception is unhandled. By the time when the native crash handler runs, managed code would not be able to run. To have any chance to report that string it should be captured and stored earlier, while that is still possible. I do not think the string can be produced on request while handling a crash.
On the other hand, a fatal error can happen only once in the life of an app, so perhaps performance of this API will not be as critical as in scenarios that can repeat.
@vladimir-cheverdyuk-altium has a good point about having an option to suppress computation of the textual stacktrace or doing other non-trivial operations if the crash handler is not interested in the information. Such non-trivial operations can hit secondary crashes or otherwise interfere with the diagnosability of the original unhandled exception.
I had similar concern in the original discussion.
they will likely need to use ICU or some other library.
Runtime's implementation (which is pretty much standalone) could also be an option for users.
can be built with sources acquired from tag link:
$ curl --create-dirs --output-dir external/minipal -sSLO \
"https://raw.githubusercontent.com/dotnet/runtime/v8.0.4/src/native/minipal/{utf8.c,utf8.h,utils.h}"
$ clang -Iexternal test.c external/minipal/utf8.c -o testutf8
$ ./testutf8
Conversion completed. mblen: 33, mbstr: ハローワールド! 👋 🌏
@vladimir-cheverdyuk-altium has a good point about having an option to suppress computation of the textual stacktrace or doing other non-trivial operations if the crash handler is not interested in the information. Such non-trivial operations can hit secondary crashes or otherwise interfere with the diagnosability of the original unhandled exception.
At the time of reporting the crash it would be too late to decide whether FailFast should have captured the exception message for the unhandled exception or not.
We could add a bool parameter to SetFatalErrorHandler
that would have effect of suppressing that ex.ToString
process-wide.
The result of that would be that the default crash processing will not have the string either - i.e. will not print it to console.
The benefits of that would be slightly faster handling of the crash and more robust behavior in case ex.ToString
causes secondary errors.
Cases like OOM or stack overflow are always tricky. Also ex.ToString
can do anything - like throw this;
.
In most cases unhandled exceptions are just that - unhandled exceptions (null dereferences, file did not exist, ..) and getting exception message would be safe and useful.
Basically - we can add a knob to SetFatalErrorHandler
to suppress ex.ToString
when installing the handler, but I wonder is there will be a lot of users for that.
Is it really that often that ex.ToString
can do really bad things (like stack overflow) and simply placing it inside try/catch
will not make the scenario safe enough, that you'd rather prefer to not see stack traces?
@VSadov
At the time of reporting the crash it would be too late to decide whether FailFast should have captured the exception message for the unhandled exception or not. I'm not sure about FailFast but for regular exceptions I think exception stack should be available unless I'm missing something.
As it was stated in https://github.com/dotnet/runtime/issues/98788, ex.ToString will make COM calls slow if it was in the call stack and it is why I'm concerned.
Perhaps I do not understand the scenario...
When a managed exception is unhandled(*), the runtime will call Environment_FailFast
and will pass the exception as one of the parameters.
https://github.com/dotnet/runtime/blob/6cd329b25315c5f21e86dac74f6cbe6d0deb6b0e/src/coreclr/classlibnative/bcltype/system.cpp#L168
Environment_FailFast
will make a call to exception.ToString()
. (via GetExceptionMessage
). At that point we still can run managed code, but not for long.
A few lines later Environment_FailFast
will call EEPolicy::HandleFatalError
, which will proceed to things like capturing a dump and exiting the process.
I understand the reliability concern about exception.ToString()
doing something horrible, but not sure where the performance concern comes from. What will observe slowness in COM calls after Environment_FailFast
?
(*) Unhandled here means - "really unhandled". If there was UnhandledExceptionHandler
and handled it, the exception is not unhandled after that and will not FailFast/crash.
I am not worried about performance. I am worried about providing the right information to the crash handler, without introducing unnecessary failure points between the failure point and the call of the crash handler.
On one end, you can have crash handlers that just want to capture crashdump for offline processing and do nothing else. Computing the textual stacktrace is source of unnecessary failure points for those.
On the other end, you can have crash handlers that want to capture many details like stacktrace with file and line numbers if possible.
The design should accommodate the whole spectrum. (We do not need to implement everything in the first round, but we should have an idea for how we would go about covering the whole spectrum.)
@VSadov My idea was to pass exception handle (it can be exception object itself) to handler and later handler can call function to get more information from this handle.
What will observe slowness in COM calls after Environment_FailFast?
COM calls slows down after ex.ToString called if some module is on the call stack. Some code that walks stack converts module to read/write mode from read-only mode and as result search became linear instead of binary.
The design should accommodate the whole spectrum. (We do not need to implement everything in the first round, but we should have an idea for how we would go about covering the whole spectrum.)
I think the suggested API shape is sufficient for the first round. Then we can see how the API will be used and if/how it can be made better.
If we find that we really need to allow configuring ex.ToString()
in FailFast, we can add an overload of SetFatalErrorHandler
that takes an extra BOOL
for that. Or we can pass an enum
that can suppress any combination of optional parts in FatalErrorInfo
.
Basically - I think there is a way to react to the need to configure, but maybe there is no need. In such case having a simpler API would be better.
Or we can pass an enum that can suppress any combination of optional parts in FatalErrorInfo.
It assumes that the handler wants the fixed information for all types of crashes. It is not necessarily the case.
I am coming back to the callback idea: By default, the crash handler should only get information that does not require any extra memory allocations or substantial effort to compute. Everything else should be provided via callbacks. I think it is the most future proof and flexible design.
Callbacks are more flexible, but may require that there is some state that can be interrogated. Right now we can store all the relevant info that we have to the stack and then invoke the handler while passing the pointers.
If the handler can come back for more info, we would need to store that info ahead of time somewhere (likely not on stack). We could defer some things like converting to UTF-8, but assuming that is relatively trivial, it may be not something that matters to defer.
Alternatively, we could provide the extra info by digging through the memory of the dead process - like debugger does. Maybe that would be better served by just exposing debugging APIs? An additional advantage for that would be that it might also work over a core dump, provided that debug info is available. This can certainly be the future direction where all this will be going.
As for configuring the behavior of ex.ToString
via a callback I could see a scenario when FailFast could make a callback (managed or native?) and ask if ToString
is desired.
This question would be asked only once per lifetime of the process, so there is not a lot of advantages over pre-configuring that via a bool parameter when setting up a handler. If concurrent FailFast is possible, the callback may fire more than once and a different answer could be provided, but I am not sure this is an advantage or not.
As for configuring the behavior of ex.ToString via a callback I could see a scenario when FailFast could make a callback (managed or native?) and ask if ToString is desired.
Maybe that callback can return flags that will specify what optional fields of FatalErrorInfo it should populate?
Can this proposal be updated to use a EventArgs
that contains a GetDeferral()
method to allow async handler?
public delegate void UnhandledExceptionHandler(UnhandledExceptionHandlerEventArgs e);
public sealed class UnhandledExceptionHandlerEventArgs
{
private TaskCompletionSource? m_tcs;
private sealed class UnhandledExceptionHandlerDeferral(TaskCompletionSource tcs) : IDisposable
{
public void Dispose() => tcs.TrySetResult();
}
public Exception Exception { get; }
public IDisposable GetDeferral() => new UnhandledExceptionHandlerDeferral(m_tcs ??= new());
public bool Handled { get; set; }
}
Usage 1: synchronized handler
void MyHandler(UnhandledExceptionHandlerEventArgs e)
{
e.Handled = true;
}
Usage 2: asynchronized handler
async void MyHandler(UnhandledExceptionHandlerEventArgs e)
{
using var deferral = e.GetDeferral();
await MyAsyncHandlerMethod();
e.Handled = true;
}
There is nothing else that the thread dealing with the unhandled exception can do until the unhandled exception was dealt with, so there is no advantage in complicating this API with async. If you handler needs to call async methods, it should use Task.Wait.
I've made changes to the fatal error handling API that allow querying for the error text if needed.
If the custom logger prints the provided text to console, it will achieve the same behavior as in the default handler. It could also add the text to a file, append to some binary storage, etc..
The same pattern may be followed in the future to allow querying for some other data, not necessarily textual.
Nit: pfnLogAction
will be called multiple times by the runtime. The runtime is not required to materialize the entire message in one long string.
Yes, that may be desirable. I’ll modify the description to allow for sending that string in pieces.
Updated for possibility of multiple calls to the pfnLogAction
void (*pfnGetFatalErrorLog)(
FatalErrorInfo* errorData,
void (*pfnLogAction)(void *userContext, char8_t* logString),
void* userContext);
The above signature needs explicit calling conventions. I assume we want __stdcall
on Windows x86 and the "default" everywhere else.
void (*pfnLogAction)(void *userContext, char8_t* logString),
The userContext
variable is typically passed in last to the callback.
The above signature needs explicit calling conventions. I assume we want __stdcall on Windows x86 and the "default" everywhere else.
How would it look in the declaration? Would that require #ifdef
?
void (*pfnLogAction)(void *userContext, char8_t* logString),
The
userContext
variable is typically passed in last to the callback.
Could that make it awkward when more parameters need to be added? Maybe if adding an overload in v-next of this API or in an another entry point that logs something else. Would the other variants still have the context at the end?
The above signature needs explicit calling conventions. I assume we want __stdcall on Windows x86 and the "default" everywhere else.
How would it look in the declaration? Would that require
#ifdef
?
Yep. Something like the following:
#if defined(_MSC_VER) && defined(_M_IX86)
#define DOTNET_CALLBACK __stdcall
#else
#define DOTNET_CALLBACK
#endif
Could that make it awkward when more parameters need to be added? Maybe if adding an overload in v-next of this API or in an another entry point that logs something else. Would the other variants still have the context at the end?
What other information are you envisioning that could evolve here? Adding additional arguments to a callback should be avoided. The best solution would be to create a new callback option rather than changing an existing callback API. The "context" is almost always last. I can't even think of a counter example, so this seems like "the pattern" to follow.
Yep. Something like the following:
Would it be sufficient to just document it as:
"In all signatures of this API the default calling convention for the given platform is used."
I am not against the #ifdef
just wonder if simply adding a comment is just as good.
What other information are you envisioning that could evolve here? Adding additional arguments to a callback should be avoided. The best solution would be to create a new callback option rather than changing an existing callback API. The "context" is almost always last. I can't even think of a counter example, so this seems like "the pattern" to follow.
I could be easily convinced that the callback will be at the end. As the only truly required parameter in any kind of such pattern it would seem odd to put it at the end, but if that is very common, we can do that.
"In all signatures of this API the default calling convention for the given platform is used."
That is possible, but that is ambigous on Windows. For example, the Windows OS is generally __stdcall
, but VS defaults to __cdecl
. Which means from a user perspective it might not even be something they have thought about in their project. My preference here is making the code as copy/paste friendly as possible.
I've updated the signatures to add the #ifdef
/__stdcall thing and made the context the last.
Do we have enough of the API shape to move this to "ready for review"?
I am ready to have a broader conversation. @jkotas?
Same here.
public static void SetUnhandledExceptionHandler(UnhandledExceptionHandler handler);
The naming here seems confusing to me, this is named very similarly to the AppDomain.UnhandledException event, and in particular it doesn't seem clear that this won't be called for all unhandled exceptions, only unhandled exceptions that can actually be ignored.
It's not a big difference, but what about SetUnhandledExceptionFilter
? That matches the imaginary code:
try { UserCode(); } catch (Exception ex) when handler(ex){};
in that handler
is an exception filter.
IMHO it's also somewhat more obvious that this won't be called for all unhandled exceptions, only those that can actually be filtered/ignored.
A few other ideas, I don't like any of them, but maybe somebody else are inspired to come up with better names:
SetFilterableUnhandledExceptionFilter
SetCatchUnhandledExceptionHandler
Additionally, it would be nice to be able to invoke the callback.
Example use case for .NET for iOS: we have an API called InvokeOnMainThread (Action action)
, where developers can do this:
InvokeOnMainThread (() => { ShowMessageBox ("Hello World!"); });
which would currently, if ShowMessageBox
threw an exception, end up bringing the process down. If we could implement this internally like this:
public void InvokeOnMainThread (Action action)
{
NativeInvokeOnMainThread (() => {
// This is effectively called from a reverse P/Invoke
try {
action ():
} catch (Exception ex) when ExceptionHandling.FilterUnhandledException (ex) {
// ignored
}
});
}
that would be nice.
Lastly:
/// <exception cref="ArgumentNullException">If handler is null</exception> /// <exception cref="InvalidOperationException">If a handler is already set</exception> public static void SetUnhandledExceptionHandler(UnhandledExceptionHandler handler);
There's no way to clear/change/reset the handler once it's set? Is that intentional?
Does this proposal address crashes in unmanaged code? The main motivation in https://github.com/dotnet/runtime/issues/79706 is the ability for a user app (which is commonly a combination of managed code and native libraries) to install a third-party crash reporter behind the .NET runtime so that all fatal app terminations are captured and logged. This proposal seems to only deal with unhandled managed exceptions - or will SetFatalErrorHandler
also install something for those cases? The proposal is fairly vague about what kinds of fatal errors will trigger the handler
The naming here seems confusing to me, this is named very similarly to the AppDomain.UnhandledException event, and in particular it doesn't seem clear that this won't be called for all unhandled exceptions
Good point. I agree that the name variants like you have suggested can be used to make the behavior more clear.
Another option is to call it for all unhandled exceptions and change the filter to take an extra bool ignorable
argument that will say whether the exception can be ignored.
Additionally, it would be nice to be able to invoke the callback.
It sound reasonable to me.
There's no way to clear/change/reset the handler once it's set? Is that intentional?
Yes, it is intentional. These callbacks are meant to be used at the applications level. The application is expected to be have one and only unhandled exception and fatal error handling policy.
Does this proposal address crashes in unmanaged code?
SetFatalErrorHandler
is expected to address crashes in unmanaged code. Thinking about this more, there are two cases: fatal crashes in the .NET code (including unmanaged runtime code) and crashes in other code.
We clearly want this to intercept handle crashes in .NET code, but handling crashes in other code is less clear cut. It makes sense if the app wants to own the process, but it does not make sense if the .NET code is a library (e.g. think native AOT library) and the main app has its own crash handler. Are we going to limit this to handling crashes in .NET code or should we have a flag that says whether the handler should intercept all crashes - both in .NET code and other code?
Why a dedicated delegate type and not Func<Exception, bool>
?
Func<Exception, bool>
fatalErrorHandler
to handler
for consistency.namespace System.Runtime.ExceptionServices
{
public static class ExceptionHandling
{
/// <summary>
/// Sets a handler for unhandled exceptions.
/// </summary>
/// <exception cref="ArgumentNullException">If handler is null</exception>
/// <exception cref="InvalidOperationException">If a handler is already set</exception>
public static void SetUnhandledExceptionHandler(Func<Exception, bool> handler);
/// <summary>
/// .NET runtime is going to call `fatalErrorHandler` set by this method before its own
/// fatal error handling (creating .NET runtime-specific crash dump, etc.).
/// </summary>
/// <exception cref="ArgumentNullException">If fatalErrorHandler is null</exception>
/// <exception cref="InvalidOperationException">If a handler is already set</exception>
public static unsafe void SetFatalErrorHandler(delegate* unmanaged<int, void*, int> handler);
}
}
Re: The previous proposal and a discussion that led to this proposal - (https://github.com/dotnet/runtime/issues/42275)
Background and motivation
The current default behavior in a case of unhandled exception is termination of a process. The current default behavior in a case of a fatal error is print exception to console and invoke Watson/CrashDump.
While satisfactory to the majority of uses, the scheme is not flexible enough for some classes of scenarios.
Scenarios like Designers, REPLs or game scripting that host user provided code are not able to handle unhandled exceptions thrown by the user provided code. Unhandled exceptions on finalizer thread, threadpool threads or user created threads will take down the whole process. This is not desirable experience for these types of scenarios.
In addition, there are customers that have existing infrastructure for postmortem analysis of failures and inclusion of .NET components requires interfacing with or overriding the way the fatal errors are handled.
API Proposal
API for process-wide handling of unhandled exception
The semantics of unhandled exception handler follows the model of imaginary handler like the following inserted in places where the exception will not lead to process termination regardless of what
handler()
returns.In particular:
false
.main()
will not install the try/catch like aboveAPI Proposal for custom handling of fatal errors
Managed API to set up the handler.
The shape of the FatalErrorHandler, if implemented in c++ (the default calling convention for the given platform is used)
With
FatalErrorHandlerResult
andFatalErrorInfo
defined in "FatalErrorHandling.h" under src/native/public:API Usage
Setting up a handler for unhandled exceptions:
Setting up a handler for fatal errors:
Setting up the handler for the process (C# code in the actual app):
The handler. (c++ in
myCustomCrashHandler.dll
)Alternative Designs
Unmanaged hosting API that enables this behavior. (CoreCLR has undocumented and poorly tested configuration option for this today. #39587. This option is going to be replaced by this API.)
Extending
AppDomain.CurrentDomain.UnhandledException
API and makeIsTerminating
property writeable to allow "handling". Upon scanning the existing use of this API it was found thatIsTerminating
is often used as a fact - whether an exception is terminal or not. Changing the behavior to mean "configurable" will be a breaking change to those uses.Risks
This APIs can be abused to ignore unhandled exceptions or fatal errors in scenarios where it is not warranted.