dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.25k stars 4.73k forks source link

[API Proposal]: Overriding the default behavior in case of unhandled exceptions and fatal errors. #101560

Open VSadov opened 6 months ago

VSadov commented 6 months ago

Re: The previous proposal and a discussion that led to this proposal - (https://github.com/dotnet/runtime/issues/42275)

Background and motivation

The current default behavior in a case of unhandled exception is termination of a process. The current default behavior in a case of a fatal error is print exception to console and invoke Watson/CrashDump.

While satisfactory to the majority of uses, the scheme is not flexible enough for some classes of scenarios.

Scenarios like Designers, REPLs or game scripting that host user provided code are not able to handle unhandled exceptions thrown by the user provided code. Unhandled exceptions on finalizer thread, threadpool threads or user created threads will take down the whole process. This is not desirable experience for these types of scenarios.

In addition, there are customers that have existing infrastructure for postmortem analysis of failures and inclusion of .NET components requires interfacing with or overriding the way the fatal errors are handled.

API Proposal

API for process-wide handling of unhandled exception

namespace System.Runtime.ExceptionServices
{
    public delegate bool UnhandledExceptionHandler(System.Exception exception);

    public static class ExceptionHandling
    {
        /// <summary>
        /// Sets a handler for unhandled exceptions.
        /// </summary>
        /// <exception cref="ArgumentNullException">If handler is null</exception>
        /// <exception cref="InvalidOperationException">If a handler is already set</exception>
        public static void SetUnhandledExceptionHandler(UnhandledExceptionHandler handler);
    }
}

The semantics of unhandled exception handler follows the model of imaginary handler like the following inserted in places where the exception will not lead to process termination regardless of what handler() returns.

try { UserCode(); } catch (Exception ex) when handler(ex){};

In particular:

API Proposal for custom handling of fatal errors

Managed API to set up the handler.

namespace System.Runtime.ExceptionServices
{
    public static class ExceptionHandling
    {
        /// <summary>
        /// .NET runtime is going to call `fatalErrorHandler` set by this method before its own
        /// fatal error handling (creating .NET runtime-specific crash dump, etc.).
        /// </summary>
        /// <exception cref="ArgumentNullException">If fatalErrorHandler is null</exception>
        /// <exception cref="InvalidOperationException">If a handler is already set</exception>
        public static void SetFatalErrorHandler(delegate* unmanaged<int, void*, int> fatalErrorHandler);
    }
}

The shape of the FatalErrorHandler, if implemented in c++ (the default calling convention for the given platform is used)

// expected signature of the handler
FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data);

With FatalErrorHandlerResult and FatalErrorInfo defined in "FatalErrorHandling.h" under src/native/public:

enum FatalErrorHandlerResult : int32_t
{
    RunDefaultHandler = 0,
    SkipDefaultHandler = 1,
};

#if defined(_MSC_VER) && defined(_M_IX86)
#define DOTNET_CALLCONV __stdcall
#else
#define DOTNET_CALLCONV
#endif

struct FatalErrorInfo
{
    size_t size;    // size of the FatalErrorInfo instance
    void*  address; // code location correlated with the failure (i.e. location where FailFast was called)

    // exception/signal information, if available
    void* info;     // Cast to PEXCEPTION_RECORD on Windows or siginfo_t* on non-Windows.
    void* context;  // Cast to PCONTEXT on Windows or ucontext_t* on non-Windows.

    // An entry point for logging additional information about the crash.
    // As runtime finds information suitable for logging, it will invoke pfnLogAction and pass the information in logString.
    // The callback may be called multiple times.
    // Combined, the logString will contain the same parts as in the console output of the default crash handler.
    // The errorLog string will have UTF-8 encoding.
    void (DOTNET_CALLCONV *pfnGetFatalErrorLog)(
           FatalErrorInfo* errorData, 
           void (DOTNET_CALLCONV *pfnLogAction)(char8_t* logString, void *userContext), 
           void* userContext);

    // More information can be exposed for querying in the future by adding
    // entry points with similar pattern as in pfnGetFatalErrorLog
};

API Usage

Setting up a handler for unhandled exceptions:

using System.Runtime.ExceptionServices;

ExceptionHandling.SetUnhandledExceptionHandler(
    (ex) =>
    {
        if (DesignMode)
        {
            DisplayException(ex);
            // the exception is now "handled"
            return true;
        }
        return false;
    }
);

Setting up a handler for fatal errors:

Setting up the handler for the process (C# code in the actual app):

internal unsafe class Program
{
    [DllImport("myCustomCrashHandler.dll")]
    public static extern delegate* unmanaged<int, void*, int> GetFatalErrorHandler();

    static void Main(string[] args)
    {
        ExceptionHandling.SetFatalErrorHandler(GetFatalErrorHandler());

        RunMyProgram();
    }
}

The handler. (c++ in myCustomCrashHandler.dll)

#include "FatalErrorHandling.h"

static FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data)
{
    // this is a special handler that analyzes OOM crashes
    if (hresult != COR_E_OUTOFMEMORY)
    {
        return FatalErrorHandlerResult::RunDefaultHandler;
    }

    DoSomeCustomProcessingOfOOM(data);

    // retain the additional error data
    data->pfnGetFatalErrorLog(data, &LogErrorMessage, NULL);

    // no need for a huge crash dump after an OOM.
    return FatalErrorHandlerResult::SkipDefaultHandler;
}

static void LogErrorMessage(char8_t* logString, void *userContext)
{
    AppendToBlob(logString);
}

extern "C" DLL_EXPORT void* GetFatalErrorHandler()
{
    return &FatalErrorHandler;
}

Alternative Designs

Unmanaged hosting API that enables this behavior. (CoreCLR has undocumented and poorly tested configuration option for this today. #39587. This option is going to be replaced by this API.)

Extending AppDomain.CurrentDomain.UnhandledException API and make IsTerminating property writeable to allow "handling". Upon scanning the existing use of this API it was found that IsTerminating is often used as a fact - whether an exception is terminal or not. Changing the behavior to mean "configurable" will be a breaking change to those uses.

Risks

This APIs can be abused to ignore unhandled exceptions or fatal errors in scenarios where it is not warranted.

AaronRobinsonMSFT commented 6 months ago

FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data);

Defines int32_t, but the C# function pointer in SetFatalErrorHandler() defined uint. I assume both should be signed.

janvorli commented 6 months ago

I wonder if having the messages etc. as 16 bit strings is the right way. Our coreclr_xxx hosting APIs use 8 bit (UTF-8) strings so that users on Unix don't have to convert them using some external library. 16 bit strings are very unusual on Unix and on Windows, it is trivial to do the conversion if needed thanks to the Windows APIs.

VSadov commented 6 months ago

We had to choose between char_t and char16_t. See: https://github.com/dotnet/runtime/issues/42275#issuecomment-2073114120

The key observations were -

janvorli commented 6 months ago

In case the native handler on Unix would want to do anything sensible with these strings, even just writing them to a text log file, I think it would most likely need to convert them. In the runtime, we can easily convert them into stack allocated buffers without disturbing the process state. I still think it would be better if whatever string enters or leaves the runtime from the native side was 8 bit. Similar to the hosting APIs, it would add a little overhead for the handler on Windows, but much less than the overhead that would otherwise be needed on Unix. I am not sure I understand the comment in the other issue on a need to have some PAL definitions if we went with 8 bit strings.

AaronRobinsonMSFT commented 6 months ago

I am not sure I understand the comment in the other issue on a need to have some PAL definitions if we went with 8 bit strings.

The approach in question was to have a char_t that represented wchar_t on Windows and char on non-Windows. If we wanted to make it always UTF-8, I could get behind that. That seems rather annoying, but I do get it. My desire is to avoid any sort of PAL types that users would need to think about. Casting doesn't concern me, but encoding types in signatures or data structures would be a red flag to me.

VSadov commented 6 months ago

How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations?

AaronRobinsonMSFT commented 6 months ago

How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations?

There will be allocations for sure. The IO is nil or very limited if done from managed code if I recall. The native side trans-coding is also new and I believe avoids IO. It isn't the worst idea if it is just for logging. The tricky bit is if the callback comes back into managed though - that is my biggest concern.

jkotas commented 6 months ago

How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations?

It can be ~100 - ~1000 lines depending on how optimized implementation you want. The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms, so everybody using this API on non-Windows would have to copy the conversion routine from somewhere.

As a (usability) test for this API, you can write a sample handler that produces .json that captures the failure details (similar to .json that is produced by native AOT crash handler today). You will hit the UTF16 conversion problem pretty much immediately.

The tricky bit is if the callback comes back into managed though - that is my biggest concern.

Calling managed implementation is not an option for the crash handlers.

VSadov commented 6 months ago

There will be allocations for sure. The IO is nil or very limited if done from managed code if I recall. The native side trans-coding is also new and I believe avoids IO. It isn't the worst idea if it is just for logging. The tricky bit is if the callback comes back into managed though - that is my biggest concern.

If a string comes from native assert (like from GC), running managed code might not be an option. I assumed the conversion can be done completely in native code.

AaronRobinsonMSFT commented 6 months ago

The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms

Right, they will likely need to use ICU or some other library.

The tricky bit is if the callback comes back into managed though - that is my biggest concern.

Calling managed implementation is not an option for the crash handlers.

Excellent

You will hit the UTF16 conversion problem pretty much immediately.

Fair. This implies UTF-8 should be the preference then.

VSadov commented 6 months ago

It can be ~100 - ~1000 lines depending on how optimized implementation you want. The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms, so everybody using this API on non-Windows would have to copy the conversion routine from somewhere.

That is not too bad. Since this is a crash handling and strings are not expected to be huge, I'd optimize for reliability. (avoiding IO, allocations, syscalls, if possible - the process might have crashed, but who knows what locks are still held and such...)

jkotas commented 6 months ago

Right, they will likely need to use ICU or some other library.

ICU is big, complicated library. It is not something you want to depend on in a crash handler. I think that the most appropriate solution would be to include source for the simplest possible conversion from somewhere.

vladimir-cheverdyuk-altium commented 6 months ago

Is it possible to get content of char16_t* exceptionString; // for unhandled managed exceptions the result of "ex.ToString()" by request? I believe that when FatalErrorHandler is called stack is still available.

I suggest to do it on demand because it will create issues for COR_E_OUTOFMEMORY and also we had an issue that @jkotas helped us with COM when calling Exception.ToString considerably slowdown future COM calls. See #98788.

VSadov commented 6 months ago

If conversion to UTF-8 is going to be needed in majority of scenarios (especially on Unix), I think we should do UTF-8 rather than expect that the end user will convert.

I'll update the proposal to have char8_t.

VSadov commented 6 months ago

Is it possible to get content of char16_t* exceptionString; // for unhandled managed exceptions the result of "ex.ToString()" by request? I believe that when FatalErrorHandler is called stack is still available.

ex.ToString() typically happens at the time when runtime finds that the exception is unhandled. By the time when the native crash handler runs, managed code would not be able to run. To have any chance to report that string it should be captured and stored earlier, while that is still possible. I do not think the string can be produced on request while handling a crash.

On the other hand, a fatal error can happen only once in the life of an app, so perhaps performance of this API will not be as critical as in scenarios that can repeat.

jkotas commented 6 months ago

@vladimir-cheverdyuk-altium has a good point about having an option to suppress computation of the textual stacktrace or doing other non-trivial operations if the crash handler is not interested in the information. Such non-trivial operations can hit secondary crashes or otherwise interfere with the diagnosability of the original unhandled exception.

I had similar concern in the original discussion.

am11 commented 6 months ago

they will likely need to use ICU or some other library.

Runtime's implementation (which is pretty much standalone) could also be an option for users.

e.g. this test code ```c #include #include #include size_t wstrlen(CHAR16_T* str) { size_t len = 0; while (str[len++]); return len; } void handleErrors() { if (errno == MINIPAL_ERROR_INSUFFICIENT_BUFFER) { printf ("Allocation failed (%d)", errno); abort(); } else if (errno == MINIPAL_ERROR_NO_UNICODE_TRANSLATION) { printf ("Illegal byte sequence encountered in the input. (%d)", errno); abort(); } } int main(void) { CHAR16_T wstr[] = u"ハローワールド! 👋 🌏"; int wlen = wstrlen(wstr); size_t mblen = minipal_get_length_utf16_to_utf8(wstr, wlen, 0); handleErrors(); char* mbstr = (char *)malloc((mblen + 1) * sizeof(char)); size_t written = minipal_convert_utf16_to_utf8 (wstr, wlen, mbstr, mblen, 0); handleErrors(); printf("Conversion completed. mblen: %zu, mbstr: %s\n", written, mbstr); return 0; } ```

can be built with sources acquired from tag link:

$ curl --create-dirs --output-dir external/minipal -sSLO \
    "https://raw.githubusercontent.com/dotnet/runtime/v8.0.4/src/native/minipal/{utf8.c,utf8.h,utils.h}"
$ clang -Iexternal test.c external/minipal/utf8.c -o testutf8
$ ./testutf8
Conversion completed. mblen: 33, mbstr: ハローワールド! 👋 🌏
VSadov commented 6 months ago

@vladimir-cheverdyuk-altium has a good point about having an option to suppress computation of the textual stacktrace or doing other non-trivial operations if the crash handler is not interested in the information. Such non-trivial operations can hit secondary crashes or otherwise interfere with the diagnosability of the original unhandled exception.

At the time of reporting the crash it would be too late to decide whether FailFast should have captured the exception message for the unhandled exception or not.

We could add a bool parameter to SetFatalErrorHandler that would have effect of suppressing that ex.ToString process-wide. The result of that would be that the default crash processing will not have the string either - i.e. will not print it to console.

The benefits of that would be slightly faster handling of the crash and more robust behavior in case ex.ToString causes secondary errors.

Cases like OOM or stack overflow are always tricky. Also ex.ToString can do anything - like throw this;. In most cases unhandled exceptions are just that - unhandled exceptions (null dereferences, file did not exist, ..) and getting exception message would be safe and useful.

Basically - we can add a knob to SetFatalErrorHandler to suppress ex.ToString when installing the handler, but I wonder is there will be a lot of users for that.

Is it really that often that ex.ToString can do really bad things (like stack overflow) and simply placing it inside try/catch will not make the scenario safe enough, that you'd rather prefer to not see stack traces?

vladimir-cheverdyuk-altium commented 6 months ago

@VSadov

At the time of reporting the crash it would be too late to decide whether FailFast should have captured the exception message for the unhandled exception or not. I'm not sure about FailFast but for regular exceptions I think exception stack should be available unless I'm missing something.

As it was stated in https://github.com/dotnet/runtime/issues/98788, ex.ToString will make COM calls slow if it was in the call stack and it is why I'm concerned.

VSadov commented 6 months ago

Perhaps I do not understand the scenario...

When a managed exception is unhandled(*), the runtime will call Environment_FailFast and will pass the exception as one of the parameters. https://github.com/dotnet/runtime/blob/6cd329b25315c5f21e86dac74f6cbe6d0deb6b0e/src/coreclr/classlibnative/bcltype/system.cpp#L168

Environment_FailFast will make a call to exception.ToString(). (via GetExceptionMessage). At that point we still can run managed code, but not for long. A few lines later Environment_FailFast will call EEPolicy::HandleFatalError, which will proceed to things like capturing a dump and exiting the process.

I understand the reliability concern about exception.ToString() doing something horrible, but not sure where the performance concern comes from. What will observe slowness in COM calls after Environment_FailFast?

(*) Unhandled here means - "really unhandled". If there was UnhandledExceptionHandler and handled it, the exception is not unhandled after that and will not FailFast/crash.

jkotas commented 6 months ago

I am not worried about performance. I am worried about providing the right information to the crash handler, without introducing unnecessary failure points between the failure point and the call of the crash handler.

On one end, you can have crash handlers that just want to capture crashdump for offline processing and do nothing else. Computing the textual stacktrace is source of unnecessary failure points for those.

On the other end, you can have crash handlers that want to capture many details like stacktrace with file and line numbers if possible.

The design should accommodate the whole spectrum. (We do not need to implement everything in the first round, but we should have an idea for how we would go about covering the whole spectrum.)

vladimir-cheverdyuk-altium commented 6 months ago

@VSadov My idea was to pass exception handle (it can be exception object itself) to handler and later handler can call function to get more information from this handle.

What will observe slowness in COM calls after Environment_FailFast?

COM calls slows down after ex.ToString called if some module is on the call stack. Some code that walks stack converts module to read/write mode from read-only mode and as result search became linear instead of binary.

VSadov commented 6 months ago

The design should accommodate the whole spectrum. (We do not need to implement everything in the first round, but we should have an idea for how we would go about covering the whole spectrum.)

I think the suggested API shape is sufficient for the first round. Then we can see how the API will be used and if/how it can be made better.

If we find that we really need to allow configuring ex.ToString() in FailFast, we can add an overload of SetFatalErrorHandler that takes an extra BOOL for that. Or we can pass an enum that can suppress any combination of optional parts in FatalErrorInfo.

Basically - I think there is a way to react to the need to configure, but maybe there is no need. In such case having a simpler API would be better.

jkotas commented 6 months ago

Or we can pass an enum that can suppress any combination of optional parts in FatalErrorInfo.

It assumes that the handler wants the fixed information for all types of crashes. It is not necessarily the case.

I am coming back to the callback idea: By default, the crash handler should only get information that does not require any extra memory allocations or substantial effort to compute. Everything else should be provided via callbacks. I think it is the most future proof and flexible design.

VSadov commented 6 months ago

Callbacks are more flexible, but may require that there is some state that can be interrogated. Right now we can store all the relevant info that we have to the stack and then invoke the handler while passing the pointers.

If the handler can come back for more info, we would need to store that info ahead of time somewhere (likely not on stack). We could defer some things like converting to UTF-8, but assuming that is relatively trivial, it may be not something that matters to defer.

Alternatively, we could provide the extra info by digging through the memory of the dead process - like debugger does. Maybe that would be better served by just exposing debugging APIs? An additional advantage for that would be that it might also work over a core dump, provided that debug info is available. This can certainly be the future direction where all this will be going.

As for configuring the behavior of ex.ToString via a callback I could see a scenario when FailFast could make a callback (managed or native?) and ask if ToString is desired.

This question would be asked only once per lifetime of the process, so there is not a lot of advantages over pre-configuring that via a bool parameter when setting up a handler. If concurrent FailFast is possible, the callback may fire more than once and a different answer could be provided, but I am not sure this is an advantage or not.

vladimir-cheverdyuk-altium commented 6 months ago

As for configuring the behavior of ex.ToString via a callback I could see a scenario when FailFast could make a callback (managed or native?) and ask if ToString is desired.

Maybe that callback can return flags that will specify what optional fields of FatalErrorInfo it should populate?

hez2010 commented 6 months ago

Can this proposal be updated to use a EventArgs that contains a GetDeferral() method to allow async handler?

public delegate void UnhandledExceptionHandler(UnhandledExceptionHandlerEventArgs e);

public sealed class UnhandledExceptionHandlerEventArgs
{
    private TaskCompletionSource? m_tcs;

    private sealed class UnhandledExceptionHandlerDeferral(TaskCompletionSource tcs) : IDisposable
    {
        public void Dispose() => tcs.TrySetResult();
    }

    public Exception Exception { get; }
    public IDisposable GetDeferral() => new UnhandledExceptionHandlerDeferral(m_tcs ??= new());
    public bool Handled { get; set; }
}

Usage 1: synchronized handler

void MyHandler(UnhandledExceptionHandlerEventArgs e)
{
    e.Handled = true;
}

Usage 2: asynchronized handler

async void MyHandler(UnhandledExceptionHandlerEventArgs e)
{
    using var deferral = e.GetDeferral();
    await MyAsyncHandlerMethod();
    e.Handled = true;
}
jkotas commented 6 months ago

There is nothing else that the thread dealing with the unhandled exception can do until the unhandled exception was dealt with, so there is no advantage in complicating this API with async. If you handler needs to call async methods, it should use Task.Wait.

VSadov commented 5 months ago

I've made changes to the fatal error handling API that allow querying for the error text if needed.

If the custom logger prints the provided text to console, it will achieve the same behavior as in the default handler. It could also add the text to a file, append to some binary storage, etc..

The same pattern may be followed in the future to allow querying for some other data, not necessarily textual.

jkotas commented 5 months ago

Nit: pfnLogAction will be called multiple times by the runtime. The runtime is not required to materialize the entire message in one long string.

VSadov commented 5 months ago

Yes, that may be desirable. I’ll modify the description to allow for sending that string in pieces.

VSadov commented 5 months ago

Updated for possibility of multiple calls to the pfnLogAction

AaronRobinsonMSFT commented 5 months ago
    void (*pfnGetFatalErrorLog)(
           FatalErrorInfo* errorData, 
           void (*pfnLogAction)(void *userContext, char8_t* logString), 
           void* userContext);

The above signature needs explicit calling conventions. I assume we want __stdcall on Windows x86 and the "default" everywhere else.

void (*pfnLogAction)(void *userContext, char8_t* logString),

The userContext variable is typically passed in last to the callback.

VSadov commented 5 months ago

The above signature needs explicit calling conventions. I assume we want __stdcall on Windows x86 and the "default" everywhere else.

How would it look in the declaration? Would that require #ifdef ?

VSadov commented 5 months ago

void (*pfnLogAction)(void *userContext, char8_t* logString),

The userContext variable is typically passed in last to the callback.

Could that make it awkward when more parameters need to be added? Maybe if adding an overload in v-next of this API or in an another entry point that logs something else. Would the other variants still have the context at the end?

AaronRobinsonMSFT commented 5 months ago

The above signature needs explicit calling conventions. I assume we want __stdcall on Windows x86 and the "default" everywhere else.

How would it look in the declaration? Would that require #ifdef ?

Yep. Something like the following:

#if defined(_MSC_VER) && defined(_M_IX86)
#define DOTNET_CALLBACK __stdcall
#else
#define DOTNET_CALLBACK 
#endif

Could that make it awkward when more parameters need to be added? Maybe if adding an overload in v-next of this API or in an another entry point that logs something else. Would the other variants still have the context at the end?

What other information are you envisioning that could evolve here? Adding additional arguments to a callback should be avoided. The best solution would be to create a new callback option rather than changing an existing callback API. The "context" is almost always last. I can't even think of a counter example, so this seems like "the pattern" to follow.

VSadov commented 5 months ago

Yep. Something like the following:

Would it be sufficient to just document it as:
"In all signatures of this API the default calling convention for the given platform is used."

I am not against the #ifdef just wonder if simply adding a comment is just as good.

VSadov commented 5 months ago

What other information are you envisioning that could evolve here? Adding additional arguments to a callback should be avoided. The best solution would be to create a new callback option rather than changing an existing callback API. The "context" is almost always last. I can't even think of a counter example, so this seems like "the pattern" to follow.

I could be easily convinced that the callback will be at the end. As the only truly required parameter in any kind of such pattern it would seem odd to put it at the end, but if that is very common, we can do that.

AaronRobinsonMSFT commented 5 months ago

"In all signatures of this API the default calling convention for the given platform is used."

That is possible, but that is ambigous on Windows. For example, the Windows OS is generally __stdcall, but VS defaults to __cdecl. Which means from a user perspective it might not even be something they have thought about in their project. My preference here is making the code as copy/paste friendly as possible.

VSadov commented 5 months ago

I've updated the signatures to add the #ifdef/__stdcall thing and made the context the last.

VSadov commented 5 months ago

Do we have enough of the API shape to move this to "ready for review"?

AaronRobinsonMSFT commented 5 months ago

I am ready to have a broader conversation. @jkotas?

jkotas commented 5 months ago

Same here.

rolfbjarne commented 5 months ago
public static void SetUnhandledExceptionHandler(UnhandledExceptionHandler handler);

The naming here seems confusing to me, this is named very similarly to the AppDomain.UnhandledException event, and in particular it doesn't seem clear that this won't be called for all unhandled exceptions, only unhandled exceptions that can actually be ignored.

It's not a big difference, but what about SetUnhandledExceptionFilter? That matches the imaginary code:

try { UserCode(); } catch (Exception ex) when handler(ex){};

in that handler is an exception filter.

IMHO it's also somewhat more obvious that this won't be called for all unhandled exceptions, only those that can actually be filtered/ignored.

A few other ideas, I don't like any of them, but maybe somebody else are inspired to come up with better names:

Additionally, it would be nice to be able to invoke the callback.

Example use case for .NET for iOS: we have an API called InvokeOnMainThread (Action action), where developers can do this:

InvokeOnMainThread (() => { ShowMessageBox ("Hello World!"); });

which would currently, if ShowMessageBox threw an exception, end up bringing the process down. If we could implement this internally like this:

public void InvokeOnMainThread (Action action)
{
    NativeInvokeOnMainThread (() => {
        // This is effectively called from a reverse P/Invoke
        try {
            action ():
        } catch (Exception ex) when ExceptionHandling.FilterUnhandledException (ex) {
            // ignored
        }
    });
}

that would be nice.

Lastly:

/// <exception cref="ArgumentNullException">If handler is null</exception>
/// <exception cref="InvalidOperationException">If a handler is already set</exception>
public static void SetUnhandledExceptionHandler(UnhandledExceptionHandler handler);

There's no way to clear/change/reset the handler once it's set? Is that intentional?

lambdageek commented 5 months ago

Does this proposal address crashes in unmanaged code? The main motivation in https://github.com/dotnet/runtime/issues/79706 is the ability for a user app (which is commonly a combination of managed code and native libraries) to install a third-party crash reporter behind the .NET runtime so that all fatal app terminations are captured and logged. This proposal seems to only deal with unhandled managed exceptions - or will SetFatalErrorHandler also install something for those cases? The proposal is fairly vague about what kinds of fatal errors will trigger the handler

jkotas commented 5 months ago

The naming here seems confusing to me, this is named very similarly to the AppDomain.UnhandledException event, and in particular it doesn't seem clear that this won't be called for all unhandled exceptions

Good point. I agree that the name variants like you have suggested can be used to make the behavior more clear.

Another option is to call it for all unhandled exceptions and change the filter to take an extra bool ignorable argument that will say whether the exception can be ignored.

Additionally, it would be nice to be able to invoke the callback.

It sound reasonable to me.

There's no way to clear/change/reset the handler once it's set? Is that intentional?

Yes, it is intentional. These callbacks are meant to be used at the applications level. The application is expected to be have one and only unhandled exception and fatal error handling policy.

Does this proposal address crashes in unmanaged code?

SetFatalErrorHandler is expected to address crashes in unmanaged code. Thinking about this more, there are two cases: fatal crashes in the .NET code (including unmanaged runtime code) and crashes in other code.

We clearly want this to intercept handle crashes in .NET code, but handling crashes in other code is less clear cut. It makes sense if the app wants to own the process, but it does not make sense if the .NET code is a library (e.g. think native AOT library) and the main app has its own crash handler. Are we going to limit this to handling crashes in .NET code or should we have a flag that says whether the handler should intercept all crashes - both in .NET code and other code?

colejohnson66 commented 4 months ago

Why a dedicated delegate type and not Func<Exception, bool>?

bartonjs commented 3 months ago

Video

namespace System.Runtime.ExceptionServices
{
    public static class ExceptionHandling
    {
        /// <summary>
        /// Sets a handler for unhandled exceptions.
        /// </summary>
        /// <exception cref="ArgumentNullException">If handler is null</exception>
        /// <exception cref="InvalidOperationException">If a handler is already set</exception>
        public static void SetUnhandledExceptionHandler(Func<Exception, bool> handler);

        /// <summary>
        /// .NET runtime is going to call `fatalErrorHandler` set by this method before its own
        /// fatal error handling (creating .NET runtime-specific crash dump, etc.).
        /// </summary>
        /// <exception cref="ArgumentNullException">If fatalErrorHandler is null</exception>
        /// <exception cref="InvalidOperationException">If a handler is already set</exception>
        public static unsafe void SetFatalErrorHandler(delegate* unmanaged<int, void*, int> handler);
    }
}