Allow ignoring unhandled exceptions in UnhandledException event handler

jkotas commented 4 years ago

Background and Motivation

Scenarios like designers or REPLs that host user-provided code are not able to handle unhandled exceptions thrown by that code. Unhandled exceptions on the finalizer thread, thread pool threads, or user-created threads will take down the whole process. This is not a desirable experience for these types of scenarios.

The discussion that led to this proposal is in https://github.com/dotnet/runtime/issues/39587

Proposed API

Allow ignoring unhandled exceptions on threads created by the runtime from the managed UnhandledException handler:

 namespace System
 {
     public class UnhandledExceptionEventArgs
     {
         // Existing property. Always true in .NET Core today. The behavior will be changed to:
+        // - `false` for exceptions that can be ignored (ie thread was created by the runtime)
+        // - `true` for exceptions that cannot be ignored (ie foreign thread or other situations when it is not reasonably possible to continue execution)
         // This behavior is close to .NET Framework 1.x behavior of this property.
         public bool IsTerminating { get; }

+        // The default value is false. The event handlers can set it to true to make
+        // runtime ignore the exception. It has effect only when IsTerminating is false.
+        // The documentation will come with usual disclaimer for bad consequences of ignoring exceptions
+        public bool Ignore { get; set; }
     }
 }

Usage Examples

AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
{
    if (DesignMode && !e.IsTerminating)
    {
        DisplayException(e.ExceptionObject);
        e.Ignore = true;
    }
};

Alternative Designs

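Unmanaged hosting API that enables this behavior. (CoreCLR has a poorly documented and poorly tested configuration option for this today.)

Similar prior art:

UnobservedTaskExceptionEventArgs.Observed (https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.unobservedtaskexceptioneventargs.observed) + UnobservedTaskExceptionEventArgs.SetObserved (https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.unobservedtaskexceptioneventargs.setobserved)
CancelEventArgs.Cancel (https://docs.microsoft.com/en-us/dotnet/api/system.componentmodel.canceleventargs.cancel)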

Risks

This API can be abused to ignore unhandled exceptions in scenarios where it is not warranted.

Dotnet-GitSync-Bot commented 4 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

danmoseley commented 4 years ago

Is “Ignorable” a potential alternative to the “IsTerminating” name? I find it a little confusing.

jkotas commented 4 years ago

IsTerminating is an existing property. We can certainly leave the existing property alone and introduce a new one with a better name.

danmoseley commented 4 years ago

Ah of course

terrajobst commented 4 years ago

Video

Looks good as proposed

namespace System
{
    public partial class UnhandledExceptionEventArgs : EventArgs
    {
        // Existing property.
        // public bool IsTerminating { get; }

        public bool Ignore { get; set; }
    }
}

cheverdyukv commented 4 years ago

Is it possible to know when it will be implemented? .NET 5.0 or perhaps in one of the updates. Or perhaps .NET 6.0?

jkotas commented 4 years ago

.NET 5.0 is done. We only ship critical bug fixes in servicing updates, no new features.

I am not sure whether the core .NET team will get to work on this in .NET 6.

Would you be interested in contributing the implementation yourself?

cheverdyukv commented 4 years ago

@jkotas After checking the source code for quite some time, I feel that this should be done by someone really familiar with exception handling on the different platforms. Let me explain why.

I found that the C++ function InternalUnhandledExceptionFilter_Worker is the core of exception handling. It looks like SEH code, and as a result it is Windows-specific. I have no idea where the code for other platforms is or how exceptions are handled there. I also found that Mono handles this differently. I don't know how it works in Blazor.

But even for Windows, how can I know what to set IsTerminating to for different types of exceptions? I checked that a stack overflow does not raise UnhandledException but an out-of-memory condition does. I feel like IsTerminating should be set to false for all types of exceptions because to me looks like they all could be suppressed using that obsolete flag. Perhaps it will be better to invite here somebody from team who knows more about exceptions to help us decide?

From what I read in the comments, this problem is definitely not easy to solve and has a lot of tricks and caveats. Perhaps it will be easier to revert to the original proposition with passing some flag during host creation. That code is already there and it is already able to suppress exceptions.

Next problem that certain code rely on IsTerminating. For example:

src\libraries\System.Data.OleDb\src\System\Data\ProviderBase\DbConnectionPoolCounters.cs
src\libraries\System.Runtime.Caching\src\System\Runtime\Caching\MemoryCache.cs

And perhaps there is some 3rd party code that relies on this as well.

As you can see, there are calls to Dispose when the process is terminating. With this change, IsTerminating will be false in most cases, yet the application will still terminate if nobody sets Ignore to true. As a result, certain code could break. The probability is low, but it could happen.

jkotas commented 4 years ago

I feel like IsTerminating should be set to false for all types of exceptions because to me looks like they all could be suppressed using that obsolete flag.

One specific case that cannot possibly be suppressed is an unhandled exception on a foreign thread on Unix.

The existing obsolete hosting flag has zero testing. It is likely that it does not behave correctly in some cases.

Perhaps it will be better to invite here somebody from team who knows more about exceptions to help us decide?

cc @janvorli

original proposition with passing some flag during host creation

It would not make the problem with getting this right any easier.

Next problem that certain code rely on IsTerminating.

Good point. This will need careful thought.

cheverdyukv commented 4 years ago

One specific case that cannot possibly be suppressed is an unhandled exception on a foreign thread on Unix.

Do you know the best way to test it? I have a feeling that these will not raise the unhandled exception event at all, including on Windows. For example, in .NET Framework, when some thread called into .NET and an exception went unhandled, it was passed to that environment as an exception and could be treated normally using standard SEH.

The existing obsolete hosting flag has zero testing. It is likely that it does not behave correctly in some cases.

I did check the code, and it looks like the same (or a quite similar) flag existed in .NET Framework. But I never used that flag and I am not sure how it will behave.

It would not make the problem with getting this right any easier.

Well, I am still working through that code, but I have a feeling that everything is done already :) except setting this flag of course. I just don't have time to do proper testing.

darkguy2008 commented 3 years ago

It seems this issue is also blocking an important plugin for Unreal Engine ( https://github.com/nxrighthere/UnrealCLR/issues/33 ). While this was moved to the Future milestone, I'd like to push for it to be taken into consideration for .NET 6 or even .NET 7. It's kinda discouraging to see it stuck in a milestone that will be done god-knows-when.

Considering the positive impact that project is making in the UE community as a whole, I think it's important to look at.

Excluding the UE topic, this issue is pretty valid to me in some apps I've developed in the past, too. Haven't needed it, but it would've been a good addition to some workarounds I've had to implement.

danmoseley commented 2 years ago

@jkotas should this be labeled up for grabs?

rseanhall commented 2 years ago

Next problem that certain code rely on IsTerminating.

Good point. This will need careful thought.

CancelEventArgs is almost always used in an event with the "-ing" suffix. For example, the CancelEventArgs is available in the Closing event not the Closed event. This suggests that the existing UnhandledException event is not the right place to expose the ability to ignore an exception, and a new HandlingUnhandledException event (probably with a better name) needs to be added instead.

Could an initial implementation only set a very small subset of exceptions as ignorable, and then future versions slowly expand which ones are ignorable? Theoretically, I would think this API could be added without ever setting any exceptions as ignorable and then people could slowly contribute which exceptions can be ignored.

jkotas commented 2 years ago

This suggests that the existing UnhandledException event is not the right place to expose the ability to ignore an exception, and a new HandlingUnhandledException event (probably with a better name) needs to be added instead.

Ok, I have flipped this back to API needs work.

jkotas commented 2 years ago

Thoughts on a good name and the exact shape of the API are welcomed.

rseanhall commented 2 years ago

Still not a great name.

Proposed API

Allow ignoring unhandled exceptions on threads created by the runtime from a new managed UnhandledExceptionThrowing handler:

namespace System
{
    public class AppDomain
    {
        public event UnhandledExceptionThrowingEventHandler? UnhandledExceptionThrowing;
    }

    public delegate void UnhandledExceptionThrowingEventHandler(object sender, UnhandledExceptionThrowingEventArgs e);

    public class UnhandledExceptionThrowingEventArgs : EventArgs
    {
        public object ExceptionObject { get; }

        // - `true` for exceptions that can be ignored (ie thread was created by the runtime)
        // - `false` for exceptions that cannot be ignored (ie foreign thread or other situations when it is not reasonably possible to continue execution)
        public bool Ignorable { get; }

        // The default value is false. The event handlers can set it to true to make
        // runtime ignore the exception. It has effect only when Ignorable is true.
        // The documentation will come with usual disclaimer for bad consequences of ignoring exceptions
        public bool Ignore { get; set; }
    }
}

The exception will be reported in existing UnhandledException event, whether it's ignored or not. UnhandledExceptionEventArgs.IsTerminating will be false if the exception was ignorable and ignored.

Usage Examples

AppDomain.CurrentDomain.UnhandledExceptionThrowing += (sender, e) =>
{
    if (DesignMode && e.Ignorable)
    {
        DisplayException(e.ExceptionObject);
        e.Ignore = true;
    }
};

osexpert commented 2 years ago

I don't like Ignorable / Ignore. Whoever handles the unhandled exception may not necessarily ignore it; maybe they will choose to take down not just the current process, but the whole OS as well. Or they could choose to ignore the unhandled exception. Point is, we only know that someone handled it. We don't know how they handled it. Ignoring it is just one way of handling it.

Better names: bool CanBeHandled; bool Handled;

himanshuz2 commented 1 year ago

We need this functionality.

nxrighthere commented 11 months ago

Could you please provide any insight into when we will have this in .NET?

jkotas commented 10 months ago

Could you please provide any insight into when we will have this in .NET?

@agocke This is the unhandled exception and fatal error handling scenario that I have mentioned to you. Do you think we will be able to work on it in .NET 9?

agocke commented 10 months ago

Yeah, let's try to get this done for .NET 9.

ghost commented 10 months ago

Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov
See info in area-owners.md if you want to be subscribed.


Author:	jkotas
Assignees:	-
Labels:	`api-needs-work`, `area-Host`
Milestone:	Future

VSadov commented 8 months ago

I want to pick this up and I am trying to figure where this ended last time.

What I see is:

There were some concerns about naming
The IsTerminating appears to be used in a few cases as a fact - whether an exception is terminal or not, so changing the meaning to mean "configurable" can be a breaking change to those uses.

Are these all the reasons why we wanted to rethink the API ?

jkotas commented 8 months ago

Are these all the reasons why we wanted to rethink the API ?

Yes, I think so.

VSadov commented 8 months ago

Since we already have scenarios where IsTerminating is used to check whether the exception is terminal or not, it feels like we may need to leave that property alone and add a new event where listeners will have a chance to configure the outcome.

Basically

send a new TBD event and give a chance to the listeners to configure. Perhaps only send if it is possible to configure.
Then send the existing event with IsTerminating specifying what we are going to do next - terminate or not. The second event is basically just a notification. Too late to configure anything.

At least these are my thoughts right away for how to fit into existing scenarios.

VSadov commented 8 months ago

The use case would look like:

AppDomain.CurrentDomain.UnhandledExceptionQuery += (sender, e) =>
{
    if (DesignMode)
    {
        e.Terminate = false;
        DebugLog("trying to ignore: ", e.ExceptionObject);
    }
};

AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
{
    if (!e.IsTerminating)
    {
        DisplayException(e.ExceptionObject);
    }
    else
    {
        WrapItUpWeAreGoingToCrash();
    }
};

public class UnhandledExceptionQueryEventArgs
{
     // defaults to true
     // setting false will cause the exception not be terminal
     // all listeners need to agree  (sadly, the order of listeners matters)
     public bool Terminate { get; set; }
     public object ExceptionObject { get; }
}

public class UnhandledExceptionEventArgs
{
     // Existing property. Always true in .NET Core today. Will be false if termination was overridden.
     public bool IsTerminating { get; }
     public object ExceptionObject { get; }
}

jkotas commented 8 months ago

AppDomain.CurrentDomain.UnhandledExceptionQuery

What about UnhandledExceptionHandler that returns boolean? If the handler returns true, the exception is considered handled and we are done. The existing AssemblyResolve events are prior art for shape like this.

Also, we may want to put this on a new type under System.Runtime.ExceptionServices where the other fatal error handlers are going to be.

Then send the existing event with IsTerminating specifying what we are going to do next - terminate or not. The second event is basically just a notification. Too late to configure anything.

I am not sure about this part. IsTerminating behavior is poorly defined and it is always true (unless one uses the unsupported config switch). I would keep UnhandledException callback to be called only when we are guaranteed that the process is terminating.

VSadov commented 8 months ago

What about UnhandledExceptionHandler that returns boolean? If the handler returns true, the exception is considered handled and we are done. The existing AssemblyResolve events are prior art for shape like this.

How does that work with multiple listeners? The last wins?

jkotas commented 8 months ago

How does that work with multiple listeners? The last wins?

The first one wins. It is how AssemblyResolve and similar events work today. One example from many: https://github.com/dotnet/runtime/blob/aee49579769188d0ff7cf3ca872d2126e5bb3c70/src/libraries/System.Private.CoreLib/src/System/Runtime/Loader/AssemblyLoadContext.cs#L811-L821
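
A minimal sketch of the "first one wins" dispatch described above, under stated assumptions: Func<Exception, bool> stands in for whatever delegate type the final API uses, and TryHandle is a hypothetical helper name, not a runtime API. It walks the invocation list in registration order and stops at the first listener that returns true.

```csharp
using System;
using System.Collections.Generic;

var called = new List<string>();

// Hypothetical multicast handler chain; Func<Exception, bool> stands in for the proposed delegate type.
Func<Exception, bool> handlers = ex => { called.Add("first"); return false; };
handlers += ex => { called.Add("second"); return true; };
handlers += ex => { called.Add("third"); return true; };

// Calls listeners in registration order and stops at the first one that returns true.
bool TryHandle(Exception exception)
{
    foreach (Func<Exception, bool> handler in handlers.GetInvocationList())
    {
        if (handler(exception))
        {
            return true;
        }
    }
    return false;
}

bool handled = TryHandle(new InvalidOperationException("boom"));
Console.WriteLine($"handled: {handled}, called: {string.Join(", ", called)}");
```

Listeners registered after the one that returns true never see the exception, which is the trade-off discussed in the following comments.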

VSadov commented 8 months ago

Right, Delegate.EnumerateInvocationList. That would work.

VSadov commented 8 months ago

I would keep UnhandledException callback to be called only when we are guaranteed that the process is terminating.

Does that imply there is another new event for nonterminal unhandled exceptions or only the handlers up to the one that returned true will know about those?

I suppose the recommended use would be to not have multiple handlers, or at least make them handle different exceptions or be responsible for different scenarios. With that view, it might be ok that once an exception is "handled" no one else sees it.

jkotas commented 8 months ago

Does that imply there is another new event for nonterminal unhandled exceptions or only the handler that returned true will know about those?

Yes. (All UnhandledExceptionHandler's that were called before the one that handled it would know about it too of course.)

Separately, we may want to have an event that is triggered when an exception (any exception) is handled. We have AppDomain.FirstChanceException event that is triggered when exception is thrown, but we do not have one for handled exceptions. I think it would help with #98878.

I suppose the recommended use would be to not have multiple handlers, or at least make them handle different exceptions or be responsible for different scenarios.

Right. If we want to avoid conflicts between different handlers, we may want to only allow setting one per app. It would help with ensuring that the unhandled exception policy is only controlled at app level and that random libraries do not participate in it. NativeLibrary.SetDllImportResolver is an example of prior art like this.

VSadov commented 8 months ago

Separately, we may want to have an event that is triggered when an exception (any exception) is handled.

I suppose that includes the ordinary catch and the handler event. Also in rethrow case the same exceptions could be caught more than once.

I wonder if there is a need or even a possibility to identify the "catcher".

jkotas commented 8 months ago

I suppose that includes the ordinary catch and the handler event. Also in rethrow case the same exceptions could be caught more than once.

Right.

I wonder if there is a need or even a possibility to identify the "catcher".

I think that the Stacktrace APIs would be a solution for that. It is expensive to do it eagerly, for AOT in particular.

VSadov commented 8 months ago

So, for the unhandled exception handler we will have:

AppDomain.CurrentDomain.UnhandledExceptionHandler += (sender, e) =>
{
    if (DesignMode)
    {
        DisplayException(e.ExceptionObject);
        // the exception is now "handled"
        return true;
    }

    // not handled here; let the next handler or the default behavior deal with it
    return false;
};

AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
{
    // IsTerminating is always true for unhandled exceptions  (assuming this is not .NET Fx)
    Debug.Assert(e.IsTerminating);
    WrapItUpWeAreGoingToCrash();
};

public delegate bool UnhandledExceptionHandlerEventHandler(object sender, System.UnhandledExceptionHandlerEventArgs e);

public class UnhandledExceptionHandlerEventArgs
{
     public object ExceptionObject { get; }
}

jkotas commented 8 months ago

Right. Open design decisions:

Add to AppDomain vs. introduce a new type under ExceptionServices
Event vs. a single shot delegate

VSadov commented 8 months ago

Looks like Mono has tests for IsTerminating==false. Is that working in Mono?

https://github.com/dotnet/runtime/blob/ee501fb2c6ac901b761131b8c4760f74b5c18a62/src/mono/mono/tests/threadpool-exceptions2.cs#L35

jkotas commented 8 months ago

runtime/src/mono/mono/tests/threadpool-exceptions2.cs

These are orphaned tests. You should check the actual behavior.

We do not have a lot of test coverage for unhandled exceptions in general so there can be untracked behavior differences between runtimes.

VSadov commented 8 months ago

I think either way, we can say that once UnhandledExceptionHandler returns true, this is no longer an unhandled case so CurrentDomain.UnhandledException is not called, regardless of the runtime.

Add to AppDomain vs. introduce a new type under ExceptionServices

I'd prefer ExceptionServices. This can be seen as orthogonal to CurrentDomain.UnhandledException, thus does not need to live near it.

Event vs. a single shot delegate

I think I might prefer a single shot delegate (the DllImportResolver style). Solves the problem with multiple handlers. Or at least moves it to the user, who still can build something pluggable or flow this into an event.

But I'd like to hear from the likely users.

VSadov commented 8 months ago

With above assumptions, the use case will be something like:

using System.Runtime.ExceptionServices;

ExceptionHandling.SetUnhandledExceptionHandler(
    (ex) =>
    {
        if (DesignMode)
        {
            DisplayException(ex);
            // the exception is now "handled"
            return true;
        }

        // not handled; the runtime proceeds with its default unhandled exception behavior
        return false;
    }
);

namespace System.Runtime.ExceptionServices
{
    public delegate bool UnhandledExceptionHandler(System.Exception exception);

    public static class ExceptionHandling
    {
        /// <summary>
        /// Sets a handler for unhandled exceptions.
        /// </summary>
        /// <exception cref="ArgumentNullException">If handler is null</exception>
        /// <exception cref="InvalidOperationException">If a handler is already set</exception>
        public static void SetUnhandledExceptionHandler(UnhandledExceptionHandler handler);
    }
}

jkotas commented 8 months ago

    // can be called multiple times - new handler replaces old.
    // calling with `null` unsets the handler

I think it is unnecessary flexibility - it does not prevent different libraries from fighting over who is going to win. NativeLibrary.SetDllImportResolver can be called exactly once for given assembly if you go with that as prior art.

VSadov commented 8 months ago

I think it is unnecessary flexibility - it does not prevent different libraries from fighting over who is going to win. NativeLibrary.SetDllImportResolver can be called exactly once for given assembly if you go with that as prior art.

I was mostly thinking that "unsetting" may be a desired scenario. Replacing is a side effect of allowing unsetting: once you can unset, you'd want to be able to set again, and then why not allow replacing. But outright swapping does feel a bit odd for a scenario.

It could certainly be a one-time API like the resolver. In most cases, I agree, it will be set once and set early - at the app startup or once the runtime is initialized (in hosted scenario like game scripting).
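
A minimal sketch of what "set once" could look like, assuming a single storage slot guarded by Interlocked.CompareExchange. The names SetHandler and current are hypothetical and only illustrate the documented ArgumentNullException / InvalidOperationException contract; this is not the runtime's implementation.

```csharp
using System;
using System.Threading;

// Hypothetical storage for the single process-wide handler.
Func<Exception, bool>? current = null;

// Accepts exactly one handler; later calls throw instead of replacing it.
void SetHandler(Func<Exception, bool> handler)
{
    ArgumentNullException.ThrowIfNull(handler);
    if (Interlocked.CompareExchange(ref current, handler, null) is not null)
    {
        throw new InvalidOperationException("A handler is already set.");
    }
}

SetHandler(ex => true);

bool secondCallRejected = false;
try
{
    SetHandler(ex => false);
}
catch (InvalidOperationException)
{
    secondCallRejected = true;
}
Console.WriteLine($"second call rejected: {secondCallRejected}");
```

With this shape there is no way to unset or swap the handler, so a host that wants something pluggable would have to build that on top of its one registered delegate.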

I'll update the example.

VSadov commented 8 months ago

For the semantics of unhandled exception handler I think we can follow the model of imaginary

try { UserCode(); } catch (Exception ex) when (handler(ex)) { }

in places where the above will not lead to process termination regardless of what handler() returns.

only exceptions that can be caught and ignored will cause the handler to be invoked. (i.e. stack overflow will not)
an unhandled exception thrown in a handler will not invoke the handler, but will be treated as returning false.
when an exception is handled via a handler in a user-started thread, the thread will still exit (but not escalate to process termination)
when an exception is handled in a task-pumping scenario (thread pool, finalizer queue, anything else?), the pumping will continue. (we do not need to commit to what happens to the thread, but the process should be able to proceed)
a reverse pinvoke will not install the try/catch like above.
main() will not install the try/catch like above
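
The task-pumping case in the list above can be sketched roughly as follows. This is only a model under stated assumptions: Pump and handler are hypothetical names, and the loop stands in for a thread pool or finalizer queue rather than showing how the runtime would actually install the filter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical app-level handler: treats InvalidOperationException as "handled".
Func<Exception, bool> handler = ex => ex is InvalidOperationException;

var outcomes = new List<string>();

// Model of a pumping loop: a handled exception ends the work item, not the pump.
void Pump(IReadOnlyList<Action> workItems)
{
    for (int i = 0; i < workItems.Count; i++)
    {
        try
        {
            workItems[i]();
            outcomes.Add($"item {i}: ok");
        }
        catch (Exception ex) when (handler(ex))
        {
            outcomes.Add($"item {i}: handled {ex.GetType().Name}");
        }
    }
}

Pump(new Action[]
{
    () => { },
    () => throw new InvalidOperationException("user bug"),
    () => { },
});
Console.WriteLine(string.Join("; ", outcomes));
```

An exception the handler declines would escape Pump, which in the real runtime corresponds to the existing unhandled exception path. C# exception filters also treat an exception thrown inside the filter as a false result, which lines up with the rule above about handlers that throw.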

Any other interesting scenario or a corner case?

jkotas commented 8 months ago

LGTM

jkotas commented 8 months ago

@joncham Does https://github.com/dotnet/runtime/issues/42275#issuecomment-2008339882 work for your scenario?

joncham commented 8 months ago

a reverse pinvoke will not install the try/catch like above.

Does this mean an unhandled exception in a reverse pinvoke will not call the UnhandledExceptionHandler and the process will terminate? I am not sure if exceptions thrown in reverse pinvokes have a defined behavior today, but we would prefer to be able to handle as many cases as possible versus crashing the process.

main() will not install the try/catch like above

In this case, is the AppDomain UnhandledException event called? In general, is there any case where UnhandledExceptionHandler will not be called but UnhandledException is called?

It could certainly be a one-time API like the resolver. In most cases, I agree, it will be set once and set early - at the app startup or once the runtime is initialized (in hosted scenario like game scripting).

In our hosted scenario (Unity Editor) we would want to install the single handler, and not allow it to be replaced/overridden.

jkotas commented 8 months ago

I am not sure if exceptions thrown in reverse pinvokes have a defined behavior today

On Windows CoreCLR, exceptions escaping from reverse PInvokes are converted to Windows SEH exceptions. Exceptions escaping from reverse PInvokes are treated as unhandled exceptions everywhere else.

we would prefer to be able to handle as many cases as possible versus crashing the process.

To handle an unhandled exception in a reverse PInvoke, we would have to return something to the unmanaged code that called the reverse PInvoke. What would that be? Returning random values and hoping for the best does not sound like a good plan.

VSadov commented 7 months ago

Now, for the API for intercepting fatal crashes - I am thinking about just allowing to plug into CrashDumpAndTerminateProcess.

The goal of this API is to allow 3rd party extension of intercepting fatal process crashes.

The actual handler must be in native code, since running managed code while crashing is not a good idea. In fact we may need to run this in a signal handler, so it would need to be signal-safe.

One prior suggestion for the API was in https://github.com/dotnet/runtime/issues/79706#issuecomment-1700243612

The API could be:

public static class ExceptionHandling
{
    // .NET runtime is going to call `fatalErrorHandler` set by this method before its own
    // fatal error handling (creating .NET runtime-specific crash dump, etc.). This can be only called once in given
    // process. 
    public static void SetFatalErrorHandler(delegate* unmanaged<uint, void> fatalErrorHandler);
}

It could really be just something that CrashDumpAndTerminateProcess calls before producing dump and terminating the process. It means that the signature would be basically the same as for CrashDumpAndTerminateProcess. More info could be added, but currently it is:

extern "C" DLL_EXPORT void __cdecl FatalErrorHandler(uint32_t exitCode)
{
      // native implementation with signal handler restrictions
}

Typical use would be something like:


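    internal class Program
    {
        unsafe static void Main(string[] args)
        {
            // The handler has to be native code, so resolve the native export
            // instead of taking the address of a managed method.
            IntPtr lib = NativeLibrary.Load("myCustomCrashHandler.dll");
            var handler = (delegate* unmanaged<uint, void>)NativeLibrary.GetExport(lib, "FatalErrorHandler");
            ExceptionHandling.SetFatalErrorHandler(handler);

            RunMyProgram();
        }
    }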

In the same spirit as in the SetUnhandledExceptionHandler, setting the handler would be allowed just once per process.

Questions:

is the location/timing of this call sufficient for the purpose? It should be, since it would be called right before producing the .NET dump
any other info that could be helpful to the handler?
do we need the handler to communicate something back - like "do not do your dump" ? It would basically mean the handler may need to return nonvoid result.
https://github.com/dotnet/runtime/issues/79706#issuecomment-1700243612 also suggested GetIsManagedCode(). I am not sure how that would be helpful. Is there a back story to that?

jkotas commented 7 months ago

any other info that could be helpful to the handler?

We may want to pass in all information that is required to implement our own fatal error handler:

The OS specific signal handler arguments if the fatal crash is result of handling a signal or unhandled exception. siginfo_t *info and void *ucontext on Unix. EXCEPTION_POINTERS* on Windows that can be broken down into EXCEPTION_RECORD* ExceptionRecord and CONTEXT* ContextRecord to make it look more like Unix.
The error message if there is one, e.g. the exception error message or the error message passed into Environment.FailFast call.
Textual managed stack trace in some form, same information as what the runtime default handler prints to the console today. This may want to be optional.

This looks like too many pieces to pass as individual arguments. We may want to stash it all into a struct and pass the pointer to the struct into the handler.

is the location/timing of this call sufficient for the purpose?

The current fatal error processing does multiple things:

Prints error to the console
Logs event to system EventLog
Invokes crash reporter

We may want to provide more fine-grained control over all these steps or insert this callback as the very first step, before anything gets printed to the console.

79706 (comment) also suggested GetIsManagedCode(). I am not sure how that would be helpful. Is there a back story to that?

Yeah, the proposed GetIsManagedCode callback is not a good design, but the problem that it tried to solve is still there.

The problem you hit when implementing signal handlers on Unix is whether your signal handler should take over the process (works well for executables) or whether it should cooperate with other components that may be loaded in the process (works well for libraries).

If the component signal handler sees that the crashing IP is in the component code, it can assume that the component should handle it. The problem is with what to do if the component signal handler sees that the IP is in somebody else's code. Should it pass it to the previous signal handler (there may be none) or should it take over reporting it as a fatal crash?

VSadov commented 7 months ago

We may want to provide more fine-grained control over all these steps or insert this callback as the very first step, before anything gets printed to the console.

Right. If the idea is that the handler should be able to completely take over and do just its thing, we may also need a way for the handler to tell that the runtime's actions are not interesting - i.e. by returning false.

We would need to ensure that all the crash paths go through it though. CrashDumpAndTerminateProcess is a convenient choke point, but for the handler to take over, we would need to call it earlier. We may have to do some processing of the crash if we want to provide more info to the handler, but we will have to call it before we did anything observable to the end-user like printing to console or dumping files.

We may want to pass in all information that is required to implement our own fatal error handler

That is basically the stuff passed to EEPolicy::HandleFatalError. That was the other candidate from where to call the handler and pass all the info that is known at the time. The only complication is that not all crashing paths go through EEPolicy::HandleFatalError; some go directly to CrashDumpAndTerminateProcess, but I think that can be changed.

In a signal case we may need to call them earlier though.

This looks like too many pieces to pass as individual arguments. We may want to stash it all into a struct and pass the pointer to the struct into the handler.

Some of these pieces would be optional and will have default values, depending on scenario (i.e. AV vs. SO vs. intentional failfast).

I think having many arguments is ok. It might also be easier to add something to the argument list in a compatible way in V-next, if needed.

The problem you hit when implementing signal handlers on Unix is whether your signal handler should take over the process (works well for executables) or whether it should cooperate with other components that may be loaded in the process (works well for libraries).

I have only two ideas here:

the runtime needs to decide this too. So if runtime thinks it should be handled by something else, then we do not call the installed handler.
this is a very advanced scenario, similar to hosting.
Perhaps whoever works with this API should know whether this is an exe vs. library situation. It means we just call them and let them decide what to do, and if they return true we continue with our routine, which might end up calling another signal handler.
