dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

Cross-VM Exception Unwinding #108211

Open jonpryor opened 1 month ago

jonpryor commented 1 month ago

Background

Java.Interop is used to allow in-process interactions with a Java Virtual Machine (JVM). .NET for Android builds atop Java.Interop for integration with Android, using MonoVM as the JIT. (Neither CoreCLR nor NativeAot are supported by .NET for Android at this time.)

See also:

Setup

Assume we have a call stack wherein Managed code calls into Java code which calls back into Managed code:

// C#
void ManagedJavaManaged()
{
    int value   = 0;
    using var c = new MyIntConsumer(v => value = v);
    using var r = JavaInvoker.CreateRunnable(c);
    r.Run ();
}

partial class MyIntConsumer : Java.Lang.Object, Java.Util.Function.IIntConsumer {
    Action<int> action;

    public MyIntConsumer (Action<int> action)
    {
        this.action = action;
    }

    public void Accept (int value)
    {
        action (value);
    }
}

Java side of JavaInvoker:

// Java
public final class Invoker {
    public static Runnable createRunnable(final IntConsumer consumer) {
        return new Runnable() {
            int value;
            public void run() {
                consumer.accept(value++);
            }
        };
    }
}

The above demonstrates a Managed > Java > Managed transition:

  1. the originating Managed method is ManagedJavaManaged(), which invokes
  2. JavaInvoker.CreateRunnable(), which eventually calls the Java method Invoker.createRunnable(), which in turn calls consumer.accept(), transitioning back into
  3. Managed code in MyIntConsumer.Accept().

When no exceptions are involved, everything is fine!

But what if an exception is thrown? What if MyIntConsumer.Accept(int) throws an exception?

Exception Handling

Java code cannot directly invoke managed code. Instead, Java code invokes special native methods. We call these "marshal methods", which are responsible for marshaling parameters, return values, and (ideally) exceptions.

Exception Handling without a Debugger

When no Debugger is present, the marshal methods catch and marshal all exceptions:

partial class IIntConsumerInvoker : global::Java.Lang.Object, IIntConsumer {
    static void n_Accept_I (IntPtr jnienv, IntPtr native__this, int value)
    {
        var envp = new JniTransition(jnienv);
        try {
            var __this = global::Java.Lang.Object.GetObject<Java.Util.Functions.IIntConsumer>(jnienv, native__this, JniHandleOwnership.DoNotTransfer)!;
            __this.Accept(value);
        } catch (Exception e) {
            envp.SetPendingException(e);
        } finally {
            envp.Dispose();
        }
    }
}

This way, if MyIntConsumer.Accept() throws an exception, it will be caught within IIntConsumerInvoker.n_Accept_I(), marshaled to Java, and then the method will return. Execution is assumed to return to Java, Java will see the pending exception, raise it, and Java code will be able to participate in exception handling.

Exception Handling with a Debugger: catch everything

Things get "more interesting" when a debugger is attached, as we want our developer customers to see a "first chance exception" which points to the correct location.

If we run the following with a debugger attached:

/* 1 */ using var c = new MyIntConsumer(v => throw new InvalidOperationException());
/* 2 */ using var r = JavaInvoker.CreateRunnable(c);
/* 3 */ r.Run ();

then when line 3 is executed, we want the debugger to "break" on line 1, with a callstack that shows (more or less):

The problem is that if we use the IIntConsumerInvoker.n_Accept_I() marshal method described above, in which all exceptions are caught, then instead the debugger will "break" at JniEnvironment.InstanceMethods.CallVoidMethod(…), if anywhere at all, completely skipping the original throw site and it's Java callers.

This was not considered to be an acceptable developer experience.

Exception Handling with a Debugger: catch only when a debugger is attached

If we instead have a marshal method where the catch block uses a when clause:

partial class IIntConsumerInvoker : global::Java.Lang.Object, IIntConsumer {
    static void n_Accept_I (IntPtr jnienv, IntPtr native__this, int value)
    {
        var envp = new JniTransition(jnienv);
        try {
            var __this = global::Java.Lang.Object.GetObject<Java.Util.Functions.IIntConsumer>(jnienv, native__this, JniHandleOwnership.DoNotTransfer)!;
            __this.Accept(value);
        } catch (Exception e) when (!Debugger.IsAttached) {
            envp.SetPendingException(e);
        } finally {
            envp.Dispose();
        }
    }
}

then we have a better "unhandled exception" experience: the debugger breaks on the original throw site.

Yay!

The problem is that if the developer then continues execution (F5), then the process state may be corrupted. This is because with the catch (Exception e) when (!Debugger.IsAttached) construct, the catch block becomes "invisible". This means that the exception is never caught, there is never an opportunity to return to Java code, and MonoVM will unwind all the call stacks, the Java call stacks included. This can result in the JVM aborting the process.

Oops.

Exception Handling Performance

There is an added complication: the use of JniTransition laid out above is a lie. What actually happens is that the marshal methods contain no exception handling, and the marshal method usage is "wrapped" around JNINativeWrapper.CreateDelegate():

// From src/Mono.Android/obj/Debug/net9.0/android-35/mcw/Java.Util.Functions.IIntConsumer.cs
internal partial class IIntConsumerInvoker : global::Java.Lang.Object, IIntConsumer {
    static void n_Accept_I (IntPtr jnienv, IntPtr native__this, int value)
    {
        var __this = global::Java.Lang.Object.GetObject<Java.Util.Functions.IIntConsumer> (jnienv, native__this, JniHandleOwnership.DoNotTransfer)!;
        __this.Accept (value);
    }

    static Delegate? cb_accept_Accept_I_V;
    static Delegate GetAccept_IHandler ()
    {
        if (cb_accept_Accept_I_V == null)
            cb_accept_Accept_I_V = JNINativeWrapper.CreateDelegate (new _JniMarshal_PPI_V (n_Accept_I));
        return cb_accept_Accept_I_V;
    }
}

JNINativeWrapper.CreateDelegate() in turn uses System.Reflection.Emit to do a variant on the "catch (Exception) when (!Debugger.IsAttached)" approach. This in turn means that System.Reflection.Emit is required, and can impact app startup times by pulling in System.Reflection.Emit initialization and usage into app startup.

Request

What we would like is a construct to say "ignore this catch block when dealing with first chance exceptions, while executing this catch block when actually unwinding the stack."

An idea sketched out here is to add a StackUnwinding type:

namespace System.Runtime.ExceptionServices {
    public partial class StackUnwinding : Exception
    {
        public StackUnwinding(Exception innerException);
    }
}

catch(StackUnwinding) blocks would be "invisible" to first chance exception reporting.

This would allow our marshal methods to use it:

void JniBoundary(IntPtr jnienv, IntPtr self)
{
    var t = new JniTransition(jnienv);
    try {
        // code which may throw an exception
    }
    catch (StackUnwinding e) {
        t.SetPendingException(e.InnerException);
    }
    finally {
        t.Dispose();
    }
}

We could then update our generator to make use of the new type, which would also allow us to stop using JNINativeWrapper.CreateDelegate() entirely. This would allow for faster app startup (no System.Reflection.Emit usage) without permitting Cross-VM corruption issues.

steveisok commented 1 month ago

/cc @janvorli @vitek-karas

jkotas commented 1 month ago

This problem looks very similar to the problem addressed by https://github.com/dotnet/runtime/issues/103105. Can #103105 be used to solve your debugger experience problem as well?

jonpryor commented 1 month ago

@jkotas: I think that would! I was not aware of that new attribute. We'll need to prototype it and see if it behaves as desired.

Thanks!

jonpryor commented 1 month ago

@jkotas, @steveisok: I think [DebuggerDisableUserUnhandledExceptionsAttribute] + Debugger.BreakForUserUnhandledException() in #103105 will work, in that the following .NET 9 Console app behaves as I want: the debugger "breaks" within B() at the throw new InvalidOperationException() line, even though the exception is handled:

using System.Diagnostics;

Console.WriteLine("Hello, World!");
A();

[DebuggerDisableUserUnhandledExceptions]
void A()
{
    try
    {
        B();
    }
    catch (Exception e)
    {
        Debugger.BreakForUserUnhandledException(e);
        Console.WriteLine(e.ToString());
    }
}

void B()
{
    throw new InvalidOperationException();
}

@steveisok, @thaystg: the problem is that [DebuggerDisableUserUnhandledExceptionsAttribute] + Debugger.BreakForUserUnhandledException() do not work with MonoVM in a .NET for Android 9 project. If I debug the following .NET for Android app:

namespace Scratch.AndroidApp_DisableUserUnhandledException
{
    [Activity(Label = "@string/app_name", MainLauncher = true)]
    public class MainActivity : Activity
    {
        protected override void OnCreate(Bundle? savedInstanceState)
        {
            base.OnCreate(savedInstanceState);

            // Set our view from the "main" layout resource
            SetContentView(Resource.Layout.activity_main);

            A(); 
        }

        [System.Diagnostics.DebuggerDisableUserUnhandledExceptions]
        static void A()
        {
            try
            {
                B();
            }
            catch (Exception e)
            {
                System.Diagnostics.Debugger.BreakForUserUnhandledException(e);
                Console.WriteLine(e.ToString());
            }
        }

        static void B()
        {
            throw new InvalidOperationException();
        }
    }
}

the debugger does not break within B().

What would be involved in making Debugger.BreakForUserUnhandledException() and related work on MonoVM? Could this be done for .NET 10?

lambdageek commented 1 month ago

/cc @tommcdon

jkotas commented 1 month ago

What would be involved in making Debugger.BreakForUserUnhandledException()

Debugger.BreakForUserUnhandledException is an empty method for a debugger to set a breakpoint at (check the implementation in https://github.com/dotnet/runtime/pull/104813/). There is no runtime specific support to make this work.

What is the debugger engine used to debug Mono on Android? It needs to be fixed there.

thaystg commented 1 month ago

@jonpryor Did you test it using VS or VSCode? Can you please test it in VSCode with the new debugger engine enabled? Because it uses the same debugger engine used to debug a coreclr app, so it may probably work?

jonpryor commented 1 month ago

@thaystg: I was testing using "Visual Studio 17.12 P3" on devbox, using whatever was current for "main" circa yesterday morning.

Is Visual Studio not using the correct debugger engine?

thaystg commented 1 month ago

@jonpryor Yes, this is correct, VSCode uses the same debugger engine used by coreclr debug, and VS uses the older one (debugger-libs), it will be replaced soon.

jonpryor commented 1 month ago

@thaystg requested:

Can you please test it in VSCode with the new debugger engine enabled?

It does not work as desired:

image

Note that the debugger "breaks" at the Debugger.BreakForUserUnhandledException() location within A(), not the throw new InvalidOperationException() location within B().

Compare with Visual Studio + .NET 9 + console app, which "breaks" at the original throw new InvalidOperationException() site within B():

image

Of equal importance is the exception callstack: VS + net9 + Console app shows the original callstack:

image

Compare to <Cannot evaluate the exception stack trace> within the VSCode screenshot. Additionally, if I hover my mouse over the topmost Scratch.AndroidApp-DisableUser… frame within CALL STACK (lower-right corner of VScode screenshot), it shows A(), not B().

thaystg commented 1 month ago

Thanks a lot, I was also able to reproduce it in a console app running on mono runtime using the new debugger engine. For sure we can fix it or .net 10.