dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15k stars 4.67k forks source link

Cross-VM Exception Unwinding #108211

Open jonpryor opened 2 hours ago

jonpryor commented 2 hours ago

Background

Java.Interop is used to allow in-process interactions with a Java Virtual Machine (JVM). .NET for Android builds atop Java.Interop for integration with Android, using MonoVM as the JIT. (Neither CoreCLR nor NativeAot are supported by .NET for Android at this time.)

See also:

Setup

Assume we have a call stack wherein Managed code calls into Java code which calls back into Managed code:

// C#
void ManagedJavaManaged()
{
    int value   = 0;
    using var c = new MyIntConsumer(v => value = v);
    using var r = JavaInvoker.CreateRunnable(c);
    r.Run ();
}

partial class MyIntConsumer : Java.Lang.Object, Java.Util.Function.IIntConsumer {
    Action<int> action;

    public MyIntConsumer (Action<int> action)
    {
        this.action = action;
    }

    public void Accept (int value)
    {
        action (value);
    }
}

Java side of JavaInvoker:

// Java
public final class Invoker {
    public static Runnable createRunnable(final IntConsumer consumer) {
        return new Runnable() {
            int value;
            public void run() {
                consumer.accept(value++);
            }
        };
    }
}

The above demonstrates a Managed > Java > Managed transition:

  1. the originating Managed method is ManagedJavaManaged(), which invokes
  2. JavaInvoker.CreateRunnable(), which eventually calls the Java method Invoker.createRunnable(), which in turn calls consumer.accept(), transitioning back into
  3. Managed code in MyIntConsumer.Accept().

When no exceptions are involved, everything is fine!

But what if an exception is thrown? What if MyIntConsumer.Accept(int) throws an exception?

Exception Handling

Java code cannot directly invoke managed code. Instead, Java code invokes special native methods. We call these "marshal methods", which are responsible for marshaling parameters, return values, and (ideally) exceptions.

Exception Handling without a Debugger

When no Debugger is present, the marshal methods catch and marshal all exceptions:

partial class IIntConsumerInvoker : global::Java.Lang.Object, IIntConsumer {
    static void n_Accept_I (IntPtr jnienv, IntPtr native__this, int value)
    {
        var envp = new JniTransition(jnienv);
        try {
            var __this = global::Java.Lang.Object.GetObject<Java.Util.Functions.IIntConsumer>(jnienv, native__this, JniHandleOwnership.DoNotTransfer)!;
            __this.Accept(value);
        } catch (Exception e) {
            envp.SetPendingException(e);
        } finally {
            envp.Dispose();
        }
    }
}

This way, if MyIntConsumer.Accept() throws an exception, it will be caught within IIntConsumerInvoker.n_Accept_I(), marshaled to Java, and then the method will return. Execution is assumed to return to Java, Java will see the pending exception, raise it, and Java code will be able to participate in exception handling.

Exception Handling with a Debugger: catch everything

Things get "more interesting" when a debugger is attached, as we want our developer customers to see a "first chance exception" which points to the correct location.

If we run the following with a debugger attached:

/* 1 */ using var c = new MyIntConsumer(v => throw new InvalidOperationException());
/* 2 */ using var r = JavaInvoker.CreateRunnable(c);
/* 3 */ r.Run ();

then when line 3 is executed, we want the debugger to "break" on line 1, with a callstack that shows (more or less):

The problem is that if we use the IIntConsumerInvoker.n_Accept_I() marshal method described above, in which all exceptions are caught, then instead the debugger will "break" at JniEnvironment.InstanceMethods.CallVoidMethod(…), if anywhere at all, completely skipping the original throw site and it's Java callers.

This was not considered to be an acceptable developer experience.

Exception Handling with a Debugger: catch only when a debugger is attached

If we instead have a marshal method where the catch block uses a when clause:

partial class IIntConsumerInvoker : global::Java.Lang.Object, IIntConsumer {
    static void n_Accept_I (IntPtr jnienv, IntPtr native__this, int value)
    {
        var envp = new JniTransition(jnienv);
        try {
            var __this = global::Java.Lang.Object.GetObject<Java.Util.Functions.IIntConsumer>(jnienv, native__this, JniHandleOwnership.DoNotTransfer)!;
            __this.Accept(value);
        } catch (Exception e) when (!Debugger.IsAttached) {
            envp.SetPendingException(e);
        } finally {
            envp.Dispose();
        }
    }
}

then we have a better "unhandled exception" experience: the debugger breaks on the original throw site.

Yay!

The problem is that if the developer then continues execution (F5), then the process state may be corrupted. This is because with the catch (Exception e) when (!Debugger.IsAttached) construct, the catch block becomes "invisible". This means that the exception is never caught, there is never an opportunity to return to Java code, and MonoVM will unwind all the call stacks, the Java call stacks included. This can result in the JVM aborting the process.

Oops.

Exception Handling Performance

There is an added complication: the use of JniTransition laid out above is a lie. What actually happens is that the marshal methods contain no exception handling, and the marshal method usage is "wrapped" around JNINativeWrapper.CreateDelegate():

// From src/Mono.Android/obj/Debug/net9.0/android-35/mcw/Java.Util.Functions.IIntConsumer.cs
internal partial class IIntConsumerInvoker : global::Java.Lang.Object, IIntConsumer {
    static void n_Accept_I (IntPtr jnienv, IntPtr native__this, int value)
    {
        var __this = global::Java.Lang.Object.GetObject<Java.Util.Functions.IIntConsumer> (jnienv, native__this, JniHandleOwnership.DoNotTransfer)!;
        __this.Accept (value);
    }

    static Delegate? cb_accept_Accept_I_V;
    static Delegate GetAccept_IHandler ()
    {
        if (cb_accept_Accept_I_V == null)
            cb_accept_Accept_I_V = JNINativeWrapper.CreateDelegate (new _JniMarshal_PPI_V (n_Accept_I));
        return cb_accept_Accept_I_V;
    }
}

JNINativeWrapper.CreateDelegate() in turn uses System.Reflection.Emit to do a variant on the "catch (Exception) when (!Debugger.IsAttached)" approach. This in turn means that System.Reflection.Emit is required, and can impact app startup times by pulling in System.Reflection.Emit initialization and usage into app startup.

Request

What we would like is a construct to say "ignore this catch block when dealing with first chance exceptions, while executing this catch block when actually unwinding the stack."

An idea sketched out here is to add a StackUnwinding type:

namespace System.Runtime.ExceptionServices {
    public partial class StackUnwinding : Exception
    {
        public StackUnwinding(Exception innerException);
    }
}

catch(StackUnwinding) blocks would be "invisible" to first chance exception reporting.

This would allow our marshal methods to use it:

void JniBoundary(IntPtr jnienv, IntPtr self)
{
    var t = new JniTransition(jnienv);
    try {
        // code which may throw an exception
    }
    catch (StackUnwinding e) {
        t.SetPendingException(e.InnerException);
    }
    finally {
        t.Dispose();
    }
}

We could then update our generator to make use of the new type, which would also allow us to stop using JNINativeWrapper.CreateDelegate() entirely. This would allow for faster app startup (no System.Reflection.Emit usage) without permitting Cross-VM corruption issues.

steveisok commented 2 hours ago

/cc @janvorli @vitek-karas

jkotas commented 1 hour ago

This problem looks very similar to the problem addressed by https://github.com/dotnet/runtime/issues/103105. Can #103105 be used to solve your debugger experience problem as well?

jonpryor commented 1 hour ago

@jkotas: I think that would! I was not aware of that new attribute. We'll need to prototype it and see if it behaves as desired.

Thanks!