[Mono] Using managed function pointers in non-llvm mixed AOT + interpreted environments leads to crashes

lambdageek commented 1 month ago

The IL opcodes ldftn / calli (for methods without an [UnmanagedCallersOnly] attribute) are compiled differently in mono in some scenarios:

1. In purely JITed and non-"llvmonly" AOT environments, ldftn returns a pointer to native code and calli is just an indirect call. (With generic sharing, a trampoline is involved to pass an additional runtime generic context that isn't part of the normal platform calling convention.)
2. In "llvmonly" AOTed environments, ldftn returns a MonoFtnDesc* structure (used to pair up the code pointer, the rgctx arg and the MonoMethod*), and calli consumes such a thing and invokes the underlying method together with the extra arg.
3. In purely interpreted non-llvmonly environments, ldftn returns a (sometimes tagged) InterpMethod* pointer that calli uses to adjust the interpreter state and enter into the method.
4. In interpreted llvmonly environments we use MonoFtnDesc* for interpreted methods too, by also storing an InterpMethod* in the MonoFtnDesc* and deciding at call time whether the method is AOT or interpreted.
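
To make the difference between these representations concrete, here is a minimal C sketch. It is not Mono's code: every name (SketchInterpMethod, SketchFtnDesc, ftn_value, the ldftn_*/calli_* helpers) is hypothetical, the "interpreted method" is a toy that just adds a constant, and using the low bit as the tag is an assumption made for illustration. The point is only that each ldftn flavour produces a different kind of word, and each calli flavour can only consume the kind it was built for.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT Mono's real definitions. */
typedef int (*native_fn)(int);   /* shape of delegate*<int, int> */
typedef uintptr_t ftn_value;     /* opaque word produced by "ldftn" */

typedef struct { int addend; } SketchInterpMethod;  /* toy method: n + addend */

typedef struct {
    native_fn code;              /* AOT code, or NULL if interpreted */
    void *extra_arg;             /* stands in for the rgctx arg */
    SketchInterpMethod *interp;  /* set when the method is interpreted */
} SketchFtnDesc;

#define SKETCH_INTERP_TAG ((uintptr_t)1)  /* assumed tag bit */

static int helper_native(int n) { return n + 1; }

/* JIT / non-llvmonly AOT: the value is the code address itself. */
static ftn_value ldftn_native(native_fn f) { return (ftn_value)f; }
static int calli_native(ftn_value v, int arg) { return ((native_fn)v)(arg); }

/* Interpreter, non-llvmonly: the value is a tagged method pointer. */
static ftn_value ldftn_interp(SketchInterpMethod *m) {
    return (ftn_value)m | SKETCH_INTERP_TAG;
}
static int calli_interp(ftn_value v, int arg) {
    SketchInterpMethod *m = (SketchInterpMethod *)(v & ~SKETCH_INTERP_TAG);
    return arg + m->addend;      /* "interpret" the toy method */
}

/* llvmonly: the value is a descriptor; calli picks AOT or interpreter. */
static ftn_value ldftn_desc(SketchFtnDesc *d) { return (ftn_value)d; }
static int calli_desc(ftn_value v, int arg) {
    SketchFtnDesc *d = (SketchFtnDesc *)v;
    if (d->interp != NULL) {
        return arg + d->interp->addend;
    }
    return d->code(arg);
}
```

Each matched pair works (for example calli_native(ldftn_native(helper_native), 1) evaluates to 2). Feeding the result of ldftn_interp to calli_native would instead jump to an address that is not code, which is the mismatch this issue is about.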

The problem is that in a mixed AOT+interp non-"llvmonly" environment, we might use a combination of (1) and (3): for example we might AOT part of the app that includes an API like:

namespace MyFramework;
public class FrameworkClass
{
    public unsafe static int FrameworkMethod(delegate *<int, int> func) => func (1);
}

and we might call it from interpreted user code:

public class Program
{
    public static void Main()
    {
        unsafe
        {
            Console.WriteLine (MyFramework.FrameworkClass.FrameworkMethod (&Helper)); // prints 2
        }
    }

    private static int Helper(int n) => n + 1;
}

The problem is that when we interpret Program we will pass an InterpMethod* to FrameworkMethod, whereas it expects a pointer to executable machine code, resulting in a hard-to-diagnose crash.

The reverse is also possible - we might be doing the ldftn in AOTed code (and get back a native code pointer), while the calli might be in interpreted code - which will expect an InterpMethod*.

In both cases the result is a crash.

lambdageek commented 1 month ago

FYI @BrzVlad

lambdageek commented 1 month ago

Not sure if we need to fix this right away or not. Since C# function pointers are unsafe, it seems like these kinds of mixed mode scenarios might be rare. Also it seems like "llvmonly" mixed mode wouldn't have a problem.

If we can't think of a way to fix the problem without sacrificing efficiency in AOT code, it would be nice to at least detect that something has gone wrong and throw an ExecutionEngineException rather than having a native crash.
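
A rough C sketch of what such detection could look like on the interpreter side (the "reverse" case, where calli runs in interpreted code). This is not Mono's code and rests on assumptions: all names are hypothetical, it assumes every interpreter-produced pointer carries a tag in its low bit (the issue only says "sometimes tagged"), and it assumes native code addresses never have that bit set, which does not hold on every target (ARM Thumb code addresses have the low bit set, for instance). It also does nothing for the AOT side, where checking would add work to every indirect call.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT Mono's real definitions. */
typedef uintptr_t ftn_value;
typedef struct { int addend; } SketchInterpMethod;  /* toy method: n + addend */

#define SKETCH_INTERP_TAG ((uintptr_t)1)  /* assumed: always set by ldftn */

typedef enum { CALLI_OK, CALLI_WRONG_REPRESENTATION } calli_status;

static ftn_value ldftn_interp(SketchInterpMethod *m) {
    return (ftn_value)m | SKETCH_INTERP_TAG;
}

/* Interpreter-side calli that validates the value before using it. */
static calli_status calli_interp_checked(ftn_value v, int arg, int *result) {
    if ((v & SKETCH_INTERP_TAG) == 0) {
        /* Not produced by the interpreter's ldftn. A runtime would raise
           a managed exception here instead of dereferencing the value. */
        return CALLI_WRONG_REPRESENTATION;
    }
    SketchInterpMethod *m = (SketchInterpMethod *)(v & ~SKETCH_INTERP_TAG);
    *result = arg + m->addend;
    return CALLI_OK;
}
```

With this shape, a value that came from AOT code is rejected with CALLI_WRONG_REPRESENTATION and *result is left untouched, rather than being treated as an InterpMethod*.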

kg commented 1 month ago

For targets that can JIT (Android, browser WASM), we could manufacture real ftn ptrs (trampolines?) on demand for interp code and use ftn ptrs everywhere, I think. I don't know what we'd do on iOS and WASI.

lambdageek commented 1 month ago

> use ftn ptrs everywhere

the problem is that we'd then repeatedly enter/exit the interpreter (growing/shrinking the native stack). Currently the interpreter tries to be non-recursive and just manipulates InterpFrames in a loop.
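
A toy C illustration of that cost, under loud assumptions: this is not the Mono interpreter, the names are hypothetical, and the "method" is a made-up one where method n calls method n - 1 and adds 1, with method 0 returning 0. interp_loop handles calls by pushing frames onto an explicit array inside a single native activation; interp_reenter models what happens if every call goes through a native entry point.

```c
#include <stddef.h>

#define SKETCH_MAX_FRAMES 64

/* Counts how many times an interpreter entry function is entered natively. */
static int native_entries;

/* Non-recursive style: calls push a frame and the loop continues. */
static int interp_loop(int method) {
    int frames[SKETCH_MAX_FRAMES];   /* each frame just records its method */
    size_t depth = 0;
    int result;

    native_entries++;
    frames[depth++] = method;
    while (frames[depth - 1] != 0) {
        if (depth == SKETCH_MAX_FRAMES) {
            return -1;               /* toy stand-in for stack overflow */
        }
        frames[depth] = frames[depth - 1] - 1;   /* "call" the callee */
        depth++;
    }
    result = 0;                      /* method 0 returns 0 */
    depth--;                         /* pop method 0's frame */
    while (depth > 0) {              /* each caller adds 1 and returns */
        result += 1;
        depth--;
    }
    return result;
}

/* Re-entrant style: every call is a fresh native entry. */
static int interp_reenter(int method) {
    native_entries++;
    if (method == 0) {
        return 0;
    }
    return interp_reenter(method - 1) + 1;
}
```

Both compute the same value, but for method 5 the loop version enters native code once while the re-entrant version enters six times, growing the native stack with each nested call.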

kg commented 1 month ago

> > use ftn ptrs everywhere
>
> the problem is that we'd then repeatedly enter/exit the interpreter (growing/shrinking the native stack). Currently the interpreter tries to be non-recursive and just manipulates InterpFrames in a loop.

we would probably want to attach information to the trampoline that points to the interpmethod... good point though, that's messy.
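
One way to picture "attaching information to the trampoline" is the C sketch below. It is only an illustration with hypothetical names, and it swaps in a different technique from the one discussed: instead of generating trampolines at run time, it uses a small fixed pool of precompiled thunks, each bound to a slot that records which interpreted method it stands for. Native callers get a real function pointer; an interpreter-side calli could first ask trampoline_target whether the pointer is one of its own thunks and, if so, continue in its frame loop rather than going through native code.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT Mono's real definitions. */
typedef int (*native_fn)(int);
typedef struct { int addend; } SketchInterpMethod;  /* toy method: n + addend */

static int helper_native(int n) { return n + 1; }   /* ordinary native code */

/* Native entry into the toy interpreter. */
static int interp_entry(SketchInterpMethod *m, int arg) {
    return arg + m->addend;
}

#define SKETCH_POOL_SIZE 2
static SketchInterpMethod *pool_method[SKETCH_POOL_SIZE];  /* slot -> method */

static int thunk0(int arg) { return interp_entry(pool_method[0], arg); }
static int thunk1(int arg) { return interp_entry(pool_method[1], arg); }
static const native_fn pool_thunk[SKETCH_POOL_SIZE] = { thunk0, thunk1 };

/* "ldftn" for an interpreted method: reuse or bind a thunk slot. */
static native_fn make_trampoline(SketchInterpMethod *m) {
    for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
        if (pool_method[i] == m) {
            return pool_thunk[i];
        }
    }
    for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
        if (pool_method[i] == NULL) {
            pool_method[i] = m;
            return pool_thunk[i];
        }
    }
    return NULL;                     /* pool exhausted */
}

/* Recover the interpreted method behind a pointer, or NULL if the
   pointer is not one of our bound thunks. */
static SketchInterpMethod *trampoline_target(native_fn f) {
    for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
        if (pool_thunk[i] == f && pool_method[i] != NULL) {
            return pool_method[i];
        }
    }
    return NULL;
}
```

The linear lookup and the tiny fixed pool are simplifications; the sketch says nothing about how a real runtime would size, index, or free such trampolines.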

dotnet / runtime

[Mono] Using managed function pointers in non-llvm mixed AOT + interpreted environments leads to crashes #102891