dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.
MIT License
1.36k stars 188 forks source link

[NativeAOT-LLVM] Delegates marshalling is broken in some scenarios #2624

Open maxkatz6 opened 3 days ago

maxkatz6 commented 3 days ago

Running SkiaSharp from NativeAOT-LLVM I noticed that it always fails on DllImports with managed delegates.

dotnet.native.js:59 
 Uncaught 
RuntimeError: null function or function signature mismatch
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_ThunkBlocks__GetNewThunksBlock (dotnet.native.wasm:0x1ea0c5)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_ThunksHeap___ctor (dotnet.native.wasm:0x132e2a)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_ThunksHeap__CreateThunksHeap (dotnet.native.wasm:0x1a9576)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_InteropServices_PInvokeMarshal__AllocateThunk (dotnet.native.wasm:0xef0c0)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_CompilerServices_ConditionalWeakTable_2_CreateValueCallback<System___Canon__System___Canon>__InvokeOpenStaticThunk (dotnet.native.wasm:0xef030)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_CompilerServices_ConditionalWeakTable_2<System___Canon__System___Canon>__GetValueLocked (dotnet.native.wasm:0x685ee)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_CompilerServices_ConditionalWeakTable_2<System___Canon__System___Canon>__GetValue (dotnet.native.wasm:0x225430)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_InteropServices_PInvokeMarshal__GetFunctionPointerForDelegate (dotnet.native.wasm:0x15c542)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_InteropServices_Marshal__GetFunctionPointerForDelegate (dotnet.native.wasm:0x127a9e)
    at dotnet.native.wasm.S_P_CoreLib_System_Runtime_InteropServices_Marshal__GetFunctionPointerForDelegate_0<System___Canon> (dotnet.native.wasm:0xe40e5)

And more stacktrace:

$S_P_CoreLib_System_Runtime_ThunkBlocks__GetNewThunksBlock | @ | dotnet.native.wasm:0x1ea0c5
-- | -- | --
  | $S_P_CoreLib_System_Runtime_ThunksHeap___ctor | @ | dotnet.native.wasm:0x132e2a
  | $S_P_CoreLib_System_Runtime_ThunksHeap__CreateThunksHeap | @ | dotnet.native.wasm:0x1a9576
  | $S_P_CoreLib_System_Runtime_InteropServices_PInvokeMarshal__AllocateThunk | @ | dotnet.native.wasm:0xef0c0
  | $S_P_CoreLib_System_Runtime_CompilerServices_ConditionalWeakTable_2_CreateValueCallback<System___Canon__System___Canon>__InvokeOpenStaticThunk | @ | dotnet.native.wasm:0xef030
  | $S_P_CoreLib_System_Runtime_CompilerServices_ConditionalWeakTable_2<System___Canon__System___Canon>__GetValueLocked | @ | dotnet.native.wasm:0x685ee
  | $S_P_CoreLib_System_Runtime_CompilerServices_ConditionalWeakTable_2<System___Canon__System___Canon>__GetValue | @ | dotnet.native.wasm:0x225430
  | $S_P_CoreLib_System_Runtime_InteropServices_PInvokeMarshal__GetFunctionPointerForDelegate | @ | dotnet.native.wasm:0x15c542
  | $S_P_CoreLib_System_Runtime_InteropServices_Marshal__GetFunctionPointerForDelegate | @ | dotnet.native.wasm:0x127a9e
  | $S_P_CoreLib_System_Runtime_InteropServices_Marshal__GetFunctionPointerForDelegate_0<System___Canon> | @ | dotnet.native.wasm:0xe40e5
  | $Internal_CompilerGenerated__Module___<ManagedToNative>SkiaSharp_SkiaSharp_SKManagedStreamDelegates | @ | dotnet.native.wasm:0x153ecf
  | $SkiaSharp_SkiaSharp_SkiaApi__sk_managedstream_set_procs | @ | dotnet.native.wasm:0x118892
  | $SkiaSharp_SkiaSharp_SKAbstractManagedStream___cctor | @ | dotnet.native.wasm:0xd5767
  | $S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner__EnsureClassConstructorRun | @ | dotnet.native.wasm:0x214632
  | $S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner__CheckStaticClassConstructionReturnNonGCStaticBase | @ | dotnet.native.wasm:0x967cc
  | $__GetNonGCStaticBase_SkiaSharp_SkiaSharp_SKAbstractManagedStream | @ | dotnet.native.wasm:0x250838
  | $SkiaSharp_SkiaSharp_SKAbstractManagedStream___ctor_0 | @ | dotnet.native.wasm:0x120169
  | $SkiaSharp_SkiaSharp_SKManagedStream___ctor_0 | @ | dotnet.native.wasm:0x1200a5
  | $naot_llvm_demo_Program__Main_d__0__MoveNext | @ | dotnet.native.wasm:0x5c3c3

The same delegates are properly marshalled with plain NativeAOT on desktop, as well as Mono WASM. This issue might be or might not be related to https://github.com/mono/SkiaSharp/issues/1931.

Workaround - rewrite SkiaSharp to use function pointers instead (I am working on it right now).

Minimal repro (only SkiaSharp Bitmap decoding + HttpClient for demo): naot-llvm-demo.zip

maxkatz6 commented 3 days ago

To add some contexts, on how these delegates are used in SkiaSharp.

  1. DllImport is defined with managed delegates as parameters: https://github.com/mono/SkiaSharp/blob/463dd820278ad9635b0d83f4f9f4984c09510a6d/binding/HarfBuzzSharp/HarfBuzzApi.generated.cs#L1383
  2. Static callbacks are defined with MonoPInvokeCallback (so, no UnmanagedCallersOnly here). https://github.com/mono/SkiaSharp/blob/463dd820278ad9635b0d83f4f9f4984c09510a6d/binding/HarfBuzzSharp/DelegateProxies.cs#L33
  3. Delegates are pre-defined and statically stored: https://github.com/mono/SkiaSharp/blob/463dd820278ad9635b0d83f4f9f4984c09510a6d/binding/HarfBuzzSharp/DelegateProxies.cs#L17
  4. DllImport is called with pre-defined delegates: https://github.com/mono/SkiaSharp/blob/463dd820278ad9635b0d83f4f9f4984c09510a6d/binding/HarfBuzzSharp/Face.cs#L43-L46

And another use case is with delegates as structs field (stacktrace from the first message is failing on this one):

  1. Defined https://github.com/mono/SkiaSharp/blob/463dd820278ad9635b0d83f4f9f4984c09510a6d/binding/SkiaSharp/SkiaApi.generated.cs#L14372
  2. And used like here https://github.com/mono/SkiaSharp/blob/463dd820278ad9635b0d83f4f9f4984c09510a6d/binding/SkiaSharp/SKAbstractManagedStream.cs#L18C13-L18C25
SingleAccretion commented 3 days ago

This was also discussed on Discord as the problem hit on Avalonia bring-up.

I have since looked into what it would take to implement the thunk pool on Browser. It is possible in principle, but it doesn't fit into the shape of the existing code well because we need to know the exact signatures for Emscripten's addFunction. In any case, it will have quite suboptimal performance (addFunction relies on creating a whole new WASM module and Jitting it on the fly).

Not that we shouldn't implement it, of course.

jkotas commented 2 days ago

quite suboptimal performance (addFunction relies on creating a whole new WASM module and Jitting it on the fly).

A solution with these properties is not very compatible with native aot design principles.

SingleAccretion commented 2 days ago

A solution with these properties is not very compatible with native aot design principles.

Indeed. Perhaps, what we could do instead is:

1) At compile time, collect all thunk signatures that may be needed, and construct a 'template' WASM module with exports with all those signatures. These exports would function as our "thunks", with the same functionality. 2) Also construct the means to map native interop stubs to exports with a compatible signature in this template module. 3) At runtime, extract this WASM module, compile, and instantiate as needed, allocating memory for the thunk contexts and so on. 4) Pass the appropriate instantiated export to addFunction, which at that point boils down to simply adding an entry into the indirect function table.

The advantage is that instantiating a module does not require a lot of resources in the engine, since the compiled code is shared between instantiations (in .NET terms, modules are 'domain-neutral' 😄).

The disadvantage of the scheme as detailed above is that if we need only one kind of thunk signature a lot, the module with all of the signatures will be repeatedly instantiated. I don't know if the instantiation cost scales with the code size of the module (it obviously does scale with the size of the data unique to the module). If it turns out to be a problem, we can instead have one module per one signature (and thus export). That would lead to more modules instantiated in the usual case, however.

jkotas commented 2 days ago

I would say that the libraries have to convert to use function pointers if they want to use native AOT w/ wasm. It is not that hard to do and guaranteed to have the desired perf characteristics.

Mono does not implement full delegate marshalling for wasm either. Is that correct?

SingleAccretion commented 2 days ago

Mono does not implement full delegate marshalling for wasm either. Is that correct?

Yes. It implements enough for a certain narrow scenario - open static delegate pointing to an adorned method - to work. Here's the change implementing this happy path: https://github.com/dotnet/runtime/pull/38932.

The PI table task generates a statically-sized number of native functions that are, though a couple of layers, returned by GetFunctionPointerForDelegate:

typedef void (*WasmInterpEntrySig_3) (int*, int*, int*, int*, int*, int*, int*, int*);
int32_t wasm_native_to_interp_System_Private_CoreLib_ComponentActivator_GetFunctionPointer (void * arg0, void * arg1, void * arg2, void * arg3, void * arg4, void * arg5) { 
  int32_t res;
  if (!(WasmInterpEntrySig_3)wasm_native_to_interp_ftndescs [3].func) {
   mono_wasm_marshal_get_managed_wrapper ("System.Private.CoreLib", "Internal.Runtime.InteropServices.ComponentActivator", "GetFunctionPointer", 6);
  }
  ((WasmInterpEntrySig_3)wasm_native_to_interp_ftndescs [3].func) ((int*)&res, (int*)&arg0, (int*)&arg1, (int*)&arg2, (int*)&arg3, (int*)&arg4, (int*)&arg5, wasm_native_to_interp_ftndescs [3].arg);
  return res;
}

typedef void (*WasmInterpEntrySig_4) (int*, int*, int*);
void wasm_native_to_interp_System_Private_CoreLib_CalendarData_EnumCalendarInfoCallback (void * arg0, void * arg1) { 
  if (!(WasmInterpEntrySig_4)wasm_native_to_interp_ftndescs [4].func) {
   mono_wasm_marshal_get_managed_wrapper ("System.Private.CoreLib", "System.Globalization.CalendarData", "EnumCalendarInfoCallback", 2);
  }
  ((WasmInterpEntrySig_4)wasm_native_to_interp_ftndescs [4].func) ((int*)&arg0, (int*)&arg1, wasm_native_to_interp_ftndescs [4].arg);
}

typedef void (*WasmInterpEntrySig_5) (int*);
void wasm_native_to_interp_System_Private_CoreLib_ThreadPool_BackgroundJobHandler () { 
  if (!(WasmInterpEntrySig_5)wasm_native_to_interp_ftndescs [5].func) {
   mono_wasm_marshal_get_managed_wrapper ("System.Private.CoreLib", "System.Threading.ThreadPool", "BackgroundJobHandler", 0);
  }
  ((WasmInterpEntrySig_5)wasm_native_to_interp_ftndescs [5].func) (wasm_native_to_interp_ftndescs [5].arg);
}
jkotas commented 2 days ago

These examples are actually UnmanagedCallersOnly methods used with function pointers. I guess Marshal.GetFunctionPointerForDelegate happens to work too given the Mono internal implementation details (delegate marshaling and UnmanagedCallersOnly are not layered like it is in CoreCLR/NAOT).

SingleAccretion commented 2 days ago

I guess Marshal.GetFunctionPointerForDelegate happens to work too given the Mono internal implementation details (delegate marshaling and UnmanagedCallersOnly are not layered like it is in CoreCLR/NAOT).

More or less. The delegate part of this scenario is enabled via MonoPInvokeCallbackAttribute.