dart-lang / sdk


Build infrastructure to allow changing implementation of native calls to use FFI calls instead of runtime calls #43889

Open mkustermann opened 3 years ago

mkustermann commented 3 years ago

Native methods in Dart currently use our runtime call mechanism to call from Dart into C code. This carries significant performance overhead: the native Dart function cannot be inlined, we go indirectly through stubs, and we box all arguments and build an array on the stack (to give the C function a pointer to an array of arguments), among other costs.

To avoid some of this overhead, we would like to enable calling natives using the existing FFI calling mechanism. That would allow us to pass unboxed primitives (e.g. integers/doubles) directly using the C calling convention, as well as Dart objects via auto-wrapping in handles.
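
For reference, a minimal sketch of a call through the existing FFI mechanism, where the integer argument and result cross the boundary unboxed. It assumes a Linux or macOS host where libc's `abs` is visible via `DynamicLibrary.process()`; the typedef names are illustrative only and not part of any proposal in this issue.

```dart
import 'dart:ffi';

// C signature: int abs(int). Assumption: `int` is 32 bits on the host.
typedef AbsNative = Int32 Function(Int32);
// Dart-side view of the same function.
typedef AbsDart = int Function(int);

// Looks up `abs` in the current process and binds it as a Dart function.
AbsDart bindAbs() =>
    DynamicLibrary.process().lookupFunction<AbsNative, AbsDart>('abs');

void main() {
  final abs = bindAbs();
  // The int is passed using the C calling convention, without boxing.
  print(abs(-42));
}
```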

We want to prototype this on natives in Dart's core libraries and later make Flutter use it in dart:ui for faster calls into C (e.g. Skia calls).

This will require adding a symbol-lookup mechanism (either declarative, as we do with natives right now, or imperative) so that we do not depend on VM/embedder symbols being available at runtime.

Some tasks related to the Ffi Native support:

/cc @mraleph

mkustermann commented 3 years ago

To clarify this a bit more, I think we want a solution where:

These are the properties our current natives have and we probably want to have the same properties for ffi natives as well.

mkustermann commented 3 years ago

Some notes from offline discussions:

ghost commented 3 years ago

@mkustermann, @mraleph Do we have any proposals for acquiring the corresponding native function pointer (i.e. without native runtime entry)? Presumably this would involve the 'one mechanism to lookup symbols'.

One approach (which I'm stealing from Daco) is to add something like LoadNativePtrInstr which would do a lookup akin to NativeEntry::ResolveNative in NativeCallInstr. This would then be responsible for loading the native function pointer, to enable the call to the FFI Trampoline.

One question I have though is whether we have the native function pointers at compile (IL -> asm.) time, or whether we need to do the lookup at runtime?

mraleph commented 3 years ago

> Do we have any proposals for acquiring the corresponding native function pointer (i.e. without native runtime entry)?

I have not been deeply involved in the discussions - so I am not sure if there are any strong reasons to avoid native runtime entries to begin with, e.g. why not simply have:

Pointer<Void> _resolve(String name) native "Ffi_resolve";

mkustermann commented 3 years ago

> Do we have any proposals for acquiring the corresponding native function pointer (i.e. without native runtime entry)?
>
> > I have not been deeply involved in the discussions - so I am not sure if there are any strong reasons to avoid native runtime entries to begin with, e.g. why not simply have:

+1 to Slava's answer. It would be one mechanism to resolve any native function (1 <-> N relationship).

Later on, we could go one step further: make symbol resolution faster and avoid relying on natives (which we may want to replace entirely in the future) by "injecting" a C function pointer into a global Dart field. Dart code could then call this C function directly via FFI, without using natives:

// Pointer to a `void* Lookup(Dart_Handle symbol_to_lookup)` function, which
// Dart code can call to resolve a native name.
@pragma('vm:entry-point')
// `Handle` is the dart:ffi native type that corresponds to a C `Dart_Handle`.
Pointer<NativeFunction<Pointer<Void> Function(Handle)>> _resolver;

> One question I have though is whether we have the native function pointers at compile (IL -> asm.) time, or whether we need to do the lookup at runtime?

In JIT we could resolve the native function at compile time, though in AOT mode we cannot rely on this.

ghost commented 3 years ago

Thank you for clarifying. My previous understanding was that natives (even for the resolve function) were off the table, so this was very helpful.

I've put together a PoC in 170092 for a naive kernel transform into calls to a single native resolve function. The numbers from some quick benchmarking are not very encouraging, however: the runtime is ~460% of the baseline. A performance hit is entirely expected, though, since the CL only adds overhead: it does the native resolve and then the FFI call.

Since the bottleneck is the native call to 'resolve' on every FFI native function call, we could amortise that away by caching the resulting function pointer, but I think doing so would likely require a static for every FFI native function. I'd like to double check whether that is (as I believe has previously been mentioned) also not an option we're willing to accept.

In the meantime I'll have a look at the ffi-based resolver.

ghost commented 3 years ago

> Since the bottleneck is the native call to 'resolve' on every FFI native function call, we could amortise that away by caching the resulting function pointer, but I think doing so would likely require a static for every FFI native function. I'd like to double check whether that is (as I believe has previously been mentioned) also not an option we're willing to accept.

From off-thread follow-up: We're ok with caching the function pointer in fields for every ffi native function as a first step. Though the long-term goal naturally is to get rid of this overhead through other mechanisms.
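
A rough sketch of what caching the function pointer in a field per FFI native function could look like in plain Dart. Everything here is an assumption for illustration: `resolve` is a hypothetical stand-in for the single resolver discussed above (it just looks the symbol up in the current process), `resolveCalls` exists only to make the lookups observable, and libc's `abs` is used as the target on the assumption of a Linux or macOS host. This is not the kernel transform from the CL.

```dart
import 'dart:ffi';

// Counts symbol lookups so the effect of caching is observable.
int resolveCalls = 0;

// Hypothetical stand-in for the single resolve entry point.
Pointer<Void> resolve(String name) {
  resolveCalls++;
  return DynamicLibrary.process().lookup<Void>(name);
}

// Uncached variant: resolves the symbol on every call.
int absUncached(int x) {
  final fn = resolve('abs')
      .cast<NativeFunction<Int32 Function(Int32)>>()
      .asFunction<int Function(int)>();
  return fn(x);
}

// Cached variant: one field per native function. Top-level finals are
// initialized lazily in Dart, so the lookup runs once, on first use.
final int Function(int) _absCached = resolve('abs')
    .cast<NativeFunction<Int32 Function(Int32)>>()
    .asFunction<int Function(int)>();

int absCached(int x) => _absCached(x);

void main() {
  for (var i = 1; i <= 3; i++) {
    absUncached(-i);
  }
  print('lookups after uncached calls: $resolveCalls');
  for (var i = 1; i <= 3; i++) {
    absCached(-i);
  }
  print('lookups after cached calls: $resolveCalls');
}
```

The cost of this approach is one extra field per native function, which is the trade-off accepted above as a first step.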

ghost commented 3 years ago

Implementing a cached _resolver as mentioned above appears to roughly halve the overhead, though that is still more than 2x the baseline. I'll start work on modifying the transform to cache the individual function pointers.