Build infrastructure to allow changing implementation of native calls to use FFI calls instead of runtime calls

mkustermann commented 3 years ago

Native methods in Dart currently use our runtime call mechanism to call from Dart into C code. This carries significant performance overhead: the native Dart function cannot be inlined, calls go indirectly through stubs, and all arguments are boxed and placed in an array on the stack (so the C function can be given a pointer to an array of arguments), among other costs.

To avoid some of this overhead, we would like to enable calling natives via the existing FFI calling mechanism. That would let us pass unboxed primitives (e.g. integers/doubles) directly using the C calling convention, as well as Dart objects via auto-wrapping in handles.

We want to prototype this on natives in Dart's core libraries and later make Flutter use it in dart:ui for faster calls into C (e.g. skia calls).

This will require adding a symbol-lookup mechanism (either declarative, as with natives today, or imperative) so that we don't depend on VM/embedder symbols being available at runtime.

Some tasks related to FFI native support:

[x] Perf: Avoid expensive handle allocation for ffi natives that pass objects
[x] Size+Perf: Avoid using static fields in ffi native
[ ] Size+Perf: Allow inlining ffi trampolines
[x] Size: Possibly always outline safepoint transitions (measure how much slower it would get)
[ ] Size: Possibly avoid StackOverflow checks in ffi trampoline (we don't do it for natives)
[ ] Maintenance: Replace InvokeMathCFunction with FfiNative
[ ] Feature: Limited simarm/simarm64 support for ffi - sufficient to migrate dart:io

/cc @mraleph

mkustermann commented 3 years ago

To clarify this a bit more, I think we want a solution where:

There is a single symbol lookup mechanism
The embedder can control how symbols are resolved
The Dart code for ffi natives should preferably be simple, possibly declarative

These are the properties our current natives have and we probably want to have the same properties for ffi natives as well.

mkustermann commented 3 years ago

Some notes from offline discussions:

We would like to strive for a solution where we can eventually replace our old runtime calls, so we don't need to maintain two mechanisms that achieve the same goal (calling from Dart to C), one of which is most likely superior to the other.

It might be nice to have a declarative way of defining such ffi natives. One option would be for example using a @FfiNative annotation plus a kernel transformer:

// Old native in Dart
int foo(FooBar arg, double arg2) native "Native_foo";

// Old native in Kernel
@ExternalName("Native_foo")  
external int foo(FooBar arg, double arg2);

// Ffi native in Dart
@FfiNative<Int8 Function(Handle, Double)>("Native_foo")
external int foo(FooBar arg, double arg2);

We would like to have one mechanism to look up symbols, where resolution can be controlled (to some extent) by the embedder.

ghost commented 3 years ago

@mkustermann, @mraleph Do we have any proposals for acquiring the corresponding native function pointer (i.e. without native runtime entry)? Presumably this would involve the 'one mechanism to lookup symbols'.

One approach (which I'm stealing from Daco) is to add something like a LoadNativePtrInstr, which would do a lookup akin to NativeEntry::ResolveNative in NativeCallInstr. It would be responsible for loading the native function pointer, enabling the call to the FFI trampoline.

One question I have, though, is whether we have the native function pointers at compile time (IL -> asm), or whether we need to do the lookup at runtime?

mraleph commented 3 years ago

> Do we have any proposals for acquiring the corresponding native function pointer (i.e. without native runtime entry)?

I have not been deeply involved in the discussions, so I am not sure whether there are any strong reasons to avoid native runtime entries to begin with. For example, why not simply have:

Pointer<Void> _resolve(String name) native "Ffi_resolve";

mkustermann commented 3 years ago

> Do we have any proposals for acquiring the corresponding native function pointer (i.e. without native runtime entry)?

> I have not been deeply involved in the discussions, so I am not sure whether there are any strong reasons to avoid native runtime entries to begin with. For example, why not simply have:

+1 to Slava's answer. It would be one mechanism to resolve any native function (a 1 <-> N relationship).

Later on, we could go one step further: make symbol resolution faster and avoid relying on natives (which we may want to replace entirely in the future) by "injecting" a C function pointer into a global Dart field. The Dart code could then call this C function directly via FFI, without using natives:

// Pointer to a `void* Lookup(Dart_Handle symbol_to_lookup)` function, which
// Dart code can call to resolve a native name.
@pragma('vm:entry-point')
Pointer<NativeFunction<Pointer<Void> Function(Handle)>> _resolver;
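The injected-resolver idea can be modeled in plain Dart to show its shape: a single global slot that the embedder fills, and one lookup path that every ffi native goes through. This is a hypothetical sketch, not dart:ffi code. Native addresses are modeled as plain ints, the C function pointer that would be stored in `_resolver` is replaced by a Dart closure, and `NativeResolver`, `installResolver` and `resolveNative` are made-up names.

```dart
// Hypothetical model: a "native address" is a plain int, and the C resolver
// function the embedder would inject is modeled as a Dart closure.
typedef NativeResolver = int Function(String symbol);

// The single, global resolver slot. In the proposal this would be a
// Pointer<NativeFunction<...>> written by the embedder; here it stays null
// until something installs a resolver.
NativeResolver? _resolver;

/// Called by the (hypothetical) embedder to control symbol resolution.
void installResolver(NativeResolver resolver) {
  _resolver = resolver;
}

/// The one lookup path shared by every ffi native.
int resolveNative(String symbol) {
  final resolver = _resolver;
  if (resolver == null) {
    throw StateError('No native resolver installed');
  }
  final address = resolver(symbol);
  if (address == 0) {
    // 0 models a null pointer, i.e. the embedder does not know this symbol.
    throw ArgumentError.value(symbol, 'symbol', 'could not be resolved');
  }
  return address;
}
```

Because there is only one slot, the embedder alone decides how names map to addresses, e.g. by installing a closure over its own symbol table at startup.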

> One question I have, though, is whether we have the native function pointers at compile time (IL -> asm), or whether we need to do the lookup at runtime?

In JIT mode we could resolve the native function at compile time, but in AOT mode we cannot rely on this.

ghost commented 3 years ago

Thank you for clarifying. My previous understanding was that natives (even for the resolve function) were off the table, so this was very helpful.

I've put together a PoC in 170092: a naive kernel transform into calls to a single native resolve function. However, the numbers from some quick benchmarking are not very encouraging, with a runtime of ~460% of the baseline. A performance hit is entirely expected, though, since the CL only adds overhead: the native resolve and then the FFI call.

Since the bottleneck is the native call to 'resolve' on every ffi native function call, we could amortise that away by caching the resulting function pointer, but I think doing so would likely require a static field for every ffi native function. I'd like to double-check: as I believe has been mentioned previously, is that also not an option we're willing to accept?

In the meantime I'll have a look at the ffi-based resolver.

ghost commented 3 years ago

> Since the bottleneck is the native call to 'resolve' on every ffi native function call, we could amortise that away by caching the resulting function pointer, but I think doing so would likely require a static field for every ffi native function. I'd like to double-check: as I believe has been mentioned previously, is that also not an option we're willing to accept?

From an off-thread follow-up: we're OK with caching the function pointer in a field for every ffi native function as a first step, though the long-term goal is naturally to get rid of this overhead through other mechanisms.

ghost commented 3 years ago

Implementing a cached _resolver as mentioned above appears to roughly halve the overhead, though that is still more than 2x the baseline. I'll start work on modifying the transform to cache the individual function pointers.
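The difference between resolving on every call and caching per function can be sketched in plain Dart. This is a hypothetical model, not the actual transform output: `resolveSymbol` stands in for the native resolve call and counts its invocations, `callThrough` stands in for the FFI call, and addresses are placeholder ints. The cached variant relies on Dart initializing top-level and static `final` fields lazily on first read, so each ffi native pays for one resolve and then reuses the stored value.

```dart
/// Hypothetical stand-in for the comparatively expensive resolve call.
/// The counter makes the effect of caching observable.
int resolveCount = 0;

int resolveSymbol(String name) {
  resolveCount++;
  return 0x1000 + name.length; // placeholder address, not a real pointer
}

/// Hypothetical stand-in for calling through the resolved function pointer.
int callThrough(int address, int arg) => arg * 2;

/// Uncached: resolves on every call, like the naive PoC transform.
int fooUncached(int arg) => callThrough(resolveSymbol('Native_foo'), arg);

/// Cached: one field per ffi native. Its non-constant initializer runs
/// lazily on first read, so the resolve happens exactly once.
final int _fooAddress = resolveSymbol('Native_foo');

int fooCached(int arg) => callThrough(_fooAddress, arg);
```

The price is one field per ffi native, which the thread accepts as a first step; the task list at the top tracks avoiding these static fields later.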

dart-lang/sdk #43889