[vm/ffi] Support passing `Array` to `Pointer` arguments in leaf calls

After https://github.com/dart-lang/sdk/issues/45508, Arrays of ints and floats can be passed to Pointer arguments of FFI leaf calls.

@Native<Void Function(Pointer<Int8>)>(isLeaf: true)
external void myFunc(Pointer<Int8>);

main() {
  final Array<Int8> array = ... ;
  final Int8List typedData = array.elements;
  myFunc(typedData);
}

We could consider also accepting Array<T> where we have a Pointer<T> argument for leaf calls. We could even allow this for structs/unions.

How would the typing work?

An Array<Int8> is presumably not a Pointer<Int8>, so the Dart type system would complain if you try to pass it as one.

There could be an asPointer member on Array which it'd be a compile-time error to call except in argument position of a leaf function call. Then the user must use that. Or there can be a program transformation that inserts that coercion, but that probably needs to be integrated into type inference, since we don't necessarily know that the argument is an Array before that, and then it immediately checks assignability.

We have two types to work with in FFI, so the code would be as follows:

@Native<Void Function(Pointer<Int8>)>(isLeaf: true)
external void myFunc(Array<Int8>);

main() {
  final Array<Int8> array = ... ;
  myFunc(array);
}

So we would relax the type checking between the types in @Native w.r.t. the argument types in the external function. (In a similar fashion as we do with TypedData and Pointer as of https://dart-review.googlesource.com/c/sdk/+/338620.)

Our arrays, structs, unions are sort of value types. They are copied when passed as arguments / returned. That's also the semantics in C.

What we don't support atm (compared to C) is the address-of operator: e.g. passing a pointer to a struct via &mystruct. That's what we'd actually like to express:

@Native<Void Function(Pointer<Int8>)>(isLeaf: true)
external void myFunc(Pointer<Int8>);

main() {
  final Array<Int8> array = ...;
  myFunc(array.address);  // or addressOf(array)
}

Now the trouble with that is that .address / addressOf() would allow one to get an inner pointer to a heap object and expose its address to Dart code, which we cannot do.

If we only allowed this functionality on declarative natives, where we see on call site the native we're calling, we could enforce statically that array.address is only accessed when passing as an argument to a leaf function and we'd rewrite it in a way to make it safe to use (no gc-triggering instruction between getting inner pointer and passing to C).

=> That would allow us to express in Dart what one can express in C, but we'd only allow to express it in places where it's safe.

It would also be a step into the direction of simplifying the annotations to:

@Native(isLeaf: true)
external void myFunc(Pointer<Int8>);

i.e. allow omitting the native types if they agree with the types in the Dart signature.

There's another issue with the proposal in https://github.com/dart-lang/sdk/issues/54739#issuecomment-1912136991: A tool like package:ffigen that automatically generates C bindings will produce bindings that have Pointer<> in the native type annotation as well as the Dart parameter type. It doesn't know that Dart code may want to call this with the address of an Array, as it's a property of the caller whether it passes an address of an array or a real pointer.

=> We want to be able to pass real Pointers as well as address-of arrays / structs / ....

(The downside would be that we wouldn't allow the address-of operator anywhere else, including when using the non-declarative API, as those operate on arbitrary closures.)

The next question is how to model the API. One could think of several use cases from C:

void takesPointer(void* p);

void foo() {
  int8_t array[4][4];
  takesPointer(&array[0][1]);  // address of an array element

  struct mystruct foo;
  takesPointer(&foo.bar);      // address of a struct field

  uniontype bar;
  takesPointer(&bar.x);        // address of a union member
}

that we may want to be able to express in Dart.

There's another issue with the proposal in #54739 (comment): A tool like package:ffigen that automatically generates C bindings will produce bindings that have Pointer<> in the native type annotation as well as the Dart parameter type. It doesn't know that Dart code may want to call this with the address of an Array, as it's a property of the caller whether it passes an address of an array or a real pointer.

We have the same issue with TypedData -> Pointer, FFIgen will just generate a Pointer as Dart type. So we should then maybe consider giving that the same treatment? Probably through an extension method on typed data in dart:ffi.

All of this would be much more natural if we had a typed-data-base and array/struct/union would be extension types and expose field offset getters (https://github.com/dart-lang/sdk/issues/41237). If we introduce a .address for struct/union/array/typeddata, we should probably also support takesIntPointer(myPoint.x.address);. This requires some manual type checks, because the field returns int, and we'd

The downside would be that we wouldn't allow the address-of operator anywhere else, including when using the non-declarative API, as those operate on arbitrary closures.

I don't really like the idea of diverging the two APIs. Because we're going to end up with a feature request at some point for supporting something similar with the dynamic API. Also, the new API also can be torn-off, and passed around as closure. We only require the definitions to be static, not the call sites.

But I kind of do like the idea of .address or addressOf(...).

Some wild thoughts that don't really work out: What if we made Pointer implement TypedData and generated the TypedData in FFIgen for leaf calls? That doesn't work because every Pointer would, based on its type argument, have to implement a different TypedData. And what if we introduced some TypedData that takes FFI types as a type argument, and made Array implement that, and again made it so FFIgen would generate the typed data for leaf calls? I think all of that makes us come back to wanting to introduce a typed-data-base with FFI types and make the existing typed-datas implement that.

We have the same issue with TypedData -> Pointer, FFIgen will just generate a Pointer as Dart type. So we should then maybe consider giving that the same treatment? Probably through an extension method on typed data in dart:ffi.

Seems reasonable that we'd have then an address-of operator for TypedData as well.

I don't really like the idea of diverging the two APIs. Because we're going to end up with a feature request at some point for supporting something similar with the dynamic API. Also, the new API also can be torn-off, and passed around as closure. We only require the definitions to be static, not the call sites.

Yes, it's not ideal that the declarative API has a more powerful capability than the dynamic one, but we hope most people will move towards the declarative API anyway.

It wouldn't compromise safety when using dynamic api or tear-offs, it just means those callers cannot use the address-of operator.

I think the issue with package:ffigen is not negligible: People don't want to hand-craft large API surfaces or write long yaml files with exceptions. Also there may be several different call sites, some want to specify Pointer and some Array.address, so the Dart signature cannot demand Array, except in very narrow use cases where the bindings are specialized for specific call sites that happen to use Array.create() and pass its address down.

We wouldn't need the operator if we had typed-data-base with a type argument:

// dart:ffi

/// 
final class PossiblyDartPointer<T extends NativeType> {

}

Pointer<T> implements PossiblyDartPointer<T>
Array<T> implements PossiblyDartPointer<T>
MyStruct extends Struct implements PossiblyDartPointer<MyStruct> (users have to write the implements to get addressOf support)
extension methods to TypedDatas that return PossiblyDartPointer because we don't want the implements clause in dart:typed_data (would introduce some wrapper object that hopefully gets never allocated)

And then on leaf-calls we accept that one instead.

PossiblyDartPointer should have a better name, HasAddressOf? PointerableInLeafCalls?

Then we have the type carry the fact that it would have a .address getter conceptually.

I think the issue with package:ffigen is not negligible: People don't want to hand-craft large API surfaces or write long yaml files with exceptions. Also there may be several different call sites, some want to specify Pointer and some Array.address, so the Dart signature cannot demand Array, except in very narrow use cases where the bindings are specialized for specific call sites that happen to use Array.create() and pass its address down.

Yeah agreed. Because of this we should possibly back out of https://dart-review.googlesource.com/c/sdk/+/338620.

Looping in @mraleph here as well.

I think it's somewhat beneficial if we keep the @Native<>() external ... functions as close to the C signatures as possible. Eventually we may even allow Int8 / ... in the Dart signature, making the native types often unnecessary as they coincide with Dart types. So we should avoid situations where there's an ambiguous 1<->N mapping.

We wouldn't need the operator if we had typed-data-base with a type argument:

It depends what we want the operator to support. If we want to support the C-equivalent to &mystruct.myint / &array[10] then this PossiblyDartPointer wouldn't work as the integer classes don't implement that. It's still unclear if we can make that work with the other approach though.

Notes from discussion with @mkustermann:

addressOf(...) vs interface class PossiblyDartPointer<T extends NativeType>.

addressOf pros:

- It would work for addressOf(myStruct.intField)
- It makes addressOf explicit (like &myStruct.intField in C)

addressOf cons:

- It doesn't work for non-leaf-call contexts
  - It doesn't work for the DynamicLibrary API.
  - It doesn't work for calculating offsets of struct fields.
- Custom error messages and communication for if it cannot be called in a place.

PossiblyDartPointer pros:

- It would work for the DynamicLibrary API.
- It would work for struct fields.
- dart:ffi APIs accepting TypedData or Pointer can say so by a type that is automatically also implemented by Pointer.
- No custom restrictions on where it can be called: Less communication need and reasoning overhead for API users.

PossiblyDartPointer cons:

- It doesn't make the & operator explicit.
- It's more verbose.

class MyStruct extends Struct implements PossiblyDartPointer<MyStruct> {
  @Int8()
  external int x;

  @Int8()
  external int y;

  @AddressOf(#y)
  external PossiblyDartPointer<Int8> addressOfY;
}

interface class PossiblyDartPointer<T extends NativeType> {
  external int operator -(PossiblyDartPointer other); 
}

main() {
  final myStruct = Struct.create<MyStruct>();
  final offsetOfY = myStruct.addressOfY - myStruct; // Works for TypedDatas as well.
}

This code resembles what one would do in C: &myStruct.y - &myStruct (but the addressOf call is implicit).
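A pure-Dart toy model of why the subtraction can be exposed even though the address itself cannot. Nothing here touches dart:ffi; `InnerAddress` and `ToyStruct` are hypothetical names invented for illustration, and the field offsets are hard-coded assumptions for two one-byte fields:

```dart
/// Toy stand-in for an address inside a Dart heap object: the object plus a
/// byte offset. The absolute address is deliberately not representable, so a
/// moving GC cannot invalidate anything the user can observe.
final class InnerAddress {
  final Object backing;
  final int offsetInBytes;
  const InnerAddress(this.backing, this.offsetInBytes);

  /// Distance in bytes; only defined within one backing object.
  int operator -(InnerAddress other) {
    if (!identical(backing, other.backing)) {
      throw ArgumentError('Addresses belong to different objects');
    }
    return offsetInBytes - other.offsetInBytes;
  }
}

/// Toy struct with two one-byte fields: x at offset 0 and y at offset 1.
final class ToyStruct {
  InnerAddress get address => InnerAddress(this, 0);
  InnerAddress get addressOfX => InnerAddress(this, 0);
  InnerAddress get addressOfY => InnerAddress(this, 1);
}

void main() {
  final s = ToyStruct();
  print(s.addressOfY - s.address); // 1
}
```

The difference only depends on the layout, which is what makes it a usable replacement for an offsetOf query.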

Maybe if we want to support the same for addressOf in struct fields we could pattern match both @Native external calls and minus expressions:

main() {
  final offset = Struct.addressOf<MyStruct>(#y) - Struct.addressOf<MyStruct>(#x); 
}

And that would work for when you don't actually have an instance of MyStruct.

The alternative approach for struct field offsets is to have

class MyStruct extends Struct {
  @Int8()
  external int x;

  @Int8()
  external int y;
}

Then if you'd want to make a pointer to a struct field if you know the struct is backed by a Pointer, you should keep the pointer around:

main() {
  Pointer<MyStruct> p = calloc();
  Pointer<Int8> yPointer = Pointer.fromAddress(p.address + Struct.offsetOf<MyStruct>(#y));
}

As proposed earlier in: https://github.com/dart-lang/sdk/issues/41237#issuecomment-1539878572
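For reference, a runnable sketch of that pointer arithmetic with what dart:ffi offers today. Struct.offsetOf is only a proposal, so the offset below is a hand-written constant (an assumption that holds for two adjacent Int8 fields), `pointerToY` is a hypothetical helper, and the base address is made up and never dereferenced:

```dart
import 'dart:ffi';

final class MyStruct extends Struct {
  @Int8()
  external int x;

  @Int8()
  external int y;
}

// Assumption: stand-in for the proposed Struct.offsetOf<MyStruct>(#y).
// x occupies byte 0, so y starts at byte 1.
const int offsetOfY = 1;

/// Derives a pointer to the `y` field from a pointer to the whole struct.
Pointer<Int8> pointerToY(Pointer<MyStruct> p) =>
    Pointer<Int8>.fromAddress(p.address + offsetOfY);

void main() {
  // Made-up address, used for arithmetic only. Do not dereference it.
  final p = Pointer<MyStruct>.fromAddress(0x1000);
  final yPointer = pointerToY(p);
  print(yPointer.address - p.address); // 1
  print(sizeOf<MyStruct>()); // 2
}
```

This only works because the struct is backed by a Pointer whose address is already visible to Dart code; it says nothing about structs backed by typed data.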

Custom error messages and communication for if it cannot be called in a place.

Would macros be able to help with this?

Had some discussions with @mraleph and we came to the conclusion that for symmetry, expressibility, performance it may be best to have

class Foo extends Struct {
  @Int8()
  external int value;
}

@Native<Void Function(Pointer<Foo>, Pointer<Int8>)>()
external void myNative(Pointer<Foo> fooP, Pointer<Int8> intP);

main() {
  Foo foo = Struct.create<Foo>();
  myNative(addressOf(foo), addressOf(foo, #value)); // Could get inferred type argument, restricting `foo` to be `NativeType`

  print(offsetOf<Foo>(#value));
}

We would

give symmetry between offsetOf and addressOf
allow getting offsets / addresses of any field (also primitives)
possibly(!) allow calling addressOf() in general to allow using it outside native-leaf calls - make a runtime check to throw if it's not a Pointer but rather e.g. typed data

We have discussed (and discarded):

making an OpaquePointer class, which doesn't expose address
Pointer implements OpaquePointer
Struct would then have an OpaquePointer get address (not inherit from OpaquePointer), other fields would need artificial OpaquePointer get fooAddress getters
alternative: we'd make addressOf return an OpaquePointer

It's kind of weird to use that special type in the signatures now that we already have Pointer. It would only be used in leaf signatures, not in normal signatures, as normal calls cannot pass inner pointers to C. We'd also lose the ability to unbox Pointer to integers, as OpaquePointer can be pointer objects or typed data views, ... So overall we concluded that introducing this special OpaquePointer concept isn't worth it.

Cycling back to TypedData, it would be nice to use extension methods to get the type argument correctly (because we can overload via extension methods but not via top-level methods).

extension Int8ListAddressOfExtension on Int8List {
  /// ...
  ///
  /// Only callable in `@Native(isLeaf: true)` calls. 
  external Pointer<Int8> get address;
}

The same can be said to retain the type argument of Array to Pointer and to retain the struct type doing addressOf on struct.

However, the syntax would be more natural with addressOf(foo) instead of foo.address. (Why can't we introduce unary prefix operator &?!)

The .address extension type would also map well with all the existing logic around .ref, operator [] get value returning the correct types.

addressOf(foo), addressOf(foo, #value)

This also uses overloading of addressOf and doesn't work. The second one should probably be Struct.addressOf(foo, #value).

/// Only callable in `@Native(isLeaf: true)` calls. 
 external Pointer<Int8> get address;

Need to be clear about what that means. The "most correct" and still flexible definition would be: Can only be used in tail position of arguments of calls to native leaf functions.

A leaf function must not call back into Dart, and it must not store its arguments for later use. Doing so is unsupported and unspecified behavior. Technically, we could allow passing the pointer to any function, as long as it doesn't use it after calling back into Dart or returning to Dart, but there is no way to verify that. But then, there is also no way to verify that an isLeaf: true function doesn't store the pointer in a global variable.

An expression being in tail position of another expression means, informally, being a subexpression whose value will be the value of the entire expression, and that no computation happens in the expression after evaluating the tail expression, so the value is not used in any way inside the expression itself. At least if the tail expression gets evaluated at all — it can be in a branch that's not always taken.

An expression is in tail position of itself.
If an expression e is in tail position of an expression t, e is also in tail position of expressions of the forms:
- (t)
- e1 ? t : e3
- e1 ? e2 : t
- e1 ?? t
- e1 || t
- e1 && t (booleans are not relevant here, but in general).

That's basically any expression position which is not a receiver or an argument. It deliberately omits t..cascade, x = t, t as T and t as the first operand of ??, || and &&, maybe even t++ or t--, which all evaluate to the value of t, but also use the value. Evaluation of the expression doesn't end after evaluating t.
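A minimal sketch of these rules as a checker over a toy expression AST. The AST classes and `isInTailPosition` are hypothetical names made up for illustration; a real implementation would live in the front end and work on its own AST:

```dart
sealed class Expr {}

/// Any expression we do not look inside (identifier, call, addressOf(x), ...).
final class Leaf extends Expr {
  final String label;
  Leaf(this.label);
}

/// `(inner)`
final class Paren extends Expr {
  final Expr inner;
  Paren(this.inner);
}

/// `condition ? then : otherwise`
final class Conditional extends Expr {
  final Expr condition, then, otherwise;
  Conditional(this.condition, this.then, this.otherwise);
}

/// `left op right` where op is one of `??`, `||`, `&&`.
final class ShortCircuit extends Expr {
  final String op;
  final Expr left, right;
  ShortCircuit(this.op, this.left, this.right);
}

/// `target as T`: evaluates to the value of target, but also uses it.
final class Cast extends Expr {
  final Expr target;
  Cast(this.target);
}

/// Whether [e] is in tail position of [root].
bool isInTailPosition(Expr e, Expr root) {
  // An expression is in tail position of itself.
  if (identical(e, root)) return true;
  return switch (root) {
    Paren(:final inner) => isInTailPosition(e, inner),
    Conditional(:final then, :final otherwise) =>
      isInTailPosition(e, then) || isInTailPosition(e, otherwise),
    // Only the right operand qualifies: the left one is tested afterwards.
    ShortCircuit(:final right) => isInTailPosition(e, right),
    // A cast uses the value, and a leaf has no subexpressions we model.
    Cast() || Leaf() => false,
  };
}

void main() {
  final address = Leaf('addressOf(array)');
  final fallback = Leaf('nullptr');
  // useArray ? (addressOf(array)) : nullptr
  final allowed = Conditional(Leaf('useArray'), Paren(address), fallback);
  // addressOf(array) ?? nullptr, with the address as the FIRST operand
  final rejected = ShortCircuit('??', address, fallback);
  print(isInTailPosition(address, allowed)); // true
  print(isInTailPosition(address, rejected)); // false
  print(isInTailPosition(address, Cast(address))); // false
}
```

The compile-time check would then be: the address-of expression must be in tail position of an argument expression of a native leaf call.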

Then it's a compile-time error to have an invocation of addressOf on a struct, union or array (that is, an expression of the form addressOf(struct) where addressOf denotes the function from dart:ffi and struct has a static type implementing Struct, Union or Array from dart:ffi or TypedData from dart:typed_data, or whatever syntax is chosen for getting the address) in a syntactic position which is not in tail position of an argument expression to a function known to be a native leaf function. (If we use .address extension getters, then it gets harder to formally specify all the many different getters which are included, but I'm sure it can be explained to users rather easily.)

We can also define tail position of elements, but that shouldn't be needed here, not unless needing to pass a list of pointers to a native function. Which should probably be an Array of pointers then. Could we ever need to pass a struct, union or array containing a pointer, like someone wanting a struct MyArray<T> { size_t length; T* ref; } as native function argument, and then want to give them MyArray()..length = array.length..ref = addressOf(array) as argument?

That's probably allowable, we'll just have to recognize the form specifically: A pure struct creation expression is a struct creation expression StructName() with optional cascade operations that are all assignments to struct fields. An expression is in struct-tail-position of an expression if it's in tail position of the expression, or it is in tail position of a cascaded field assignment of a pure struct expression, which is itself in struct-tail-position of the original expression. And the addressOf(something) is then allowed in struct-tail-position of native leaf-function call argument expressions. (But no need to go there unless needed.)

About prefix operators. You have - and ~. And you are lucky. (We could add more operators, so far Dart only has the operators needed by int. We even removed >>> from the language while int didn't need it. Adding more prefix operators can make grammar ambiguities worse, though. A(B<C,D>-E) would currently be ambiguous if not for extra explicit disambiguation rules, because - is both infix and prefix. There is nothing a user-defined &e can do that e.address cannot, both are member invocations treating e as receiver, except be brief and look like C code. Extra prefix operators have not been in as much demand as infix operators, which then come with the extra complication of precedence.)

I'll go ahead and start prototyping this. To be continued with comments once I run into things.

As a small side note: If we have signatures with Pointer and call sites which may be non-external typed data and other call sites which are guaranteed to be Pointer or external typed data, we would like to ensure that all the optimizations of not allocating external typed data views and Pointers can kick in for the latter. See some related discussion in https://dart-review.googlesource.com/c/sdk/+/349241/4/runtime/vm/compiler/frontend/kernel_to_il.cc#5286. Currently we compile a single function body for the trampoline, and it doesn't know if its call sites are internal typed data or not, so I expect to have to duplicate the @Native() external function trampolines for various call sites. (Note that this duplication would have been explicit with the previous API where the user was writing Pointer in the native signature and TypedData in the Dart signature. The user would write two trampolines if they had two call sites with the one passing a Pointer and the other a TypedData.)
