dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

[vm/ffi] Support passing `Array` to `Pointer` arguments in leaf calls #54739

Open dcharkes opened 9 months ago

dcharkes commented 9 months ago

After https://github.com/dart-lang/sdk/issues/45508, typed data (TypedData) of ints and floats can be passed to Pointer arguments of FFI leaf calls. An Array first has to be converted to typed data:

import 'dart:ffi';
import 'dart:typed_data';

@Native<Void Function(Pointer<Int8>)>(isLeaf: true)
external void myFunc(Int8List data); // Dart type is TypedData, native type is Pointer

main() {
  final Array<Int8> array = ... ;
  final Int8List typedData = array.elements;
  myFunc(typedData);
}

We could consider also accepting Array<T> where we have a Pointer<T> argument for leaf calls. We could even allow this for structs/unions.

lrhn commented 9 months ago

How would the typing work?

An Array<Int8> is presumably not a Pointer<Int8>, so the Dart type system would complain if you try to pass it as one.

There could be an asPointer member on Array which would be a compile-time error to call except in argument position of a leaf function call. Then the user must use that. Or there can be a program transformation that inserts that coercion, but that probably needs to be integrated into type inference, since we don't necessarily know that the argument is an Array before inference has run, and inference then immediately checks assignability.

dcharkes commented 9 months ago

We have two types to work with in FFI, so the code would be as follows:

@Native<Void Function(Pointer<Int8>)>(isLeaf: true)
external void myFunc(Array<Int8> array);

main() {
  final Array<Int8> array = ... ;
  myFunc(array);
}

So we would relax the type checking between the types in @Native w.r.t. the argument types in the external function. (In a similar fashion as we do with TypedData and Pointer as of https://dart-review.googlesource.com/c/sdk/+/338620.)

mkustermann commented 9 months ago

Our arrays, structs, unions are sort of value types. They are copied when passed as arguments / returned. That's also the semantics in C.
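For reference, a minimal C sketch of the two behaviors. The names `Point`, `bump_copy` and `bump_in_place` are hypothetical. Passing a struct by value gives the callee a copy. Passing `&p` gives it the address, so writes reach the caller's storage:

```c
#include <stdint.h>

/* Hypothetical struct standing in for a Dart `Struct` subclass. */
typedef struct {
  int8_t x;
  int8_t y;
} Point;

/* By value: the callee gets its own copy, so the caller's struct is untouched. */
int8_t bump_copy(Point p) {
  p.x += 1;
  return p.x;
}

/* By address (`&p` at the call site): the callee writes into the caller's struct. */
void bump_in_place(Point *p) {
  p->x += 1;
}
```

With `Point p = {1, 2};`, `bump_copy(p)` leaves `p.x` at 1, while `bump_in_place(&p)` changes it to 2. The second form is the one Dart FFI currently has no way to express for Dart-allocated structs.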

What we don't support atm (compared to C) is the address-of operator: e.g. passing a pointer to a struct via &mystruct. That's what we'd actually like to express:

@Native<Void Function(Pointer<Int8>)>(isLeaf: true)
external void myFunc(Pointer<Int8> pointer);

main() {
  final Array<Int8> array = ...;
  myFunc(array.address);  // or addressOf(array)
}

Now the trouble with that is that .address / addressOf() would allow one to get an inner pointer to a heap object and expose its address to Dart code, which we cannot do.

If we only allowed this functionality on declarative natives, where we can see at the call site which native we're calling, we could enforce statically that array.address is only accessed when passed as an argument to a leaf function. We'd then rewrite it in a way that makes it safe to use (no GC-triggering instruction between getting the inner pointer and passing it to C).

=> That would allow us to express in Dart what one can express in C, but only in places where it's safe.

It would also be a step into the direction of simplifying the annotations to:

@Native(isLeaf: true)
external void myFunc(Pointer<Int8> pointer);

i.e. allow omitting the native types if they agree with the types in the Dart signature.

There's another issue with the proposal in https://github.com/dart-lang/sdk/issues/54739#issuecomment-1912136991: A tool like package:ffigen that automatically generates C bindings will produce bindings that have Pointer<> in the native type annotation as well as in the Dart parameter type. It doesn't know that Dart code may want to call this with the address of an Array, as it's a property of the caller whether it passes the address of an array or a real pointer.

=> We want to be able to pass real Pointers as well as address-of arrays / structs / ....

(The downside would be that we wouldn't allow the address-of operator anywhere else, including when using the non-declarative API, as those operate on arbitrary closures.)

The next question is how to model the API. There are several use cases one sees in C

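#include <stdint.h>

struct mystruct { int8_t bar; };
union uniontype { int8_t x; };

void takesPointer(void* p);

void foo(void) {
  int8_t array[4][4];
  takesPointer(&array[0][1]);   /* address of an array element */

  struct mystruct foo;
  takesPointer(&foo.bar);       /* address of a struct field */

  union uniontype bar;
  takesPointer(&bar.x);         /* address of a union member */
}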

that we may want to be able to express in Dart.

dcharkes commented 9 months ago

There's another issue with the proposal in #54739 (comment): A tool like package:ffigen that automatically generates C bindings will produce bindings that have Pointer<> in the native type annotation as well as in the Dart parameter type. It doesn't know that Dart code may want to call this with the address of an Array, as it's a property of the caller whether it passes the address of an array or a real pointer.

We have the same issue with TypedData -> Pointer, FFIgen will just generate a Pointer as Dart type. So we should then maybe consider giving that the same treatment? Probably through an extension method on typed data in dart:ffi.

All of this would be much more natural if we had a typed-data-base, and array/struct/union were extension types exposing field offset getters (https://github.com/dart-lang/sdk/issues/41237). If we introduce a .address for struct/union/array/typed data, we should probably also support takesIntPointer(myPoint.x.address);. This requires some manual type checks, because the field returns int, and we'd

The downside would be that we wouldn't allow the address-of operator anywhere else, including when using the non-declarative API, as those operate on arbitrary closures.

I don't really like the idea of diverging the two APIs, because we're going to end up with a feature request at some point for supporting something similar with the dynamic API. Also, functions declared with the new API can be torn off and passed around as closures. We only require the definitions to be static, not the call sites.

But I kind of do like the idea of .address or addressOf(...).

Some wild thoughts that don't really work out: What if we made Pointer implement TypedData and generated the TypedData in FFIgen for leaf calls? That doesn't work because every Pointer would have to implement a different TypedData depending on its type argument. And what if we introduced some TypedData that takes FFI types as a type argument, and made Array implement that? And again have FFIgen generate the typed data for leaf calls. I think all of that brings us back to wanting to introduce a typed-data-base with FFI types and make the existing typed-datas implement that.

mkustermann commented 9 months ago

We have the same issue with TypedData -> Pointer, FFIgen will just generate a Pointer as Dart type. So we should then maybe consider giving that the same treatment? Probably through an extension method on typed data in dart:ffi.

It seems reasonable that we'd then have an address-of operator for TypedData as well.

I don't really like the idea of diverging the two APIs, because we're going to end up with a feature request at some point for supporting something similar with the dynamic API. Also, functions declared with the new API can be torn off and passed around as closures. We only require the definitions to be static, not the call sites.

Yes, it's not ideal that the declarative API has more powerful capabilities than the dynamic one, but we hope most people will move towards the declarative API anyway.

It wouldn't compromise safety when using dynamic api or tear-offs, it just means those callers cannot use the address-of operator.

I think the issue with package:ffigen is not negligible: People don't want to hand-craft large API surfaces or write long YAML files with exceptions. Also there may be several different call sites, some wanting to pass a Pointer and some Array.address, so the Dart signature cannot demand Array, except in very narrow use cases where the bindings are specialized for specific call sites that happen to use Array.create() and pass its address down.

dcharkes commented 9 months ago

We wouldn't need the operator if we had typed-data-base with a type argument:

// dart:ffi

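/// A value whose address may be passed to a `Pointer<T>` argument of a leaf call.
final class PossiblyDartPointer<T extends NativeType> {
  // Members to be decided.
}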

And then on leaf-calls we accept that one instead.

PossiblyDartPointer should have a better name: HasAddressOf? PointerableInLeafCalls?

Then we have the type carry the fact that it would have a .address getter conceptually.

dcharkes commented 9 months ago

I think the issue with package:ffigen is not negligible: People don't want to hand-craft large API surfaces or write long YAML files with exceptions. Also there may be several different call sites, some wanting to pass a Pointer and some Array.address, so the Dart signature cannot demand Array, except in very narrow use cases where the bindings are specialized for specific call sites that happen to use Array.create() and pass its address down.

Yeah, agreed. Because of this we should possibly back out of https://dart-review.googlesource.com/c/sdk/+/338620.

mkustermann commented 9 months ago

Looping in @mraleph here as well.

I think it's somewhat beneficial if we keep the @Native<>() external ... functions as close to the C signatures as possible. Eventually we may even allow Int8 / ... in the Dart signature, often making the native types unnecessary as they coincide with the Dart types. So we should avoid situations where there's an ambiguous 1<->N mapping.

We wouldn't need the operator if we had typed-data-base with a type argument:

It depends what we want the operator to support. If we want to support the C-equivalent to &mystruct.myint / &array[10] then this PossiblyDartPointer wouldn't work as the integer classes don't implement that. It's still unclear if we can make that work with the other approach though.

dcharkes commented 9 months ago

Notes from discussion with @mkustermann:

addressOf(...) vs interface class PossiblyDartPointer<T extends NativeType>.

addressOf pros:

addressOf cons:

PossiblyDartPointer pros:

PossiblyDartPointer cons:

final class MyStruct extends Struct implements PossiblyDartPointer<MyStruct> {
  @Int8()
  external int x;

  @Int8()
  external int y;

  @AddressOf(#y)
  external PossiblyDartPointer<Int8> addressOfY;
}

interface class PossiblyDartPointer<T extends NativeType> {
  external int operator -(PossiblyDartPointer other); 
}

main() {
  final myStruct = Struct.create<MyStruct>();
  final offsetOfY = myStruct.addressOfY - myStruct; // Works for TypedDatas as well.
}

This code resembles what one would do in C: &myStruct.y - &myStruct (but the addressOf call is implicit).
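A minimal C sketch of that comparison, assuming a C `struct MyStruct` mirroring the Dart one (all function names are hypothetical). It computes the offset of `y` in two ways: from an instance by subtracting addresses, and without any instance using the standard `offsetof` macro. The second way is the C counterpart of the instance-free `Struct.addressOf` idea below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of the Dart `MyStruct` above. */
struct MyStruct {
  int8_t x;
  int8_t y;
};

/* &s->y - &s, measured in bytes. Both addresses are cast to `char *`
   because the two pointers have different types and the difference
   should count bytes, not elements. */
ptrdiff_t offset_of_y_from_instance(const struct MyStruct *s) {
  return (const char *)&s->y - (const char *)s;
}

/* The same offset without an instance, via the standard offsetof macro. */
size_t offset_of_y_from_type(void) {
  return offsetof(struct MyStruct, y);
}

/* Difference of two field offsets, again without an instance. */
ptrdiff_t y_minus_x(void) {
  return (ptrdiff_t)offsetof(struct MyStruct, y) -
         (ptrdiff_t)offsetof(struct MyStruct, x);
}
```

With two `int8_t` fields, all three report 1 on common ABIs.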

Maybe if we want to support the same for addressOf in struct fields we could pattern match both @Native external calls and minus expressions:

main() {
  final offset = Struct.addressOf<MyStruct>(#y) - Struct.addressOf<MyStruct>(#x); 
}

And that would work even when you don't actually have an instance of MyStruct.

The alternative approach for struct field offsets is to have

final class MyStruct extends Struct {
  @Int8()
  external int x;

  @Int8()
  external int y;
}

Then, if you want to make a pointer to a struct field and you know the struct is backed by a Pointer, you keep the pointer around:

import 'dart:ffi';
import 'package:ffi/ffi.dart'; // calloc

main() {
  Pointer<MyStruct> p = calloc();
  // Struct.offsetOf is the proposed API, not an existing one.
  final Pointer<Int8> y = Pointer.fromAddress(p.address + Struct.offsetOf<MyStruct>(#y));
}

As proposed earlier in: https://github.com/dart-lang/sdk/issues/41237#issuecomment-1539878572
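For comparison, a minimal C sketch of the same pattern, assuming a C mirror of `MyStruct` (the helper names are hypothetical). It allocates zeroed memory, as `calloc()` does on the Dart side, and rebuilds a pointer to `y` from the base address plus `offsetof`. The arithmetic is done on `char *`, the portable C spelling of `p.address + offset`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C counterpart of the Dart `MyStruct` above. */
struct MyStruct {
  int8_t x;
  int8_t y;
};

/* Zero-initialized heap allocation, like `Pointer<MyStruct> p = calloc();`. */
struct MyStruct *alloc_my_struct(void) {
  return calloc(1, sizeof(struct MyStruct));
}

/* Pointer to field `y`, computed from the base address plus the field offset. */
int8_t *field_y_ptr(struct MyStruct *p) {
  return (int8_t *)((char *)p + offsetof(struct MyStruct, y));
}
```

The result is the same pointer as `&p->y`. The point of the Dart proposal is that this only works when a real native pointer `p` exists. For Dart-heap-backed structs, there is no stable address to start from.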

  • Custom error messages and communication for when it cannot be called in a given place.

Would macros be able to help with this?

mkustermann commented 9 months ago

Had some discussions with @mraleph and we came to the conclusion that, for symmetry, expressiveness and performance, it may be best to have

final class Foo extends Struct {
  @Int8()
  external int value;
}

@Native<Void Function(Pointer<Foo>, Pointer<Int8>)>()
external void myNative(Pointer<Foo> fooP, Pointer<Int8> intP);

main() {
  Foo foo = Struct.create<Foo>();
  myNative(addressOf(foo), addressOf(foo, #value)); // Could get inferred type argument, restricting `foo` to be `NativeType`

  print(offsetOf<Foo>(#value));
}
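For comparison, a minimal C sketch of what the native side of `myNative` might look like. The body is hypothetical; only the parameter shapes come from the Dart declaration. In C, `addressOf(foo)` and `addressOf(foo, #value)` correspond to `&foo` and `&foo.value`, and the second is an interior pointer into the first:

```c
#include <stdint.h>

/* Hypothetical C counterpart of the Dart `Foo` struct. */
typedef struct {
  int8_t value;
} Foo;

/* Hypothetical native implementation of `myNative`. `intP` points into
   the same struct as `fooP`, so writing through it updates fooP->value. */
void myNative(Foo *fooP, int8_t *intP) {
  *intP = (int8_t)(fooP->value + 1);
}
```

Called as `myNative(&foo, &foo.value)` with `foo.value == 41`, this leaves `foo.value` at 42. The callee can only see that result if both arguments refer to the caller's storage rather than to copies.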

We would

We have discussed (and discarded):

It's kind of weird to use that special type in the signatures now that we already have Pointer. It would only be used in leaf signatures, not in normal signatures, as normal calls cannot pass inner pointers to C. We'd also lose the ability to unbox Pointer to integers, as OpaquePointer can be pointer objects or typed data views, ... So overall we concluded that introducing this special OpaquePointer concept isn't worth it.

dcharkes commented 9 months ago

Cycling back to TypedData, it would be nice to use extension methods to get the type argument correctly (because we can overload via extension methods but not via top-level methods).

extension Int8ListAddressOfExtension on Int8List {
  /// ...
  ///
  /// Only callable in `@Native(isLeaf: true)` calls. 
  external Pointer<Int8> get address;
}

The same applies to retaining the type argument when going from Array to Pointer, and to retaining the struct type when doing addressOf on a struct.

However, the syntax would be more natural with addressOf(foo) instead of foo.address. (Why can't we introduce unary prefix operator &?!)

The .address extension getter would also map well to all the existing logic around .ref, operator [] and get value returning the correct types.

addressOf(foo), addressOf(foo, #value)

This also uses overloading of addressOf and doesn't work. The second one should probably be Struct.addressOf(foo, #value).

lrhn commented 9 months ago
  /// Only callable in `@Native(isLeaf: true)` calls.
  external Pointer<Int8> get address;

We need to be clear about what that means. The "most correct" and still flexible definition would be: it can only be used in tail position of arguments of calls to native leaf functions.

A leaf function must not call back into Dart, and it must not store its arguments for later use. Doing so is unsupported and unspecified behavior. Technically, we could allow passing the pointer to any function, as long as that function doesn't use it after calling back into Dart or returning to Dart, but there is no way to verify that. Then again, there is also no way to verify that an isLeaf: true function doesn't store the pointer in a global variable.
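To make the contract concrete, here is a minimal C sketch of two hypothetical native functions. `sum_int8` stays within the rules: it only reads through the pointer during the call and keeps nothing afterwards. `retain_int8` breaks them by stashing the pointer in a global. As noted above, nothing in the toolchain can detect the difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Fits the leaf-call contract: reads through `data` only while the call
   runs, never calls back into Dart, and keeps no copy of the pointer. */
int64_t sum_int8(const int8_t *data, size_t length) {
  int64_t total = 0;
  for (size_t i = 0; i < length; i++) {
    total += data[i];
  }
  return total;
}

/* Violates the contract: it keeps the pointer after returning. If `data`
   was an inner pointer into a Dart heap object, that object may later be
   moved or collected, leaving `retained` dangling. */
static const int8_t *retained = NULL;

void retain_int8(const int8_t *data) {
  retained = data;
}
```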

An expression being in tail position of another expression means, informally, being a subexpression whose value will be the value of the entire expression, and that no computation happens in the expression after evaluating the tail expression, so the value is not used in any way inside the expression itself. At least if the tail expression gets evaluated at all — it can be in a branch that's not always taken.

That's basically any expression position which is not a receiver or an argument. It deliberately omits t..cascade, x = t, t as T and t as the first operand of ??, || and &&, maybe even t++ or t--, which all evaluate to the value of t, but also use the value. Evaluation of the expression doesn't end after evaluating t.

Then it's a compile-time error to have an invocation of addressOf on a struct, union or array in a syntactic position that is not in tail position of an argument expression to a function known to be a native leaf function. Such an invocation is an expression of the form addressOf(struct), where addressOf denotes the function from dart:ffi and struct has a static type implementing Struct, Union or Array from dart:ffi, or TypedData from dart:typed_data (or whatever syntax is chosen for getting the address). (If we use .address extension getters, then it gets harder to formally specify all the many different getters which are included, but I'm sure it can be explained to users rather easily.)

We can also define tail position of elements, but that shouldn't be needed here, not unless someone needs to pass a list of pointers to a native function, which should probably be an Array of pointers then. Could we ever need to pass a struct, union or array containing a pointer? For example, someone wanting a struct MyArray<T> { size_t length; T* ref; } as a native function argument, and then wanting to give them MyArray()..length = array.length..ref = addressOf(array) as the argument?

That's probably allowable; we'll just have to recognize the form specifically. A pure struct creation expression is a struct creation expression StructName() with optional cascade operations that are all assignments to struct fields. An expression is in struct-tail-position of an expression if it's in tail position of the expression, or if it is in tail position of a cascaded field assignment of a pure struct expression which is itself in struct-tail-position of the original expression. addressOf(something) is then allowed in struct-tail-position of native leaf-function call argument expressions. (But no need to go there unless needed.)
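A minimal C sketch of the native side of that case, assuming `T = int8_t` because C has no generics (`MyArrayInt8` and `fill_my_array` are hypothetical names). The struct is passed by value and therefore copied, but the copy's `ref` still points at the caller's elements. This is why the address stored in the struct must stay valid for the duration of the call:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instantiation of `struct MyArray<T>` with T = int8_t. */
struct MyArrayInt8 {
  size_t length;
  int8_t *ref;
};

/* The struct arrives as a copy, but `a.ref` still aliases the caller's
   elements, so these writes land in the caller's array. */
void fill_my_array(struct MyArrayInt8 a, int8_t v) {
  for (size_t i = 0; i < a.length; i++) {
    a.ref[i] = v;
  }
}
```

On the C side, `struct MyArrayInt8 a = { .length = 3, .ref = elems };` plays the role of `MyArray()..length = array.length..ref = addressOf(array)`.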

About prefix operators. You have - and ~. And you are lucky. (We could add more operators; so far Dart only has the operators needed by int. We even removed >>> from the language while int didn't need it. Adding more prefix operators can make grammar ambiguities worse, though. A(B<C,D>-E) would currently be ambiguous if not for extra explicit disambiguation rules, because - is both infix and prefix. There is nothing a user-defined &e can do that e.address cannot, since both are member invocations treating e as the receiver, except be brief and look like C code. Extra prefix operators have not been in as much demand as infix operators, which then come with the extra complication of precedence.)

dcharkes commented 9 months ago

I'll go ahead and start prototyping this. To be continued with comments once I run into things.

As a small side note: if we have signatures with Pointer and call sites which may be non-external typed data, and other call sites which are guaranteed to be Pointer or external typed data, we would like to ensure that all the optimizations of not allocating external typed data views and Pointers can kick in for the latter. See some related discussion in https://dart-review.googlesource.com/c/sdk/+/349241/4/runtime/vm/compiler/frontend/kernel_to_il.cc#5286. Currently we compile a single function body for the trampoline, and it doesn't know whether its call sites pass internal typed data or not, so I expect to have to duplicate the @Native() external function trampolines for various call sites. (Note that this duplication would have been explicit with the previous API, where the user was writing Pointer in the native signature and TypedData in the Dart signature. The user would write two trampolines if they had two call sites, one passing a Pointer and the other a TypedData.)