[typed_data] Add `Float16List` + [vm/ffi] Add `Float16`

rainyl commented 2 months ago

Currently, only Float (Float32) and Double (Float64) are available in dart:ffi. However, Float16 is becoming more and more widespread, especially for AI-related computations, and it is very inconvenient to interact with native libraries that support Float16. Developers have to access fp16 pointers or values using Uint16 and write the conversion methods themselves. Even so, some things like Uint8List.view() are not possible for fp16 if developers want to return a float16 view instead of a copy.

I have read #52250 and #51994, but both of them discuss more specific primitive types for the Dart language, whereas this issue is only about dart:ffi.

dart-github-bot commented 2 months ago

Summary: This issue proposes adding Float16 support to dart:ffi to enable direct interaction with native libraries that use Float16, improving efficiency and convenience for developers working with AI-related computations.

lrhn commented 2 months ago

It's also technically possible to support a Float16List in dart:typed_data, but if there are no operations to convert single 16-bit floats to double (a very quick check suggests that to be the case at least for intel/AMD CPUs), reading and writing would likely not be as efficient as expected. (A "clever" implementation may convert a number of values at a time, and cache the results, so consecutive reading can be optimized. Writing is harder.)

rainyl commented 2 months ago

What about just making Float16List an alias of Uint16List? That is, a Float16List would actually be stored as a Uint16List, but values would be converted to Dart doubles when read and to Uint16 when written.

dcharkes commented 2 months ago

but if there are no operations to convert single 16-bit floats to double (a very quick check suggests that to be the case at least for intel/AMD CPUs), reading and writing would likely not be as efficient as expected.

I did find this:

I'm not sure if using machine instructions from float16->float32->float64 is slower or faster than going from float16->float64 manually.

If users need this and are going to manually use slow conversions to doubles anyway, then we might as well make their lives easier and add it in dart:ffi and dart:typed_data.

@rainyl do you want to use the float16's as doubles in Dart? or are your use cases only about efficiently shuffling bytes around?

What about just making Float16List an alias of Uint16List? That is, a Float16List would actually be stored as a Uint16List, but values would be converted to Dart doubles when read and to Uint16 when written.

Any XXXList is stored as bytes and only converted when reading/writing values! 😄

If you just want to shuffle bytes around, you don't want to do it via something that requires conversions when reading and writing. So you'd want to use setRange on a TypedData with another TypedData that has the same element type, so that it can be a memcpy.
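A minimal sketch of that byte-shuffling case, assuming the float16 values are kept as raw bit patterns in a Uint16List. The helper name `copyWithOffset` is hypothetical and only serves the illustration; `setRange` on typed data is the existing dart:typed_data API.

```dart
import 'dart:typed_data';

/// Copies raw 16-bit patterns into a new list, leaving [offset] zeroed slots
/// at the front. No element is ever interpreted as a floating-point value.
Uint16List copyWithOffset(Uint16List source, int offset) {
  final destination = Uint16List(source.length + offset);
  // Source and destination share the same element type, so no per-element
  // conversion is required for this copy.
  destination.setRange(offset, destination.length, source);
  return destination;
}

void main() {
  // 0x3C00, 0xC000 and 0x7BFF are the binary16 patterns for 1.0, -2.0 and
  // 65504.0; here they are treated purely as integers.
  final halfBits = Uint16List.fromList([0x3C00, 0xC000, 0x7BFF]);
  print(copyWithOffset(halfBits, 1)); // [0, 15360, 49152, 31743]
}
```

Because nothing is decoded, this works today without any Float16 support; only element access as double needs the conversions discussed in this thread.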

If we add Half as a valid float NativeType, then we also need to implement it in FFI calls. It looks like for not all ABIs this is well-defined:

for arm you need an extension https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
And RISC-V does not have support for scalars yet https://github.com/riscv/riscv-v-spec/issues/349

So it might be tricky to fully add Half everywhere in dart:ffi. (Though I'd definitely be open to someone trying.)

cc @mraleph @mkustermann @rmacnak-google

rainyl commented 2 months ago

do you want to use the float16's as doubles in Dart? or are your use cases only about efficiently shuffling bytes around?

Both, the ideal use cases are very similar to other native types, but the most important for my project now is creating a view of Float16List and providing a proper way to get/set values. Now I can regard Uint16List as Float16List, but it's not elegant if users want to get/set values.

// Currently I interact with native float16 using:
final ffi.Pointer<ffi.Uint16> ptr = ...;
final Uint16List view = ptr.asTypedList(length);
// Without Float16List, users have to set/get values via:
final double val = fp16_int_to_double(view[0]);
view[0] = fp16_double_to_int(val);
// `fp16_int_to_double` and `fp16_double_to_int` are implemented referring to https://github.com/opencv/opencv/blob/71d3237a093b60a27601c20e9ee6c3e52154e8b1/modules/core/include/opencv2/core/cvdef.h#L828-L917

// It would be more user-friendly if users could set/get using Dart doubles directly, maybe something like:
final ffi.Pointer<ffi.Float16> ptr = ...;
final Float16List view = ptr.asTypedList(length);
// With Float16List, users can set/get values via:
final double val = view[0];
view[0] = val;

Any XXXList is stored as bytes and only converted when reading/writing values! 😄

Sounds like the above operations would be easy to implement for Float16List, good news.

If we add Half as a valid float NativeType, then we also need to implement it in FFI calls. It looks like for not all ABIs this is well-defined:

Yes, but maybe the implementation of opencv can be a reference? It defines an hfloat type and uses its own implementation if __fp16 is not defined, otherwise it uses __fp16 https://github.com/opencv/opencv/blob/71d3237a093b60a27601c20e9ee6c3e52154e8b1/modules/core/include/opencv2/core/cvdef.h#L384-L399

dcharkes commented 2 months ago

final ffi.Pointer<ffi.Float16> ptr = ...;
final Float16List view = ptr.asTypedList(length);
// With Float16List, users can set/get values via:
final double val = view[0];
view[0] = val;

Happy to receive a PR for this!

Should it be Half or Float16? We call the other thing Double. 😄

A PR for only this should add errors on Halfs in FFI calls and callbacks.

Yes, but maybe the implementation of opencv can be a reference? It defines an hfloat type and uses its own implementation if __fp16 is not defined, otherwise it uses __fp16

ushort

hehe so it's an uint16 if it's not available.

Well, Dart is not compiled at the same time as your library that uses open-cv, so we risk compiling with different flags which will lead to segfaults. On the other hand, we also assume SoftFP on Android arm32 and hard fp on arm32 Linux. Technically there can be Androids out there with hardfp and linuxes with softfp, but we've not run into them.

I'd be fine simply assuming the type is defined (except for risc-v).

I'm also open for getting a PR for adding this. This PR will be much more involved, as it includes getting the calling conventions right.

If you want to work on these PRs I can provide pointers for where to start.

Wdestroier commented 2 months ago

I would like to suggest Float16 instead of Half.

rainyl commented 2 months ago

Should it be Half or Float16? We call the other thing Double. 😄

Same as @Wdestroier , I like Float16 too.

If you want to work on these PRs I can provide pointers for where to start.

Sure, I am willing to work on this when having some free time, so could you please provide some instructions? So that other developers can work on this too. 😄

dcharkes commented 2 months ago

@sigmundch @mkustermann can dart:typed_data Float16List be properly supported on dart2js and dart2wasm? (We can of course always fall back to an implementation that does the bit-shuffling in Dart, but that might not be desirable for performance reasons.)

For adding Float16List:

Float16List can be added in sdk/lib/typed_data/typed_data.dart
runtime/vm/class_id.h needs to get an entry in CLASS_LIST_TYPED_DATA
The implementation can be added in sdk/lib/_internal/vm/lib/typed_data_patch.dart
- We'll need to add the 16 bit variant of external double _getFloat32(int offsetInBytes); and the setter.
- An optimized implementation that targets machine code instructions for float16 conversions will need to be an external function in Dart that is recognized in the Dart compiler.
- runtime/vm/compiler/recognized_methods_list.h TypedList_GetFloat32
- runtime/vm/compiler/frontend/kernel_to_il.cc that's where these recognized methods are implemented via FlowGraphBuilder::BuildTypedListGet
  - This will need to generate IL that will generate machine code in the files such as runtime/vm/compiler/backend/il_x64.cc with EmitNativeCode.
- A non-optimized implementation will instead of having an external function in the patch file, just load 16 bits via a Uint16List and do the conversion in Dart. This is probably an easier implementation to start with and will perform considerably worse. But it's maybe worth doing that as the first PR before diving in to generating machine code.
A benchmark can be added in benchmarks/TypedData/

For adding support for Pointer<Float16>, Array<Float16> and Float16s in structs/unions; and error messages on using Float16 in FFI calls and callbacks:

The type should be added in sdk/lib/ffi/native_type.dart
The public API as extension types in sdk/lib/ffi/ffi.dart
- Most of the API is generated by runtime/tools/ffi/sdk_lib_ffi_generator.dart
The implementation in CFE transforms pkg/vm/lib/modular/transformations/ffi/*.dart
- pkg/vm/lib/modular/transformations/ffi/common.dart in all the consts
- pkg/vm/lib/modular/transformations/ffi/native_type_cfe.dart to reason about size (2 bytes) and alignment (?) inside structs
Ditto in the backend to reason about size and alignment runtime/vm/compiler/ffi/native_type.cc
Error messages for the analyzer are added in pkg/analyzer/lib/src/generated/ffi_verifier.dart
Error messages for the CFE are added in pkg/vm/lib/modular/transformations/ffi/use_sites.dart
Tests
- For the error messages for analyzer and CFE: Add a file next to tests/ffi/static_checks/vmspecific_static_checks_test.dart in the style of that file
- For positive tests using Float16 inside structs add a float16 type to tests/ffi/generator/c_types.dart and some tests with structs to tests/ffi/generator/structs_by_value_tests_configuration.dart -> try to especially cover cases where alignment could be unexpected. For example a struct with first a uint8_t and then a float16.
- For positive tests using asTypedList you can add a new test in tests/ffi/.
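To illustrate the alignment case mentioned in the test list above (a `uint8_t` followed by a float16), here is a small sketch. It is an assumption-laden stand-in: `Float16` does not exist in dart:ffi, so `Uint16` is used as a 2-byte field, and the struct name `ByteThenHalf` is hypothetical. It assumes a float16 field would have the same size and alignment as `Uint16`, which is exactly the open question in the list above.

```dart
import 'dart:ffi';

// Hypothetical stand-in layout: `Uint16` plays the role of the proposed
// `Float16`, assuming both are 2 bytes wide and 2-byte aligned.
final class ByteThenHalf extends Struct {
  @Uint8()
  external int tag;

  // With 2-byte alignment, one padding byte is inserted before this field.
  @Uint16()
  external int halfBits;
}

void main() {
  // 1 byte (tag) + 1 byte padding + 2 bytes (halfBits).
  print(sizeOf<ByteThenHalf>()); // 4
}
```

A generated test for the real type would compare this layout against the C compiler's view of the same struct rather than against a hard-coded number.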

If the rest of the Dart team is in favor of adding this, my suggestion would be to split this work up in multiple PRs:

Adding an unoptimized Float16List
Optimizing the Float16List get and set with recognized methods that target assembly instructions for float16 conversions
Adding support for Pointer<Float16> and Float16 inside structs (but rejecting Float16 as FFI call/callback arguments and return value)
Adding support for Float16 as FFI call/callback arguments and return value. (I can provide pointers on how to do that later.)

lrhn commented 2 months ago

I don't think a Float16List can be efficient in JavaScript, and probably not in Wasm either, if it's not a built-in type. And even on native, the smallest x64 operation performs four parallel conversions, not just a single one. That means bit-shuffling in JS and Wasm, possibly on native too.

Reading is fairly simple: it's one sign bit, a 5-bit exponent, and a 10-bit mantissa. A 64-entry lookup table for the exponent + sign will probably work.

Writing worries me more. Bit-fiddling on doubles requires first getting the bits of the double, which Dart doesn't support directly. Then it needs some rounding rules. The input is bigger than for reading, so a table isn't useful. Native will almost certainly use the SIMD operation for each value. Everybody else will have to do something more expensive.
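The reading and writing paths described above can be sketched in plain Dart as follows. This is a hedged illustration, not the SDK's implementation: the names `halfBitsToDouble`, `doubleToHalfBits`, `_pow2` and `_scratch` are hypothetical, the decoder uses arithmetic instead of the 64-entry lookup table, and the bits of the double are obtained through a `ByteData` scratch buffer. Rounding is assumed to be round-to-nearest, ties-to-even.

```dart
import 'dart:typed_data';

final ByteData _scratch = ByteData(8);

/// Returns exactly 2^n for -1022 <= n <= 1023 by writing the exponent field
/// of a double directly (big-endian is the ByteData default).
double _pow2(int n) {
  _scratch.setUint32(0, (1023 + n) << 20);
  _scratch.setUint32(4, 0);
  return _scratch.getFloat64(0);
}

/// Decodes an IEEE 754 binary16 bit pattern into a double.
double halfBitsToDouble(int bits) {
  final sign = (bits & 0x8000) != 0 ? -1.0 : 1.0;
  final exponent = (bits >> 10) & 0x1F;
  final mantissa = bits & 0x3FF;
  if (exponent == 0x1F) {
    return mantissa == 0 ? sign * double.infinity : double.nan;
  }
  // Subnormals have no implicit leading 1 and share the scale of exponent 1.
  final significand = exponent == 0 ? mantissa : 1024 + mantissa;
  final scaleExponent = exponent == 0 ? 1 : exponent;
  return sign * significand * _pow2(scaleExponent - 25);
}

/// Encodes a double as binary16 bits, rounding to nearest with ties to even.
int doubleToHalfBits(double value) {
  if (value.isNaN) return 0x7E00; // one possible quiet NaN pattern
  final signBit = value.isNegative ? 0x8000 : 0;
  final magnitude = value.abs();
  // 65520 is halfway between the largest finite half (65504) and 2^16, and
  // the tie rounds to infinity.
  if (magnitude >= 65520.0) return signBit | 0x7C00;

  // Unbiased exponent taken from the double's bits, clamped so that small
  // values fall into the subnormal range.
  _scratch.setFloat64(0, magnitude);
  var exponent = ((_scratch.getUint32(0) >> 20) & 0x7FF) - 1023;
  if (exponent < -14) exponent = -14;

  // Scale so that 1.0 equals one step of the 10-bit mantissa, then round.
  final scaled = magnitude * _pow2(10 - exponent);
  var rounded = scaled.floor();
  final remainder = scaled - rounded;
  if (remainder > 0.5 || (remainder == 0.5 && rounded.isOdd)) rounded++;

  // `rounded` still contains the implicit leading 1 (1024) for normal values,
  // so adding it lets a mantissa overflow carry into the exponent field.
  return signBit | (((exponent + 14) << 10) + rounded);
}

void main() {
  print(halfBitsToDouble(0x3C00)); // 1.0
  print(doubleToHalfBits(-2.0).toRadixString(16)); // c000
  print(halfBitsToDouble(doubleToHalfBits(0.1))); // 0.0999755859375
}
```

The write path needs a buffer round trip, an exponent extraction and a manual tie check per value, which gives a feel for why writing is the more expensive direction without hardware support.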

I'm not sure bad support is better than no support.

(We'll probably also want a Float16x8 type and list of those.)

dcharkes commented 2 months ago

(We'll probably also want a Float16x8 type and list of those.)

➕ I was thinking about that too.

mkustermann commented 2 months ago

AI/ML models can use different 16-bit floating point number formats, most commonly IEEE float16 and bfloat16 (which has more exponent bits).

So if the reason is AI/ML it would make sense to extend the discussion to be
=> dart:ffi: Pointer<Float16> & Pointer<BFloat16>
=> dart:typed_data: Float16List & BFloat16List

Those two are somewhat separate and can be discussed separately (e.g. we support Pointer<Bool> in dart:ffi without having an equivalent list type in dart:typed_data).

For dart:typed_data it may be tricky as JavaScript doesn't have equivalent typed arrays and dart2js would dynamically need to keep track of the type (which may be problematic, see e.g. recent deprecation & removal of UnmodifiableUint8List/... classes). @rakudrama wdyt?

For dart:ffi we'd need to think about to what extent we want to support it: Allowing it indirectly via Pointer with appropriate double operator [](int index) and void operator []=(int index, double value) is probably the most common use and uncontroversial. Allowing them as Struct members or primitives is trickier, though, as we'd need ABI support, it's not part of standard C, and it seems some ABIs may not support it.
=> The only real use may be via Pointer<> usage, so we could restrict its usage to that
=> Our compiler would then generate very efficient code for the conversion to/from double

But if the only use is via Pointer, we have to think whether it's actually needed to have this support as part of dart:ffi. Let's say we model this as extension types in a helper package (e.g. in package:ffi/bfloat16.dart):

import 'dart:ffi';

extension type BFloat16P(Pointer<Uint16> pointer) {
  double operator [](int index) {
    final int value = pointer[index];
    // ... code to bfloat16->double ... (XXX)
    return convertedValue;
  }

  void operator []=(int index, double value) {
    // ... code to double->bfloat16 ... (XXX)
    pointer[index] = convertedValue;
  }

  BFloat16List asTypedList(int length) => BFloat16List(this, length);
}

class BFloat16List implements List<double> {
  final BFloat16P pointer;
  final int length;

  BFloat16List(this.pointer, this.length);

  double operator [](int index) => pointer[index];

  void setRange(...) {
    // Would e.g. delegate to already optimized `pointer.asTypedList().setRange()`
  }
}

And then users can use it via

@Native<Pointer<Uint16> Function()>()
external Pointer<Uint16> getTensor();

main() {
  final BFloat16List tensor = BFloat16P(getTensor()).asTypedList(64);
  for (int i = 0; i < tensor.length; ++i) {
    print(tensor[i]);
  }
}

or if we allow convenience usage of extension types in FFI:

@Native<BFloat16P Function()>()
external BFloat16P getTensor();

main() {
  final BFloat16List tensor = getTensor().asTypedList(64);
  for (int i = 0; i < tensor.length; ++i) {
    print(tensor[i]);
  }
}

We could ensure the conversion code in (XXX) is written in a way that allows our compilers to generate very efficient code for it (possibly even recognizing the specific conversion pattern & optimizing via built-in HW support).
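For reference, the conversion code marked (XXX) could look roughly like the sketch below. It relies on bfloat16 being the upper 16 bits of an IEEE 754 binary32 value. The function names and the `_scratch` buffer are hypothetical, and the encoder goes through float32 first, so in rare tie cases the result may differ from rounding the double directly.

```dart
import 'dart:typed_data';

final ByteData _scratch = ByteData(4);

/// Widens bfloat16 bits to a double by placing them in the upper half of a
/// float32 bit pattern (big-endian is the ByteData default).
double bfloat16BitsToDouble(int bits) {
  _scratch.setUint32(0, (bits & 0xFFFF) << 16);
  return _scratch.getFloat32(0);
}

/// Narrows a double to bfloat16 bits via float32, rounding the dropped
/// 16 bits to nearest with ties to even.
int doubleToBfloat16Bits(double value) {
  if (value.isNaN) return 0x7FC0; // one possible quiet NaN pattern
  _scratch.setFloat32(0, value); // first rounding step: double -> float32
  final bits32 = _scratch.getUint32(0);
  // Adding 0x7FFF plus the lowest kept bit rounds up past the halfway point,
  // and on an exact tie only when the kept part is odd.
  final rounded = bits32 + 0x7FFF + ((bits32 >> 16) & 1);
  return (rounded >> 16) & 0xFFFF;
}

void main() {
  print(doubleToBfloat16Bits(1.0).toRadixString(16)); // 3f80
  print(bfloat16BitsToDouble(0x4049)); // 3.140625
}
```

Both directions are a buffer write, a buffer read and a few integer operations, which is the kind of pattern a compiler could plausibly recognize, as suggested above.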

@rainyl Would your use case be solved by this?

dcharkes commented 2 months ago

or if we allow convenience usage of extension types in FFI:

👍 Tracked in:

https://github.com/dart-lang/sdk/issues/54944

=> Our compiler would then generate very efficient code for the conversion to/from double

It would be even more efficient with Float16x8. But so far we've been only doing that for Float32x4 via typed_data. So if we wanted to allow that and not add it to typed data we should maybe consider having such Float16x8 in dart:ffi? (But I guess no support for BFloat16x8, I haven't seen any assembly instructions tailored to that yet.)

I'd be cautious adding Float16Pointer as an extension type in for example package:ffi if we would consider adding Float16x8 later in the Dart SDK. Moving types between a package and dart: libs is next to impossible.

rainyl commented 2 months ago

Would your use case be solved by this?

Yes, I am working on opencv bindings for dart, so I have to get a view of the pixel values at (x, y) to read and change the values, I believe your method will work.

rmacnak-google commented 2 months ago

@dcharkes RISC-V has a ratified extension, Zfh, but the major Linux distributions don't include it in their baseline. AFAIK, Android and Fuchsia haven't chosen their baseline yet, but Zfhmin is part of the RVA22 profile, so I expect they will include it.

dart-lang / sdk

[typed_data] Add `Float16List` + [vm/ffi] Add `Float16` #56319