dart-archive / ffi

Utilities for working with Foreign Function Interface (FFI) code
https://pub.dev/packages/ffi
BSD 3-Clause "New" or "Revised" License

Is `Utf8` equivalent to `ffi.Char`? #201

Closed Levi-Lesches closed 1 year ago

Levi-Lesches commented 1 year ago

The docs for Utf8 say:

> This pointer is the equivalent of a char pointer (const char*) in C code.

And the docs for Char say:

> The C char type

So... can Utf8 be made a subtype of Char? Currently, if I have a C function that expects a char*, running ffigen results in:

void function(ffi.Pointer<ffi.Char> args)

So using this means I have to do:

void main() => function("Hello, World!".toNativeUtf8().cast<Char>());

However, if Utf8 and Char are both chars, does casting even make a difference? It seems like I should be able to use a Utf8 whenever a Char is needed. In any event, casting is almost always required when using ffigen, so this should probably be built into .toNativeUtf8. Am I reading this right?

dcharkes commented 1 year ago

> Is Utf8 equivalent to ffi.Char? Can Utf8 be made a subtype of Char?

Char is an AbiSpecificInteger while Utf8 is Opaque. One cannot access individual bytes on a Pointer<Utf8>.
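To make the distinction concrete, here is a minimal sketch, assuming package:ffi is available as a dependency and its default malloc allocator is used. It shows that the same address can be viewed as a Pointer<Char>, which supports indexed reads of individual code units through dart:ffi's extension on AbiSpecificInteger pointers, whereas Pointer<Utf8> only offers whole-string operations. The helper name firstCodeUnit is hypothetical and not part of either library.

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

/// Hypothetical helper: reads the first code unit of a native string.
int firstCodeUnit(String s) {
  // Encodes s as UTF-8 into freshly allocated, zero-terminated native memory.
  final Pointer<Utf8> utf8Ptr = s.toNativeUtf8();
  try {
    // Reinterpret the same address as Pointer<Char>; no bytes are copied.
    final Pointer<Char> charPtr = utf8Ptr.cast<Char>();
    // Indexed access is available on Pointer<Char>, not on Pointer<Utf8>.
    return charPtr[0];
  } finally {
    malloc.free(utf8Ptr);
  }
}

void main() {
  print(firstCodeUnit('Hello')); // 'H' is 72 in ASCII
}
```

Note that an individual code unit is only a whole character for ASCII input; for anything else it is one byte of a multi-byte sequence.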

> casting is almost always required when using ffigen

By default ffigen indeed generates Char. https://github.com/dart-lang/ffigen#how-does-ffigen-handle-c-strings However, you should be able to use the type-map to make it generate Utf8 instead. https://github.com/dart-lang/native/issues/498

Levi-Lesches commented 1 year ago

> Char is an AbiSpecificInteger while Utf8 is Opaque. One cannot access individual bytes on a Pointer<Utf8>.

Sorry, I still don't get the practical difference after translation. If Char maps to the same thing as char on all platforms, and Utf8 maps to char as well, why is it/should it be Opaque? Why can't/shouldn't a user be able to access individual bytes on a Pointer<Utf8> if they can just .cast<Char>() and do the same? Is the cast not a no-op? What is the point of Utf8 instead of using Char in all cases (and similarly, Utf16/WChar)?

Looking at the docs from https://github.com/dart-lang/ffigen#how-does-ffigen-handle-c-strings:

> To convert these to/from String, you can use package:ffi. Use ptr.cast<Utf8>().toDartString() to convert char* to dart String and "str".toNativeUtf8() to convert String to char*.

From the initial .cast<Utf8>() to claiming that .toNativeUtf8() produces a char* (which it doesn't, at least not a Pointer<Char>), the docs seem to think that Char and Utf8 are equivalent. So maybe this is more of a docs issue.

> However, you should be able to use the type-map to make it generate Utf8 instead. https://github.com/dart-lang/native/issues/498

Would there be any other pros and cons to doing so except better interop with .toNativeUtf8()? Maybe these can be documented as well?

dcharkes commented 1 year ago

> Sorry, I still don't get the practical difference after translation. If Char maps to the same thing as char on all platforms, and Utf8 maps to char as well, why is it/should it be Opaque?

It doesn't map to char*.

> UTF-8 is a variable-length character encoding standard used for electronic communication.

https://en.wikipedia.org/wiki/UTF-8

The un-decoded "code units" can be read as bytes. But the nth character in UTF-8 will not be at the nth byte in the code units array, because each character is 1 to 4 bytes long.

There is no indexed access to the nth character in a UTF-8 string: you have to read all the preceding bytes to know at which byte the nth character starts. Therefore, it makes more sense to have Pointer<Utf8> point to an Opaque type. You have to convert the whole string to a Dart string before you can do something useful.
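The point about byte offsets can be illustrated with a short sketch that uses only dart:convert. The helper byteOffsetOfChar is hypothetical; it walks the string one code point at a time and adds up the encoded length of each preceding character, which is the linear scan described above.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the byte offset at which the [n]th character
/// (code point) of [s] starts in the UTF-8 encoding of [s].
int byteOffsetOfChar(String s, int n) {
  var offset = 0;
  var index = 0;
  for (final rune in s.runes) {
    if (index == n) return offset;
    // Every preceding character contributes 1 to 4 bytes.
    offset += utf8.encode(String.fromCharCode(rune)).length;
    index++;
  }
  throw RangeError('no character at index $n');
}

void main() {
  // 'a' is 1 byte, 'é' is 2, '€' is 3, '😀' is 4, 'b' is 1.
  const s = 'aé€😀b';
  print(utf8.encode(s).length); // 11 bytes for 5 characters
  for (var i = 0; i < s.runes.length; i++) {
    print('character $i starts at byte ${byteOffsetOfChar(s, i)}');
  }
}
```

For this input the characters start at bytes 0, 1, 3, 6 and 10, so only the first character sits at the byte index equal to its character index.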

> From the initial .cast<Utf8>() to claiming that .toNativeUtf8() produces a char* (which it doesn't, at least not a Pointer<Char>), the docs seem to think that Char and Utf8 are equivalent. So maybe this is more of a docs issue.

The docs seem outdated indeed. (Probably from before we had a Char in dart:ffi.) I've filed https://github.com/dart-lang/native/issues/442.

> However, you should be able to use the type-map to make it generate Utf8 instead. dart-lang/native#498 (comment)

> Would there be any other pros and cons to doing so except better interop with .toNativeUtf8()?

One con is that it will break if you have a plain char, rather than only char*, somewhere in your C headers.

Levi-Lesches commented 1 year ago

> But the nth character in utf8 will not be at the nth byte in the code units array, because every character has 1-4 bytes as length.

Got it, I was misunderstanding the translation there. Indeed, looking at the code, .toNativeUtf8 and .toDartString have a bit more logic than just allocating and casting: there's also encoding/decoding.
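That extra logic can be approximated in pure Dart, without touching native memory. The following is a sketch under stated assumptions, not the actual package:ffi implementation: both helper names are hypothetical, and a Uint8List stands in for the natively allocated buffer.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical stand-in for the encoding step: converts [s] to UTF-8 and
/// appends the terminating zero byte that a C string expects.
Uint8List encodeNullTerminated(String s) {
  final units = utf8.encode(s);
  // Uint8List is zero-filled, so the final byte is already the terminator.
  final buffer = Uint8List(units.length + 1);
  buffer.setAll(0, units);
  return buffer;
}

/// Hypothetical stand-in for the decoding step: finds the terminator, then
/// decodes the bytes before it as UTF-8.
String decodeNullTerminated(Uint8List buffer) {
  final end = buffer.indexOf(0);
  if (end < 0) throw FormatException('missing null terminator');
  return utf8.decode(buffer.sublist(0, end));
}

void main() {
  final buffer = encodeNullTerminated('héllo');
  print(buffer.length); // 6 UTF-8 bytes plus 1 terminator
  print(decodeNullTerminated(buffer) == 'héllo');
}
```

A plain cast skips both steps, which is why it only changes the pointer's static type and never the bytes behind it.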

> The docs seem outdated indeed. (Probably from before we had a Char in dart:ffi.) I've filed dart-lang/native#442.

Thanks. The main point of this issue was my confusion about why ffigen doesn't work out of the box with .toNativeUtf8, but if ffigen recommends or defaults to Utf8, that makes sense.

Thanks for your patience!