Closed Levi-Lesches closed 1 year ago
Is Utf8 equivalent to ffi.Char? Can Utf8 be made a subtype of Char?
Char is an AbiSpecificInt while Utf8 is Opaque. One cannot access individual bytes on a Pointer<Utf8>.
> casting is almost always required when using ffigen
By default ffigen indeed generates Char. https://github.com/dart-lang/ffigen#how-does-ffigen-handle-c-strings
However, you should be able to use the type-map to make it generate Utf8 instead. https://github.com/dart-lang/native/issues/498
> Char is an AbiSpecificInt while Utf8 is Opaque. One cannot access individual bytes on a Pointer<Utf8>.
Sorry, I still don't get the practical difference after translation. If Char maps to the same thing as char on all platforms, and Utf8 maps to char as well, why is it/should it be Opaque? Why can't/shouldn't a user be able to access individual bytes on a Pointer<Utf8> if they can just .cast<Char>() and do the same? Is the cast not a no-op? What is the point of Utf8 instead of using Char in all cases (and similarly, Utf16/WChar)?
Looking at the docs from https://github.com/dart-lang/ffigen#how-does-ffigen-handle-c-strings:
> To convert these to/from String, you can use package:ffi. Use ptr.cast<Utf8>().toDartString() to convert char* to dart String and "str".toNativeUtf8() to convert String to char*.
From the initial .cast<Utf8>() to claiming that .toNativeUtf8() produces a char* (which it doesn't, at least not a Pointer<Char>), the docs seem to think that Char and Utf8 are equivalent. So maybe this is more of a docs issue.
> However, you should be able to use the type-map to make it generate Utf8 instead. https://github.com/dart-lang/native/issues/498
Would there be any other pros and cons to doing so except better interop with .toNativeUtf8()? Maybe these can be documented as well?
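For readers following along, the casts discussed in this comment can be sketched roughly as below. This is a minimal illustration, not ffigen output: it assumes package:ffi is a dependency, and the place where a generated binding would be called is only marked with a comment because no real C function is referenced in this thread.

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

void main() {
  // toNativeUtf8() encodes the string and allocates native memory for it
  // (with the malloc allocator by default). The result is a Pointer<Utf8>.
  final Pointer<Utf8> utf8Ptr = 'hello'.toNativeUtf8();

  // A binding generated with the default settings takes Pointer<Char>,
  // so the pointer is re-typed with a cast. No bytes are copied.
  final Pointer<Char> charPtr = utf8Ptr.cast<Char>();

  // A generated binding (hypothetical here) would be called with charPtr.

  // Going the other way: cast back to Utf8 and decode into a Dart String.
  final String roundTripped = charPtr.cast<Utf8>().toDartString();
  print(roundTripped);

  // The caller owns the native buffer and has to free it.
  malloc.free(utf8Ptr);
}
```

The cast only changes the static pointer type; the encoding and decoding happen in toNativeUtf8() and toDartString().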
> Sorry, I still don't get the practical difference after translation. If Char maps to the same thing as char on all platforms, and Utf8 maps to char as well, why is it/should it be Opaque?
It doesn't map to char*.
> UTF-8 is a variable-length character encoding standard used for electronic communication.
https://en.wikipedia.org/wiki/UTF-8
The un-decoded "code units" can be read as bytes. But the nth character in utf8 will not be at the nth byte in the code units array, because every character has 1-4 bytes as length.
There is no indexed access to the nth character in a utf8 string: you have to read all the preceding bytes to know at which byte the nth character starts. Therefore, it makes more sense to have Pointer<Utf8> as an opaque type. You have to convert the whole string to a Dart string before you can do something useful.
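To make the byte-versus-character point concrete, here is a small sketch using only dart:convert. The helper name utf8ByteOffsets is made up for this illustration and is not part of any library.

```dart
import 'dart:convert';

/// Returns, for each character (code point) of [s], the byte offset at
/// which that character starts in the UTF-8 encoding of [s].
List<int> utf8ByteOffsets(String s) {
  final offsets = <int>[];
  var offset = 0;
  for (final rune in s.runes) {
    offsets.add(offset);
    // Each code point takes between 1 and 4 bytes in UTF-8.
    offset += utf8.encode(String.fromCharCode(rune)).length;
  }
  return offsets;
}

void main() {
  // 'a', e-acute, the euro sign, and an emoji: 1, 2, 3 and 4 bytes.
  const s = 'a\u00E9\u20AC\u{1F600}';
  print(utf8.encode(s).length); // 10 bytes for 4 characters
  print(utf8ByteOffsets(s)); // [0, 1, 3, 6]
}
```

The offset of each character depends on the width of every character before it, which is why finding the nth character requires scanning from the start.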
> From the initial .cast<Utf8>() to claiming that .toNativeUtf8() produces a char* (which it doesn't, at least not a Pointer<Char>), the docs seem to think that Char and Utf8 are equivalent. So maybe this is more of a docs issue.
The docs seem outdated indeed. (Probably from before we had a Char in dart:ffi.) I've filed https://github.com/dart-lang/native/issues/442.
> However, you should be able to use the type-map to make it generate Utf8 instead. dart-lang/native#498 (comment)
> Would there be any other pros and cons to doing so except better interop with .toNativeUtf8()?
One con is that the mapping breaks if your C headers use plain char anywhere, rather than only char*.
> But the nth character in utf8 will not be at the nth byte in the code units array, because every character has 1-4 bytes as length.
Got it, I was misunderstanding the translation there. Indeed, looking at the code, .toNativeUtf8 and .toDartString have a bit more logic than just allocating and casting -- there's also encoding/decoding.
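As a rough picture of what "encoding/decoding" adds on top of allocating and casting, the sketch below does the same kind of work on a plain Dart byte buffer. It is not the package:ffi source; the helper names encodeNulTerminated and decodeNulTerminated are invented for this illustration, and a Uint8List stands in for native memory.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Encodes [s] as UTF-8 and appends the NUL terminator that C expects.
Uint8List encodeNulTerminated(String s) {
  final units = utf8.encode(s);
  // Uint8List is zero-filled, so the extra last byte is the terminator.
  final out = Uint8List(units.length + 1);
  out.setAll(0, units);
  return out;
}

/// Reads bytes up to the first NUL (or the end) and decodes them as UTF-8.
String decodeNulTerminated(Uint8List bytes) {
  final end = bytes.indexOf(0);
  final length = end == -1 ? bytes.length : end;
  return utf8.decode(bytes.sublist(0, length));
}

void main() {
  final bytes = encodeNulTerminated('h\u00E9llo');
  print(bytes.length); // 7: six encoded bytes plus the terminator
  print(decodeNulTerminated(bytes) == 'h\u00E9llo'); // true
}
```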
> The docs seem outdated indeed. (Probably from before we had a Char in dart:ffi.) I've filed dart-lang/native#442.
Thanks. The main point of this issue was my confusion about why ffigen doesn't work out-of-the-box with .toNativeUtf8, but if ffigen recommends or defaults to Utf8, that makes sense.
Thanks for your patience!
The docs for Utf8 say
So... can Utf8 be made a subtype of Char? Currently, if I have a C function that expects a char*, running ffigen results in
So using this means I have to do
However, if Utf8 and Char are both chars, does casting even make a difference? It seems like I should be able to use a Utf8 whenever a Char is needed. In any event, casting is almost always required when using ffigen, so this should probably be built-in to .toNativeUtf8. Am I reading this right?