dcharkes opened 4 years ago
This is particularly painful for a Win32 function like SysAllocString(), since creating a BSTR
(a string type which includes a length prefix) requires two copies:
Generally speaking we cannot hand out pointers to memory inside the VM's managed / garbage collected heap, because while C code runs the GC can move objects.
In addition to that, the VM uses different string representations which might be incompatible with what the C side wants.
@timsneath Why does it require two copies? Can you not examine the Dart string to determine what the encoded length would be, then allocate a correctly sized buffer, write length + encoded string?
`SysAllocString` takes an unmanaged null-terminated string and copies it to a BSTR-formatted string. While the documentation describes the format, it also notes that the strings are allocated using COM memory allocation functions. While I could speculate how they are created and try to replicate that from Dart code, I can't guarantee that my implementation would be compatible with `SysAllocString`, since the source code isn't available. So I have to do something like:
```dart
// Copy from Dart to unmanaged memory.
final rawString = Utf16.fromString('Aarhus is a beautiful city.');
// Win32 makes a second copy here.
final bstrString = SysAllocString(rawString);
// ... do stuff ...
SysFreeString(bstrString);
free(rawString);
```
@timsneath If the only goal is to avoid the second memory allocation and copy, would something like this do the trick:
```dart
foo(String string) {
  // Allocate a BSTR without initializing it (i.e. no copy of bytes).
  final bstr = SysAllocStringByteLen(nullptr, 2 * string.length).cast<Uint16>();
  // Initialize the BSTR ("bstr" points at the actual 16-bit character
  // buffer, not at the length prefix).
  for (int i = 0; i < string.length; ++i) {
    bstr[i] = string.codeUnitAt(i);
  }
  // <do something with "bstr">
  // Free the BSTR.
  SysFreeString(bstr);
}
```
?
Yes, this works. But I think I'll wind up wrapping BSTR as a whole so that I can embed this kind of logic rather than expecting the package user to be aware of these subtleties.
Thanks, Martin.
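A minimal sketch of what such a wrapper might look like, building on the `SysAllocStringByteLen`/`SysFreeString` calls above (the `Bstr` class name and shape are hypothetical, not the actual package:win32 API):

```dart
/// Hypothetical wrapper that owns a BSTR created from a Dart String
/// with a single copy.
class Bstr {
  /// Points at the 16-bit character buffer, after the length prefix.
  final Pointer<Uint16> ptr;

  Bstr._(this.ptr);

  factory Bstr.fromString(String string) {
    // Allocate an uninitialized BSTR of the right byte length
    // (UTF-16 code units are 2 bytes each)...
    final bstr =
        SysAllocStringByteLen(nullptr, 2 * string.length).cast<Uint16>();
    // ...and fill it directly with the string's UTF-16 code units.
    for (var i = 0; i < string.length; ++i) {
      bstr[i] = string.codeUnitAt(i);
    }
    return Bstr._(bstr);
  }

  void free() => SysFreeString(ptr);
}
```

A caller would then write `final b = Bstr.fromString(s); ... b.free();` without needing to know about the length prefix or the byte-length convention.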
In leaf calls we could consider allowing conversion between `String` and `Pointer`, in a fashion similar to unwrapping TypedData (https://github.com/dart-lang/sdk/issues/44589). The type argument of the pointer in the signature should then specify what encoding to use.
We should also consider whether we want to add equivalent operations on the `dart:ffi` side, similar to `asTypedData`, so that pointers can be quickly converted to external strings.
Originally posted by @mraleph in https://github.com/dart-lang/sdk/issues/50494#issuecomment-1318645618
We could add extension methods `toExternalDartString()` to `Pointer<Utf8>` and `Pointer<Utf16>` in `dart:ffi`.
The only issue is that `Utf8` and `Utf16` are defined in `package:ffi` instead of `dart:ffi`. Putting it in `package:ffi` would rely on `dart_api_dl.c` being compiled into a dylib, while putting it in `dart:ffi` would enable direct calls into the API.
The only way to put it in `dart:ffi` is to not extension-match on `Utf8` and `Utf16` but to add the methods on `Uint8`/`Uint16` (which is not that clean).
Transplanting the `Utf8` and `Utf16` types from `package:ffi` to `dart:ffi` would be a horrible migration (which we saw with the AbiSpecific types earlier).
Some notes from a discussion with @robertbastian and follow-up investigation.
Strings internally can have multiple representations (OneByteString, TwoByteString).
Completely copy-free strings are only possible if
Because strings can have multiple representations in the runtime, we could try to make the FFI unwrap strings when all of the above hold, and otherwise allocate a temporary re-encoded string (condition 4 must still hold).
My current thinking is something like:
```dart
// Sketch of a VM-provided API.
class String {
  Utf8View get utf8View => Utf8View(this);
  Utf16View get utf16View => Utf16View(this);
}

// VM-provided; same for Utf16View.
abstract class Utf8View {
  int get length;
}

foo(String v) {
  final vView = v.utf8View;
  fooFfi(vView, vView.length);
}

static final fooFfi =
    _capi<ffi.NativeFunction<Void Function(Pointer<Uint8>, ffi.Size)>>('foo')
        .asFunction<void Function(Pointer<Uint8>, int)>(isLeaf: true);
```
Here, `vView` is a UTF-8 view that gets converted to a `Pointer<Uint8>` at the FFI boundary. Because it's a leaf call, we can borrow the bytes under certain conditions:

- `Utf8View`: if the string uses Latin-1 encoding internally and is ASCII-only (each code point < 128)
- `Utf16View`: if the string doesn't use Latin-1 encoding

In other cases we have to allocate. It would be nice if the VM could take care of the allocation and release it after the call (I'm currently using an `Arena` for this in my code).
We might want to special-case `.length` as well: we know it after encoding/borrowing, so it doesn't need to be recalculated.
Zero-termination could also be part of this design, with a flag on `UtfNView`. Borrowing would then not be possible, but it would still be an ergonomic improvement over the temporary allocation.
Currently, `Utf8View` and `Utf16View` can be implemented in user code with lots of copying, and converted to pointers with an explicit allocator.
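Such a user-code implementation (with copying and an explicit allocator) might look roughly like this; `Utf8View` mirrors the sketch above and everything here is hypothetical:

```dart
import 'dart:convert';
import 'dart:ffi';
import 'dart:typed_data';

import 'package:ffi/ffi.dart';

/// User-space Utf8View: encodes eagerly (one copy), caches the length,
/// and materializes into C memory with an explicit allocator (a second copy).
class Utf8View {
  final Uint8List _bytes;

  Utf8View(String s) : _bytes = Uint8List.fromList(utf8.encode(s));

  int get length => _bytes.length;

  Pointer<Uint8> toPointer(Allocator allocator) {
    final ptr = allocator<Uint8>(_bytes.length);
    ptr.asTypedList(_bytes.length).setAll(0, _bytes);
    return ptr;
  }
}

// Usage with an Arena, mirroring the fooFfi example:
//   using((arena) {
//     final vView = Utf8View(v);
//     fooFfi(vView.toPointer(arena), vView.length);
//   });
```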
Following the view idea: the Dart type should be the view in this case, because the borrowing can only happen in the FFI call itself. (If it were to happen earlier and be passed around as a `Pointer`, the GC might move the underlying String.)
```dart
static final fooFfi =
    _capi<ffi.NativeFunction<Void Function(Pointer<Uint8>, ffi.Size)>>('foo')
        .asFunction<void Function(Utf8View, int)>(isLeaf: true);
```
We might want to special-case `.length` as well, because we know it after encoding/borrowing and it doesn't need to be recalculated.
This would require some trickery in the argument evaluation of FFI calls. If an argument pair `vView, vView.length` occurs in an FFI call, and the view requires materialization (because of a different encoding), then the normal `.length` implementation (which would traverse the string) should not be used; instead, the length should be computed during materialization (we need the length anyway to size the materialization allocation).
If the length were implemented as a `late final` field, then the materialization in the FFI call should populate it. However, that would mean the `vView.length` expression must not be evaluated before the actual FFI call, which runs counter to how Dart semantics are defined. In effect, `vView.length` would be something like a marker in an argument position of an FFI call, while `vView.length` in normal Dart code would be evaluated normally.
However, we might need something more performant (something that avoids copying) if this is not fast enough.
Originally posted by @dcharkes in https://github.com/dart-lang/sdk/issues/35762#issuecomment-470512718
Our null-terminated Utf8 and Utf16 string helpers in `package:ffi` require copying bytes between Dart and C.
We should investigate whether we can pass strings from C to Dart without copying, and whether we can pass Utf16 strings from Dart to C without copying. The latter is unlikely, though, as the Dart garbage collector might relocate the String.
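For reference, the copying round-trip with today's `package:ffi` helpers looks roughly like this (a sketch; `callSomeCApi` is a hypothetical native function):

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

void example() {
  using((arena) {
    // Dart -> C: copies the string into newly allocated native memory,
    // freed automatically when the arena is released.
    final Pointer<Utf16> native = 'hello'.toNativeUtf16(allocator: arena);
    // callSomeCApi(native);  // hypothetical C call
    // C -> Dart: copies the native bytes back into a Dart String.
    final String roundTripped = native.toDartString();
  });
}
```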