Open qmuntal opened 2 years ago
CC @alexbrainman
What if we just unconditionally change it to default to utf16? Or perhaps always require either -utf16
or -ascii
, but not none. I don't think Go promises any stability w.r.t. autogenerators, right?
The reason I suggest this is that I'd hate for the behavior of mkwinsyscall to become, "it does the old bad thing unless you remember to pass this special switch." I'd rather it be "it does the new good thing, unless you tell it to do the old bad thing" or "you have to tell it which thing you want to do, so that it won't do the old bad thing without you telling it to."
I suspect changing it to default to utf16 would make most sense, considering most if not all functions are already wchar.
@qmuntal
This forces me to define the //sys functions using
*uint16
instead ofstring
and to do the conversion outside the autogenerated code.
I don't see using *uint16
instead of string
as a bad thing. Users of your package would have to explicitly call windows.UTF16PtrFromString
, but they could save some memory allocations because of that. I don't know if memory saving is important when using BCrypt library, but I always try not to hide memory allocations when creating low level libraries.
I suggest you copy x/sys/windows/mkwinsyscall
into your package and adjust it as required (probably 1 or 2 lines change), and use it for your case. If you discover more use cases for your proposal, I am happy to reconsider.
@zx2c4
What if we just unconditionally change it to default to utf16?
That will break some users code. I suggested a reasonable workaround for @qmuntal problem. Why breaking users code is better than my workaround?
Or perhaps always require either
-utf16
or-ascii
, but not none
That will definitely break users code. And your proposed flags will force current users split their code into 2 different source files: ascii version and utf16 version.
I am not convinced this is a problem that we need to solve. Yet.
But feel free to ignore my opinion.
Alex
@alexbrainman
I don't see using *uint16 instead of string as a bad thing. Users of your package would have to explicitly call windows.UTF16PtrFromString, but they could save some memory allocations because of that.
windows.UTF16PtrFromString returns a pointer to the UTF16 string and an error, so we are forcing users to check that error before calling the autogenerated function, whereas if the autogenerated function would do the conversion this boilerplate code could be eliminated.
I don't know if memory saving is important when using BCrypt library, but I always try not to hide memory allocations when creating low level libraries.
I agree with your statement about not hiding allocations, but this proposal don't preclude //sys
definitions to use *uint16
instead of string
, it's just about making string
params to be handled correctly regardless of the function naming convention. IMO we are not adding a new way to hide allocations, as functions accepting string
params whose name end with W
or A
are already hiding allocations.
Having low memory allocation is primordial when using BCrypt library, but for my use case functions accepting UTF16 strings are not supposed to be called multiple times with the same string, so it makes no sense to cache the windows.UTF16PtrFromString result in order to reduce allocations. Passing the string directly to the autogenerated function would not have any performance impact, but would be more ergonomic.
I suggest you copy x/sys/windows/mkwinsyscall into your package and adjust it as required (probably 1 or 2 lines change), and use it for your case. If you discover more use cases for your proposal, I am happy to reconsider.
If you think this proposal covers a niche use case, I'm fine patching mkwinsyscall. But, as @zx2c4 has also mentioned, UTF16 strings are the standard for new Windows APIs, and it would be a pity to lose the capacity of passing string
arguments instead of *uint16
to new Windows function.
whereas if the autogenerated function would do the conversion this boilerplate code could be eliminated.
Yes. If you don't care about allocations, then you can save your users from writing boilerplate code.
this proposal don't preclude
//sys
definitions to use*uint16
instead ofstring
, it's just about makingstring
params to be handled correctly regardless of the function naming convention. IMO we are not adding a new way to hide allocations, as functions acceptingstring
params whose name end withW
orA
are already hiding allocations.
Agree.
Having low memory allocation is primordial when using BCrypt library, but for my use case functions accepting UTF16 strings are not supposed to be called multiple times with the same string, so it makes no sense to cache the windows.UTF16PtrFromString result in order to reduce allocations. Passing the string directly to the autogenerated function would not have any performance impact, but would be more ergonomic.
Thanks for explaining.
If you think this proposal covers a niche use case, ...
I do think that. Maybe my opinion will change in the future.
... I'm fine patching mkwinsyscall. But, as @zx2c4 has also mentioned, UTF16 strings are the standard for new Windows APIs, and it would be a pity to lose the capacity of passing
string
arguments instead of*uint16
to new Windows function.
You would have to copy mkwinsyscall program and make small changes to it. If you find this troublesome, I am happy to reconsider. I did exactly that in many of my personal projects. I do not regret making that decision after many years.
I also happy to reconsider if other projects find this functionality useful.
I am just trying to defer breaking users code till later.
Alex
Change https://go.dev/cl/425054 mentions this issue: syscall: rely on utf16.AppendRune
Background
Microsoft normally provides two parallel sets of APIs, one for ANSI strings, whose function names end with
A
, and the other for Unicode strings, whose function names end withW
(source).This pattern is well supported in
x/sys/windows/mkwinsyscall
: function parameters of typestring
are transformed usingsyscall.UTF16PtrFromString
if the function names ends withW
, and usingsyscall.BytePtrFromString
otherwise:https://github.com/golang/sys/blob/2edf467146b5fc89e484991587e3032c8421ae8c/windows/mkwinsyscall/mkwinsyscall.go#L616-L622
Problem
Newer libraries that were created after Windows introduced Unicode tend to provide just the UTF16 API set without the
W
-ending convention. In my particular case, I'm creating bindings for the BCrypt library, in which all functions accepting strings expect them to be UTF16 encoded even when the function name does not end withW
.This forces me to define the //sys functions using
*uint16
instead ofstring
and to do the conversion outside the autogenerated code.Request
Add a knob to
x/sys/windows/mkwinsyscall
to decide if a string parameter should be transformed to ASCII or Unicode.Two complementary solutions come to mind:
Add a cli flag, i.e.
-utf16
, which changes the code generation behavior from everything is ASCII unless the function name ends with W to everything is UTF16 unless the function name ends with A.Add a new optional condition to each function definition, i.e.
[encoding=utf16/ascii]
, which will override the encoding selected from the function name or the cli flag (if finally added).The first solution will probably be enough to cover all cases, as new Windows APIs are Unicode by default and only legacy APIs expect ASCII strings, in which case they already provide the both API sets.
cc @zx2c4