PerlFFI / FFI-Platypus

Write Perl bindings to non-Perl libraries with FFI. No XS required.
89 stars 23 forks source link

Add LPCWSTR support. #292

Closed ikegami closed 3 years ago

ikegami commented 3 years ago

Adding LCWSTR support to ::Lang::Win32

ikegami commented 3 years ago

I tested on Win and Linux. The above failures are due to problems with the script setting up the test, not the changes I made.

plicease commented 3 years ago

I tested on Win and Linux. The above failures are due to problems with the script setting up the test, not the changes I made.

293 should address the GH actions fail, due to a breaking change in GH actions recently. The FreeBSD / cirrus CI fail seems legit, though I haven't investigated yet.

plicease commented 3 years ago

I've merged #293 can you please rebase on master when you have a chance to get that fix?

ikegami commented 3 years ago

t/01_use.t is giving not ok 72 - test for FFI::Platypus::Type::Win32::LPCWSTR Not sure what that is. All the tests using the module actually work. I'm not getting on my systems. Then again, I didn't use zilla. I gotta go for a while.

ikegami commented 3 years ago

oic. It looks like t/01_use.t is a generated file, and it's expects the test file to have a certain name. Will fix when I get back

ikegami commented 3 years ago

Yeah, a file named differently than expected was the only issue.

plicease commented 3 years ago

I am going to circle back to this sometime in the new year. I've self assigned this PR for me to review so that I don't loose the thread.

plicease commented 3 years ago

This is a fantastic PR. I appreciate the attention to detail and the tests in particular. Also I apologize for the long delay in this review, the last months have not left me with a lot of spare time to work on this. I am onboard with all of the Win32 specific stuff. I am not sure about the additions to the Memory module though. The equivalent to strlen for wide strings appears to be wcslen and on Unix wchar_t is 32bit instead of 16 bit. I'd prefer the Memory module to be portable where possible, and to use the libc functions where available.

I'd like to get what works here merged, so maybe we can move the memory functions into FFI::Platypus::Memory::Win32? Someone (possibly myself) can circle back later and take a stab at writing some portable wide string functions.

ikegami commented 3 years ago

This is a fantastic PR. I appreciate the attention to detail and the tests in particular. Also I apologize for the long delay in this review, the last months have not left me with a lot of spare time to work on this. I am onboard with all of the Win32 specific stuff. I am not sure about the additions to the Memory module though. The equivalent to strlen for wide strings appears to be wcslen and on Unix wchar_t is 32bit instead of 16 bit. I'd prefer the Memory module to be portable where possible, and to use the libc functions where available.

Maybe different names would be better.

I'd like to get what works here merged, so maybe we can move the memory functions into FFI::Platypus::Memory::Win32? Someone (possibly myself) can circle back later and take a stab at writing some portable wide string functions.

I purposefully avoided making them Windows-specific. Windows isn't the only user of UTF-16. JS uses UTF-16. C# uses UTF-16. These could definitely be split off, but I don't think ::Win32 would bee a good name.

ikegami commented 3 years ago

strlenU16, strcpyU16 and strncpyU16, and ::U16 or ::UTF16?

plicease commented 3 years ago

The more I think about it, the more I'm not comfortable putting these functions into the Memory module as-is. I see your point about ::Win32 (although I don't think C# or JS are easily callable from Platypus). However, referring to these as UTF-16 functions does not sit well with me either, since for strings with non-BMP characters the "length" is in half the number of bytes. I'd like to have whatever we come up with for dealing with wide character strings works naturally with non-Windows systems. Hopefully without having different functions for each platform.

I would like to merge the LPCWSTR type plugin though so I propose we proceed with one of these solutions:

  1. Let's decouple the Memory changes from this PR entirely. LPCWSTR plugin should be usable without them, so long as it internally uses... I'm thinking lstrlenW (or maybe wcslen) from system library.
  2. same as (1) except keep your strlenW implementation, but make it private.
  3. Memory module could expose the wcslen function, which the LPCWSTR plugin could use. I'm not crazy about this function tbh, but the libc name (and implementation) for this function makes sense since the other functions in the Memory module are using the standard libc functions. It's behavior on windows seems weird to me, but it is documented.

I think I am leaning for (2) because even though it means extra C code to bundle, the plugin is still usable on non-Windows platforms. Although it won't normally be useful on non-Windows, platforms, it will make it easier to test the code from Unix. For (1) and (3) the plugin should die if it is applied on a non-Windows platform.

Either way I think for now we remove strcpyW and strncpyW, again we can revisit wide string manipulation functions in another PR. Or it might actually make more sense to put this in FFI::C, which doesn't yet have string manipulation classes, but my intent is to eventually add them there.

plicease commented 3 years ago

The ::Lang::Win32 module should explicitly document LPWSTR and LPCWSTR.

plicease commented 3 years ago

The ::Lang::Win32 module should explicitly document LPWSTR and LPCWSTR.

I'd also love to have a really good synopsis that actually calls some windows specific APIs.

ikegami commented 3 years ago

The ::Lang::Win32 module should explicitly document LPWSTR and LPCWSTR.

I kept the current style of documentation, which I frankly thought was useless. It doesn't tell the user what the module does/provides. All for changing this.

ikegami commented 3 years ago

Either way I think for now we remove strcpyW and strncpyW

One of these -- or at least a public string length function so the user can implement it themselves -- is needed to handle in-out string parameters. (See the "LPWSTR as in-out argument" test.) This is probably exceedingly rare, but do you know how many times I've been m frustrated with Win32::API for not providing some basic tools because they author probably didn't foresee their need?

plicease commented 3 years ago

I kept the current style of documentation, which I frankly thought was useless. It doesn't tell the user what the module does/provides. All for changing this.

I didn't think every variable type needs to be documented. The Win32 API has frankly a stupid number of nested typedefs and it doesn't need to be repeated here. This is new behavior for the wide string types.

One of these -- or at least a public string length function so the user can implement it themselves -- is needed to handle in-out string parameters. (See the "LPWSTR as in-out argument" test.) This is probably exceedingly rare, but do you know how many times I've been m frustrated with Win32::API for not providing some basic tools because they author probably didn't foresee their need?

That is fair, but I think we can provide an interface which isn't Win32-centric. I think FFI::C is the right place to do this.

plicease commented 3 years ago

I will look into why the Perl 5.8.x fails are happening, I don't think it has to do with this PR.

The spelling errors can be fixed by adding the offending words to author.yml.

https://travis-ci.com/github/PerlFFI/FFI-Platypus/jobs/484209990#L306

plicease commented 3 years ago
  1. Memory module could expose the wcslen function, which the LPCWSTR plugin could use. I'm not crazy about this function tbh, but the libc name (and implementation) for this function makes sense since the other functions in the Memory module are using the standard libc functions. It's behavior on windows seems weird to me, but it is documented.

So what I am thinking now is:

  1. Same as 3, but remove the windows specific stuff and use wchar_t which works on Strawberry and Linux. I think most already supported platforms should work. On platforms that do not support `wcsandwchar_t` we can disable this feature.
plicease commented 3 years ago

I think it might be possible to write a type plugin that does the grunt work of an output string. If not the type plugin API should maybe be extended so that we can.

plicease commented 3 years ago

I think it might be possible to write a type plugin that does the grunt work of an output string. If not the type plugin API should maybe be extended so that we can.

POC for wide string output as a type plugin: https://gist.github.com/plicease/1e87ba3dd3faa8a0bd872b324672839d#file-wide-string-out-pl

I think this would be easier to explain in the documentation than the nest of mallocs, frees and other function calls that you need to do manually and you no longer need wcsn?cpy. I think there are situations where you might prefer to do that, but you still can via the libc API.

Because the type plugin API allows the type plugin to take arguments it could be just one plugin, and I think I prefer that probably.

plicease commented 3 years ago

I realize that I made a lot of change requests here, and it is probably a little hard to follow because a number of these things are minor nits while others are substantial refactors, so I'm working on a branch based on your work, but with my suggested adjustments. I think it is mostly functionally complete, but I want to fill in some of the documentation, and add a few more examples. I welcome your feedback since it is based on your initial request and your work. My implementation wouldn't have been possible without the work that you had already done.

299

The main thing that it does though is that it doesn't introduce any new C code, or add any public functions to the Memory.pm module, which was my main concern. The way you pass read-write buffer strings is a tad awkward tbh, but as I mentioned earlier I think it is a little easier to explain than the more manual function approach.

Indeed there may actually be cases where the manual allocation/free is warranted, but I think you can still do that by writing your own bindings to the libc functions. I've found that the types for the string functions in Memory.pm aren't quite sensible for all applications or general use so I'm hesitant to add more string functions there.

My PR also works on Unix with the native wchar_t* types and libc functions, and while I realize wide strings are not common there, I think while we are working on wide strings we may as well have them work "everywhere" for whatever approximation that "everywhere" even makes sense.

ikegami commented 3 years ago

I could follow, but the summary is appreciated. The issue is that I have no time right now, and that might be the case for a while.

ikegami commented 3 years ago

Looks like you actually did the changes. I took a quick look. It looks like you're assuming sizeof wchar_t == sizeof WCHAR, but that's not always the case. A WCHAR is always 16 bits, while wchar_t can be as little as 8 bits, but also larger than 16 bits.

ikegami commented 3 years ago

Simple solution: Add an assert to ::Lang::Win32::load_custom_types

plicease commented 3 years ago

hmmm, I took a quick look. It looks like you're assuming sizeof wchar_t == sizeof WCHAR, but that's not always the case. A WCHAR is always 16 bits, while wchar_t can be as little as 8 bits, but also larger than 16 bits.

wchar_t could be an arbitrary size equal or larger than char in general, but on Windows it is a 16bit wide UTF-16LE the same as WCHAR.

"The wchar_t type is an implementation-defined wide character type. In the Microsoft compiler, it represents a 16-bit wide character used to store Unicode encoded as UTF-16LE, the native character type on Windows operating systems" https://docs.microsoft.com/en-us/cpp/cpp/char-wchar-t-char16-t-char32-t?view=msvc-160

WCHAR is in fact defined in terms of wchar_t if it is defined:

#if !defined(_NATIVE_WCHAR_T_DEFINED)
typedef unsigned short WCHAR;
#else
typedef wchar_t WCHAR;
#endif

https://docs.microsoft.com/en-us/windows/win32/extensible-storage-engine/wchar

These statements also hold true for the gcc compiler used by Strawberry Perl and MSYS2.

Simple solution: Add an assert to ::Lang::Win32::load_custom_types

Sorry, I am not sure to what this is referring or what you would assert.

ikegami commented 3 years ago

WCHAR is in fact defined in terms of wchar_t if it is defined:

Objection withdrawn. Misinformed/Misremembered.

Sorry, I am not sure to what this is referring or what you would assert.

That sizeof wchar_t == sizeof WCHAR. But that's always the case.

These statements also hold true for the gcc compiler used by Strawberry Perl and MSYS2.

Which is why an assert would have been sufficient.

plicease commented 3 years ago

I've merged #299 which is based on this PR and release Platypus as development version 1.35_01. Since that is a dev version we more wiggle room to make breaking changes until I cut a production release. I went ahead and cut a dev release now so that I can start to accumulate reports from cpantesters.