PerlFFI / FFI-Platypus

Write Perl bindings to non-Perl libraries with FFI. No XS required.

Add wide string type plugin #299

Closed plicease closed 3 years ago

plicease commented 3 years ago

Add support for wide strings via a new plugin. In the C standard these are const wchar_t * and wchar_t *, where wchar_t is usually either 2 or 4 bytes wide. On Windows the encoding was originally UCS-2, which later evolved into UTF-16LE.

Also add some hooks in the Win32 plugin to provide what the Windows API refers to as wide strings: LPWSTR and LPCWSTR. These are really just aliases for the wchar_t strings used by the C standard. I think this form of wide string is common enough in the Win32 API that it makes sense to just always load these types.

Arguably this should be added to the (default) C language plugin as well, but I think in practice these types are rarely enough used in non-Win32 APIs that it isn't worth the overhead.

This is heavily based on #292 but is generalized to the C standard rather than the Windows API. This form also doesn't require mucking about with pointers for the read/write buffer case. The read/write buffer case is a little awkward in my opinion, but I think it is less error-prone and less verbose than using pointers. For cases where you really do want to muck about with pointers (I think there probably are cases) you can still do this using wcs* functions provided by libc.

plicease commented 3 years ago
* [ ]  CI started failing with [2f1f01e](https://github.com/PerlFFI/FFI-Platypus/commit/2f1f01e022baa25ec8177bde6048aacd0e0be2ec) but I can't see what it has to do with the failure!

Turns out I was looking at the wrong diagnostic. It does have to do with the wide string after all. Adding a fix to the test and a note in the CAVEATS section. e2e4a18631ecef45d0e7993484b3466f497f54bb

plicease commented 3 years ago

Here is the full rendered POD page for ::Lang::Win32, still working on completing the ::Type::WideString doco.

NAME

FFI::Platypus::Lang::Win32 - Documentation and tools for using Platypus with the Windows API

VERSION

version 1.34

SYNOPSIS

use utf8;
use FFI::Platypus 1.35;

my $ffi = FFI::Platypus->new(
  api  => 1,
  lib  => [undef],
);

# load this plugin
$ffi->lang('Win32');

# Pass two double word integer values to the Windows API Beep function.
$ffi->attach( Beep => ['DWORD','DWORD'] => 'BOOL');
Beep(262, 300);

# Send a Unicode string to the Windows API MessageBoxW function.
use constant MB_OK                   => 0x00000000;
use constant MB_DEFAULT_DESKTOP_ONLY => 0x00020000;
$ffi->attach( [MessageBoxW => 'MessageBox'] => [ 'HWND', 'LPCWSTR', 'LPCWSTR', 'UINT'] => 'int' );
MessageBox(undef, "I ❤️ Platypus", "Confession", MB_OK|MB_DEFAULT_DESKTOP_ONLY);

# Get a Unicode string from the Windows API GetCurrentDirectoryW function.
$ffi->attach( [GetCurrentDirectoryW => 'GetCurrentDirectory'] => ['DWORD', 'LPWSTR'] => 'DWORD');
my $buf_size = GetCurrentDirectory(0,undef);
my $dir = "\0\0" x $buf_size;
GetCurrentDirectory($buf_size, \$dir) or die $^E;
print "$dir\n";

DESCRIPTION

This module provides the Windows datatypes used by the Windows API. This means that you can use things like DWORD as an alias for uint32. The full list of type aliases is not documented here as it may change over time or be dynamic. You can get the list for your current environment with this one-liner:

perl -MFFI::Platypus::Lang::Win32 -E "say for sort keys %{ FFI::Platypus::Lang::Win32->native_type_map }"

This plugin will also set the correct ABI for use with Win32 API functions. (On 32 bit systems a different ABI is used for the Win32 API than for the C library; on 64 bit systems the same ABI is used.) Most of the time this is exactly what you want, but if you need to call functions that use the standard C calling convention while still using the Win32 types, you can do that by setting the ABI back immediately after loading the language plugin:

$ffi->lang('Win32');
$ffi->abi('default_abi');

Most of the types should be pretty self-explanatory or at least provided in the Microsoft documentation on the internet, but the use of Unicode strings probably requires some more detail:

[version 1.35]

This plugin also provides LPCWSTR and LPWSTR "wide" string types which are implemented using FFI::Platypus::Type::WideString. For full details, please see the documentation for that module, and note that LPCWSTR is a wide string in the read-only string mode and LPWSTR is a wide string in the read-write buffer mode.

LPCWSTR is handled fairly transparently by the plugin, but when using read-write buffers (LPWSTR) with the Win32 API you typically need to allocate a buffer string of the right size. These examples will use GetCurrentDirectoryW attached as GetCurrentDirectory as in the synopsis above. They are illustrative only; you would normally want to use the Cwd module to get the current working directory.

METHODS

abi

my $abi = FFI::Platypus::Lang::Win32->abi;

This is called internally when the type plugin is loaded by Platypus. It selects the appropriate ABI to make Win32 API function calls.

native_type_map

my $hashref = FFI::Platypus::Lang::Win32->native_type_map;

This is called internally when the type plugin is loaded by Platypus. It provides type aliases useful on the Windows platform, so it may also be useful for introspection.

This returns a hash reference containing the native aliases for the Windows API. That is, the keys are native Windows API C types and the values are libffi native types.

This includes types like DWORD and HWND, among others. The full list may be adjusted over time and may be computed dynamically. To get the full list for your install you can use this one-liner:

perl -MFFI::Platypus::Lang::Win32 -E "say for sort keys %{ FFI::Platypus::Lang::Win32->native_type_map }"
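
For introspection from a script rather than a one-liner, something like the following sketch may be useful. It assumes that native_type_map can be called on the current platform (not only on Windows) and only looks up names mentioned elsewhere on this page; which names are actually present may vary by install.

```perl
use strict;
use warnings;
use FFI::Platypus::Lang::Win32;

# The map is a plain hash reference: Windows type name => libffi native type
my $map = FFI::Platypus::Lang::Win32->native_type_map;

# Look up a few of the aliases used in the synopsis
for my $name (qw( DWORD HWND UINT BOOL )) {
  my $native = $map->{$name} // '(not in this map)';
  printf "%-6s => %s\n", $name, $native;
}
```

Note that LPWSTR and LPCWSTR are custom types provided by load_custom_types, so they are not expected to appear in this map.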

load_custom_types

FFI::Platypus::Lang::Win32->load_custom_types($ffi);

This is called internally when the type plugin is loaded by Platypus. It provides custom types useful on the Windows platform. For now that means the LPWSTR and LPCWSTR types.

CAVEATS

The Win32 API isn't a different computer language in the same sense as the languages supported by the other language plugins (those for Fortran or Rust, for example). But implementing these types as a language plugin is the most convenient way to do it.

Prior to version 1.35, this plugin didn't provide an implementation for LPWSTR or LPCWSTR, so in the likely event that you need those types make sure you also require at least that version of Platypus.

SEE ALSO

plicease commented 3 years ago

Here is the final version of the ::Type::WideString documentation

NAME

FFI::Platypus::Type::WideString - Platypus custom type for Unicode "wide" strings

VERSION

version 1.34

SYNOPSIS

use FFI::Platypus 1.00;

my $ffi = FFI::Platypus->new( api => 1, lib => [undef] );
$ffi->load_custom_type('::WideString' => 'wstring', access => 'read' );
$ffi->load_custom_type('::WideString' => 'wstring_w', access => 'write' );

# call function that takes a constant wide string
$ffi->attach( wcscmp => ['wstring', 'wstring'] => 'int' );
my $diff = wcscmp("I ❤ perl + Platypus", "I ❤ perl + Platypus"); # returns 0

# call a function that takes a wide string for writing
$ffi->attach( wcscpy => ['wstring_w', 'wstring'] );
my $buf;
wcscpy(\$buf, "I ❤ perl + Platypus");
print $buf, "\n";  # prints "I ❤ perl + Platypus"

# call a function that takes a wide string for modification
$ffi->attach( wcscat => ['wstring_w', 'wstring'] );
my $buf;
wcscat( [ \$buf, "I ❤ perl" ], " + Platypus");
print $buf, "\n";  # prints "I ❤ perl + Platypus"

On Windows use with LPCWSTR:

use FFI::Platypus 1.00;

my $ffi = FFI::Platypus->new( api => 1, lib => [undef] );

# define some custom Win32 Types
# to get these automatically see FFI::Platypus::Lang::Win32
$ffi->load_custom_type('::WideString' => 'LPCWSTR', access => 'read' );
$ffi->type('opaque' => 'HWND');
$ffi->type('uint'   => 'UINT');

use constant MB_OK                   => 0x00000000;
use constant MB_DEFAULT_DESKTOP_ONLY => 0x00020000;

$ffi->attach( [MessageBoxW => 'MessageBox'] => [ 'HWND', 'LPCWSTR', 'LPCWSTR', 'UINT'] => 'int' );

MessageBox(undef, "I ❤️ Platypus", "Confession", MB_OK|MB_DEFAULT_DESKTOP_ONLY);

DESCRIPTION

This custom type plugin for FFI::Platypus provides support for the native "wide" string type on your platform, if it is available.

Wide strings are made up of wide characters (wchar_t, also known as WCHAR on Windows), which have enough bits to represent character sets that require more than the traditional one byte char.
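
To see how wide a wide character is on a given platform, a small sketch like this one can help. It assumes that Platypus provides a wchar_t type on your platform, as used in the CAVEATS examples below.

```perl
use strict;
use warnings;
use FFI::Platypus 1.00;

my $ffi = FFI::Platypus->new( api => 1 );

# Size of wchar_t in bytes: usually 2 on Windows and 4 on most Unix systems
my $size = $ffi->sizeof('wchar_t');
print "wchar_t is $size bytes wide on this platform\n";
```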

These strings are most commonly used on Windows where they are referred to as LPWSTR and LPCWSTR (The former for read/write buffers and the latter for const read-only strings), where they are encoded as UTF-16LE.

They are also supported by libc on many modern Unix systems where they are usually UTF-32 of the native byte-order of the system. APIs on Unix systems more commonly use UTF-8 which provides some compatibility with ASCII, but you may occasionally find APIs that talk in wide strings. (libarchive, for example, can work in both).
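
The difference between these encodings can be illustrated without any FFI at all. The sketch below uses only the core Encode module to encode the same Perl string three ways; it is not how the plugin is implemented, just a way to see the byte sizes involved. The little-endian variants are chosen here for illustration.

```perl
use strict;
use warnings;
use Encode qw( encode );

# "I ❤ perl", with the heart written as an escape (U+2764)
my $str = "I \x{2764} perl";

my $utf8  = encode('UTF-8',    $str);  # common for Unix APIs
my $utf16 = encode('UTF-16LE', $str);  # wide strings on Windows
my $utf32 = encode('UTF-32LE', $str);  # wide strings on little-endian Unix

printf "characters: %d\n", length $str;    # 8
printf "UTF-8:      %d bytes\n", length $utf8;   # 10
printf "UTF-16LE:   %d bytes\n", length $utf16;  # 16
printf "UTF-32LE:   %d bytes\n", length $utf32;  # 32
```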

This plugin will detect the native wide string format for you and transparently convert Perl strings, which are typically encoded internally as UTF-8. If for some reason it cannot detect the correct encoding, or if your platform is not currently supported, an exception will be thrown (please open a ticket if this is the case). It can be used for read/write buffers, for const read-only strings, and for return values. It supports these options:

Options:

read-only

Read-only strings are the easiest of all: they are converted to the native wide string format in a buffer, which is freed after the function call completes.

$ffi->load_custom_type('::WideString' => 'wstring' );
$ffi->function( wprintf => [ 'wstring' ] => [ 'wstring' ] => 'int' )
     ->call("I %s perl + Platypus", "❤");

This is the mode that you want to use when you are calling a function that takes a const wchar_t* or a LPCWSTR.

return value

For return values the access and size options are ignored. The string is simply copied into a Perl native string.

$ffi->load_custom_type('::WideString' => 'wstring' );
# see note below in CAVEATS about wcsdup
my $str = $ffi->function( wcsdup => [ 'wstring' ] => 'wstring' )
              ->call("I ❤ perl + Platypus");

This is the mode that you want to use when you are calling a function that returns a const wchar_t*, wchar_t*, LPWSTR or LPCWSTR.

read/write

Read/write strings can be passed in one of two ways. Which you choose depends on whether you want to initialize the read/write buffer or not.

This is the mode that you want to use when you are calling a function that takes a wchar_t* or an LPWSTR.
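
The two calling forms can be sketched as follows, based on the wcscpy and wcscat calls in the synopsis above. This assumes a Unix-like system where both functions can be found in the running process via lib => [undef], and that the default buffer size is large enough for these short strings.

```perl
use strict;
use warnings;
use utf8;
use FFI::Platypus 1.35;

my $ffi = FFI::Platypus->new( api => 1, lib => [undef] );
$ffi->load_custom_type('::WideString' => 'wstring',   access => 'read'  );
$ffi->load_custom_type('::WideString' => 'wstring_w', access => 'write' );

# 1. Uninitialized buffer: pass a reference to a scalar
$ffi->attach( wcscpy => ['wstring_w', 'wstring'] );
my $copy;
wcscpy(\$copy, "I ❤ perl + Platypus");

# 2. Initialized buffer: pass [ \$scalar, $initial_content ]
$ffi->attach( wcscat => ['wstring_w', 'wstring'] );
my $joined;
wcscat([ \$joined, "I ❤ perl" ], " + Platypus");

binmode STDOUT, ':encoding(UTF-8)';
print "$copy\n";
print "$joined\n";
```

The first form suits functions that only write to the buffer; the second suits functions such as wcscat that read the existing content before appending to it.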

CAVEATS

As with the Platypus built-in string type, return values are copied into a Perl scalar. This is usually what you want anyway, but some APIs expect the caller to take responsibility for freeing the pointer to the wide string that they return. For example, wcsdup works in this way. The workaround is to return an opaque pointer, cast it to a wide string, and then free the pointer.

use FFI::Platypus::Memory qw( free );
$ffi->load_custom_type('::WideString' => 'wstring' );
my $ptr = $ffi->function( wcsdup => [ 'wstring' ] => 'opaque' )
              ->call("I ❤ perl + Platypus");
my $str = $ffi->cast('opaque', 'wstring', $ptr);
free $ptr;

Because of the order in which objects are freed you cannot return a wide string if it is also a wide string argument to a function. For example wcscpy may crash if you specify the return value as a wide string:

# wchar_t *wcscpy(wchar_t *dest, const wchar_t *src);
$ffi->attach( wcscpy => [ 'wstring_w', 'wstring' ] => 'wstring' ); # no
my $str;
wcscpy( \$str, "I ❤ perl + Platypus");  # may crash on memory error

This is because the order of operations here is: 1. $str is allocated; 2. $str is re-encoded as UTF-8 and the old buffer is freed; 3. the return value is computed based on the $str buffer that was freed.

If you look at wcscpy though you don't actually need the return value. To make this code work, you can just ignore the return value:

$ffi->attach( wcscpy => [ 'wstring_w', 'wstring' ] => 'void' ); # yes
my $str;
wcscpy( \$str, "I ❤ perl + Platypus"); # good!

On the other hand you do care about the return value from wcschr, which returns a pointer to the first occurrence of a character in an argument string:

# wchar_t *wcschr(const wchar_t *wcs, wchar_t wc);
$ffi->attach( wcschr => [ 'wstring', 'wchar_t' ] => 'wstring' ); # no
# this may crash on memory error or return the wrong value
my $str = wcschr("I ❤ perl + Platypus", ord("❤"));

Instead you need to work with pointers and casts to use this function:

use FFI::Platypus 1.00;
use FFI::Platypus::Memory qw( free );

my $ffi = FFI::Platypus->new( api => 1, lib => [undef] );
$ffi->load_custom_type('::WideString' => 'wstring' );

$ffi->attach( wcsdup => ['wstring'] => 'opaque' );
$ffi->attach( wcschr => [ 'opaque', 'wchar_t' ] => 'wstring' );

# create a wcs string in memory using wcsdup
my $haystack = wcsdup("I ❤ perl + Platypus");
# find the heart and return as a wide string
my $needle = wcschr($haystack, ord("❤"));
# safe to free the pointer to the larger string now
free $haystack;

SEE ALSO