bottlenoselabs / c2cs

Generate C# bindings from a C header.
MIT License

`wchar_t` #65

Closed lithiumtoast closed 2 years ago

lithiumtoast commented 2 years ago

Problem:

The size of wchar_t is implementation-defined: it is typically 2 bytes on Windows and 4 bytes on Linux. With GCC or Clang, the -fshort-wchar compiler flag forces wchar_t to be 2 bytes. However, this has serious consequences. Hardware vendors warn that all linked objects, including libraries, must use the same wchar_t size. Linking an object file compiled with -fshort-wchar against one compiled without it is therefore either impossible or, at the very least, unstable. It is not clear what happens when dynamically loading a library, but with NativeAOT dynamic linking is a real scenario for C#.
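To make the size difference concrete, here is a minimal C sketch that reports the width of wchar_t for whatever compiler and flags it is built with. The function names `wchar_width_bytes` and `wchar_is_utf16_sized` are made up for illustration and are not part of C2CS.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Compile-time sanity check: this sketch only considers the two common widths. */
static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
              "unexpected wchar_t width");

/* Hypothetical helper: width of wchar_t in bytes for this build. */
size_t wchar_width_bytes(void)
{
    return sizeof(wchar_t);
}

/* Hypothetical helper: non-zero when wchar_t has the same width as a C# char. */
int wchar_is_utf16_sized(void)
{
    return sizeof(wchar_t) == 2;
}
```

Building the same file with and without -fshort-wchar on Linux should change what `wchar_width_bytes` returns, which is exactly why objects built with different settings cannot safely share wchar_t data.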

By default, this makes wchar_t a poor fit for single-source cross-platform bindings. Worse, C# strings are UTF-16, where char is 2 bytes. Some marshalling, by hand or otherwise, is therefore needed to pass wchar_t strings correctly between C and C# on Linux.

Microsoft has discussed introducing a UTF8String type, but this would only help with the interoperability of char*, not wchar_t*.

Options:

  1. Require the -fshort-wchar compiler flag for all users of C2CS so that wchar_t is guaranteed to be 2 bytes. The consequence is that users will need to re-compile their C code to be compliant.
  2. Use manual marshalling or an ICustomMarshaler with different implementations for Windows and Linux, so that a single C# .cs bindings file works correctly on both platforms when passing wchar_t* between C# and C.
  3. Warn users of C2CS that wchar_t* falls into the same category as pointers, so a separate .cs bindings file must be generated for each ABI. For example, one .cs file would be generated for Windows and another for Linux, each with the correct wchar_t usage.
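
As a rough illustration of the conversion that option 2 implies, the C sketch below widens UTF-16 code units (the layout of a C# string) into wchar_t. It is not the C2CS implementation: the function name `utf16_to_wchar` and its signature are assumptions for illustration, and unpaired surrogates are simply copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: converts `len` UTF-16 code units into a
   NUL-terminated wchar_t string in `dst`, which holds `cap` elements.
   Returns the number of wchar_t written (excluding the terminator),
   or (size_t)-1 if `dst` is too small. */
size_t utf16_to_wchar(const uint16_t *src, size_t len, wchar_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    if (cap == 0) {
        return (size_t)-1;
    }
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = src[i];
        /* With a 4-byte wchar_t, combine a surrogate pair into one code point.
           With a 2-byte wchar_t, code units are copied one-to-one. */
        if (sizeof(wchar_t) >= 4 &&
            cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        }
        if (out + 1 >= cap) {
            return (size_t)-1; /* no room left for this unit plus the terminator */
        }
        dst[out++] = (wchar_t)cp;
    }
    dst[out] = L'\0';
    return out;
}
```

On the C# side, a custom marshaller would have to perform the equivalent work before calling into C on Linux, while on Windows the UTF-16 buffer could be passed through as-is.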

lithiumtoast commented 2 years ago

Fixed in 56cb94de2a191c9ccbf1d7299e34fc0ff22fc6d6