Open rsc opened 13 years ago
On Windows, wchar_t is ubiquitous. Windows Unicode-enabled API functions use UTF-16 (wide character) encoding, the native Unicode encoding on Windows. See Windows Data Types for Strings: http://msdn.microsoft.com/en-us/library/windows/desktop/dd374131.aspx
Comment 9 by Edward.Casey.Adams:
Perhaps Cgo users should link to libiconv (http://www.gnu.org/software/libiconv/) instead? The problem is that neither the width nor the Unicode encoding of wchar_t is well defined. (See http://en.wikipedia.org/wiki/Wide_character#C.2FC.2B.2B) For example, on Windows/Visual Studio platforms, wchar_t is 16 bits wide and encoded as UTF-16LE, whereas on most Linux distros wchar_t is 32 bits wide, most Unicode text is UTF-8 stored in regular chars, and most anything else won't be little-endian. Thus adding C.WcharString() adds ambiguity.
I once made this package: https://github.com/GeertJohan/cgo.wchar It works well, but requires libiconv. I have never tested it on anything except Linux.
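To make the ambiguity concrete, here is a sketch of decoding a wchar_t buffer in pure Go, which only works if you already know the width and endianness up front. It assumes 32-bit little-endian code points (the typical glibc/Linux layout); decodeWchar32LE is a hypothetical helper for illustration, not part of cgo.wchar:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeWchar32LE interprets raw bytes copied from a C wchar_t[]
// as 32-bit little-endian code points and returns the equivalent
// Go string. On a platform with 16-bit wchar_t (Windows) or a
// big-endian layout this would produce garbage -- which is exactly
// the ambiguity being discussed.
func decodeWchar32LE(b []byte) string {
	runes := make([]rune, 0, len(b)/4)
	for i := 0; i+4 <= len(b); i += 4 {
		runes = append(runes, rune(binary.LittleEndian.Uint32(b[i:i+4])))
	}
	return string(runes)
}

func main() {
	// L"世" as a 32-bit LE wchar_t: code point U+4E16.
	raw := []byte{0x16, 0x4E, 0x00, 0x00}
	fmt.Println(decodeWchar32LE(raw)) // 世
}
```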
The problem with comment #10 is that you would either:

a) need to know what the definition of wchar_t is on the target platform, or
b) use the mbtowc() family of functions, which requires you to know what the multibyte encoding is.

If we can guarantee that all systems supported by Go have a multibyte encoding of UTF-8, then we can implement this portably. Alas:

```
$ uname -a
Linux pietro-laptop 3.13.0-29-generic #52-Ubuntu SMP Wed May 28 12:42:47 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ cat multibyte.c
```

```c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include <errno.h>
#include <locale.h>

int main(void) {
	wchar_t wide = L'世';
	char multibyte[MB_LEN_MAX];
	int i, n;

	setlocale(LC_ALL, "");
	errno = 0;
	n = wctomb(multibyte, wide);
	if (n == -1) {
		fprintf(stderr, "error %s\n", strerror(errno));
		return 1;
	}
	if (n == 0) {
		fprintf(stderr, "weird: wctomb() returned 0 (no bytes in output)\n");
		return 2;
	}
	for (i = 0; i < n; i++)
		printf("%02X ", multibyte[i]);
	printf("\n");
	return 0;
}
```

```
$ LC_CTYPE= ./a.out
FFFFFFE4 FFFFFFB8 FFFFFF96
$ LC_CTYPE=en_US.UTF8 ./a.out
FFFFFFE4 FFFFFFB8 FFFFFF96
$ LC_CTYPE=ja_JP.SJIS ./a.out
FFFFFF90 FFFFFFA2
```

So as far as I can gather, a C.CWString() would need to be platform-specific. For Windows, we can either:

- do the work on the Go side: have unicode/utf16 do the conversion (this is what package syscall does), or
- do the work on the C side: use MultiByteToWideChar() in kernel32.dll, passing CP_UTF8 as the first argument (which should work regardless of locale).

For the Unixes, though, I'm not sure... other than linking to libiconv, which I imagine isn't optimal, or flat out not providing it, since it isn't used much to begin with, in which case for Windows we could just say: use the routines in package syscall. (I have wanted to prune through cgo myself sometime.)
C99 and later specify that if __STDC_ISO_10646__ is defined, then wchar_t characters have values equal to their Unicode code points. We could conditionally provide/expose C.WcharString() (or C.CWString() or whatever) only if the C compiler defines that macro, and then I don't think we need to rely on any external libraries like libiconv. I think the only nit would be how to handle code points greater than WCHAR_MAX. ISO C doesn't specify how to handle that case, but in practice it seems like encoding characters as UTF-{8*sizeof(wchar_t)} should work. Varying the implementation depending on sizeof(wchar_t) might be a tad involved, but nothing really out of the ordinary from what cgo already has to do, I think.
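The size-dependent branching could be as small as this sketch (encodeWide is a hypothetical helper, not a proposed API; a real implementation would live in cgo's generated code and write into C-allocated memory):

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeWide sketches what a C.WcharString-style helper could do
// under __STDC_ISO_10646__: with a 4-byte wchar_t each element
// holds a code point directly (UTF-32), while with a 2-byte
// wchar_t code points above 0xFFFF must be split into UTF-16
// surrogate pairs. uint32 is used here only as a common carrier.
func encodeWide(s string, wcharSize int) []uint32 {
	var out []uint32
	switch wcharSize {
	case 4:
		for _, r := range s {
			out = append(out, uint32(r))
		}
	case 2:
		for _, u := range utf16.Encode([]rune(s)) {
			out = append(out, uint32(u))
		}
	default:
		panic("unsupported wchar_t size")
	}
	return out
}

func main() {
	fmt.Printf("%X\n", encodeWide("𝄞", 4)) // one code point: [1D11E]
	fmt.Printf("%X\n", encodeWide("𝄞", 2)) // surrogate pair: [D834 DD1E]
}
```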
Hm, at least GCC (4.8.2) on Ubuntu 14.04 defines it:

```
$ echo | gcc -E -dD - | grep STDC_ISO_10646
#define __STDC_ISO_10646__ 201103L
```

(Seems to come from /usr/include/stdc-predef.h, provided by glibc.) But indeed GCC 4.6.3 on Ubuntu 12.04, or even just Clang 3.5 on Ubuntu 14.04, do not, so that's unfortunate.
Oh, older glibc versions define __STDC_ISO_10646__ in <features.h>, which then gets pulled in by other glibc headers like <wchar.h>, but it won't be defined by default or by GCC-provided headers like <stddef.h>. But I suppose it's still not a very worthwhile signal unless Windows and OS X also define it.