erickguan opened this issue 9 years ago (Open)
`UCharPointer` should use `u_strFromUTF8` or `u_strFromUTF8WithSub` to convert to a UChar string.
@tyler-nguyen Unfortunately, it's not easy to do so with ffi code. I feel like C code is required in this case.
@fantasticfears You can add it with FFI like this:
```ruby
# U_CAPI UChar* U_EXPORT2 u_strFromUTF8(UChar *dest,
#                                       int32_t destCapacity,
#                                       int32_t *pDestLength,
#                                       const char *src,
#                                       int32_t srcLength,
#                                       UErrorCode *pErrorCode)
attach_function :u_strFromUTF8, "u_strFromUTF8#{suffix}",
                [:pointer, :int32_t, :pointer, :string, :int32_t, :pointer], :pointer
```
For `srcLength`, use `bytesize` instead of Ruby's `String#length`.
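A rough sketch of how the attached function might be called (the module name `Lib` and the minimal error handling are assumptions for illustration, not the gem's actual API):

```ruby
require "ffi"

# Assumes the attach_function call above was made inside a hypothetical
# `module Lib; extend FFI::Library; end` bound to the ICU common library.
src  = "héllo wörld"
cap  = src.bytesize + 1                     # UTF-16 never needs more code units than UTF-8 has bytes
dest = FFI::MemoryPointer.new(:uint16, cap)
dest_len = FFI::MemoryPointer.new(:int32)
status   = FFI::MemoryPointer.new(:int32)   # UErrorCode; zero-filled == U_ZERO_ERROR

Lib.u_strFromUTF8(dest, cap, dest_len, src, src.bytesize, status)

raise "ICU error #{status.read_int}" if status.read_int > 0    # positive UErrorCode values are failures
uchars = dest.read_array_of_uint16(dest_len.read_int)          # resulting UTF-16 code units
```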
Thanks, I made some snippets this way earlier. But `u_strFromUTF8` may report its own errors, which requires extra handling on the Ruby side. And since some bindings are written purely around `UChar`, I wonder whether a C binding would be more direct, given Ruby's Code Set Independent (CSI) string model.
`UCharPointer` points to an array of `uint16_t`, which is generally not enough for a single Unicode code point, since the code point range is [0..0x10FFFF].
After `unpack("U*")` and a write, the higher bits just vanish. `unpack("S*")` doesn't help either: the array can't be packed again.
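For what it's worth, here is a minimal plain-Ruby sketch (no ICU involved, just my illustration) of why `unpack("U*")` loses data in a `uint16_t` buffer, and how encoding to UTF-16 first keeps supplementary-plane characters intact as surrogate pairs:

```ruby
str = "a\u{1F600}"                      # U+1F600 is outside the BMP
str.unpack("U*")                        # => [97, 128512] -- 128512 > 0xFFFF, so the high bits
                                        #    are dropped if stored into uint16_t slots
units = str.encode("UTF-16LE").unpack("v*")
# => [97, 0xD83D, 0xDE00]               -- the surrogate pair fits in uint16_t slots
packed = units.pack("v*").force_encoding("UTF-16LE").encode("UTF-8")
# => "a\u{1F600}"                       -- round-trips back to the original string
```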