Closed iomeone closed 4 years ago
I can narrow the problem down! It is in the Data_Bounded FFI:
exports["topChar"] = 0x10FFFF; // unicode limit
exports["bottomChar"] = 0;
So in this case we should use const int i = unbox<int>(c_);
and in other cases we should use const string s = unbox<string>(c_);
How can I represent 0 or three bytes as a Unicode string (the bottomChar and topChar cases)? This seems to be another UTF-8 issue...
Right, it is utf8 related. However, that in particular is a bug in my ffi, because I incorrectly carried over those values from the old implementation. It should be something like:
exports["topChar"] = u8"\U0010FFFF"; // unicode limit
exports["bottomChar"] = u8"\0";
You can try this, but I haven't tested this yet since I can't right now.
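One thing worth checking when trying this, assuming the values end up in a std::string: a literal converted through const char* is measured up to its first zero byte, so "\0" becomes an empty string rather than a one-character string. A minimal sketch of building both bounds with the length kept intact follows; the helper names bottom_char and top_char are hypothetical and not part of the FFI.

```cpp
#include <string>

// Hypothetical helpers, not part of the actual FFI: build the two Char
// bounds as UTF-8 encoded std::string values.

// U+0000 encodes as a single zero byte. Giving the count explicitly keeps
// that byte; std::string("\0") would stop at the terminator and be empty.
inline std::string bottom_char() {
    return std::string(1, '\0');
}

// U+10FFFF encodes as the four bytes F4 8F BF BF in UTF-8.
inline std::string top_char() {
    return std::string("\xF4\x8F\xBF\xBF");
}
```

Whether the boxed string type used by the FFI preserves an embedded zero byte is not confirmed here and would need testing.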
In general, the purescript Char type maps to a C++ string, with just one unicode character (which can be multibyte, per the utf8 standard).
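To illustrate the "one character, possibly several bytes" rule, here is a small sketch that counts code points in a UTF-8 std::string. It assumes well-formed input, and the names count_code_points and is_single_char are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by counting every byte that
// is not a continuation byte (continuation bytes look like 10xxxxxx).
// Assumes the input is well-formed UTF-8; no validation is done.
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical check for a string that is meant to represent one Char.
inline bool is_single_char(const std::string& s) {
    return count_code_points(s) == 1;
}
```

With this, the four-byte string for U+10FFFF counts as one character, while "ab" counts as two.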
The hard part of supporting Unicode is getting one complete character out of a string. For example:
#pragma execution_character_set("UTF-8")
juce::String s = CharPointer_UTF8("中文test");
wcout.imbue(locale("", LC_CTYPE));
for (int i = 0; i < s.length(); i++)
{
    wcout << wchar_t(s[i]);
    cout << " int value is: " << hex << s[i] << endl;
}
it will output:
中 int value is: 4e2d
文 int value is: 6587
t int value is: 74
e int value is: 65
s int value is: 73
t int value is: 74
The point of the test is that it can magically detect whether each character occupies one, two, or more bytes!
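The detection is less magic than it looks: in UTF-8 the first byte of each character announces how many bytes follow. A rough sketch of that rule, independent of JUCE (the function name utf8_sequence_length is invented here):

```cpp
#include <cstddef>

// Number of bytes in a UTF-8 sequence, judged from its first byte.
// Returns 0 for a byte that cannot start a sequence (a continuation byte
// or an invalid lead byte).
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}
```

For the test string above, the lead byte of 中 (0xE4) gives 3 and the byte for t (0x74) gives 1.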
I use the JUCE library: I replaced every std::string with juce::String and got the right behaviour! For example, exports["topChar"] can be written as
exports["topChar"] = juce::String(CharPointer_UTF8("\xF4\x8F\xBF\xBF"));
// Unicode limit: the UTF-8 bytes \xF4\x8F\xBF\xBF encode the code point 0x10FFFF (UTF-8 uses 4 bytes for it).
toCharCode can then just return str[0], which is convenient!
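How 0x10FFFF turns into those four bytes can be sketched with a small encoder. This is an illustrative helper (encode_utf8 is an invented name), not code from the FFI or from JUCE, and it does not reject surrogate code points.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8. Surrogates (U+D800..U+DFFF) are
// not rejected here; this sketch only checks the upper limit.
inline std::string encode_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0x10FFFF) yields the bytes F4 8F BF BF, and encode_utf8(0) yields a one-byte string holding a zero byte.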
// foreign import toCharCode :: Char -> Int
exports["toCharCode"] = [](const boxed& c_) -> boxed {
    const juce::String& s = unbox<juce::String>(c_);
    // A Char must hold exactly one character; check before indexing.
    assert(s.length() == 1);
    const int charcode = s[0];
    return charcode;
};
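For comparison, the same lookup can be done on a plain UTF-8 std::string without JUCE by decoding the first code point by hand. This is only a sketch: first_code_point is a hypothetical helper, and it does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode the first code point of a UTF-8 string.
// Throws std::invalid_argument on empty, truncated, or malformed input.
// Overlong encodings and surrogates are not rejected in this sketch.
inline std::uint32_t first_code_point(const std::string& s) {
    if (s.empty()) throw std::invalid_argument("empty string");
    const unsigned char lead = static_cast<unsigned char>(s[0]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else throw std::invalid_argument("invalid lead byte");
    if (s.size() < len) throw std::invalid_argument("truncated sequence");
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}
```

A std::string based toCharCode could return this value as its Int result, assuming the Char really is stored as UTF-8 bytes.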
I am still looking forward to an official (your) implementation of Unicode support!
The changes for
exports["topChar"] = u8"\U0010FFFF"; // unicode limit
exports["bottomChar"] = u8"\0";
went in a while back, so not sure why I didn't close this then.
Hi, sorry to bother you again! I want to use the purescript-unicode package and ran into a bug. I have tried many times but cannot solve it.
My code is:
It calls an FFI function named toCharCode,
so I wrote the FFI:
but it keeps crashing again and again. I also tried
but no luck! It crashes as usual!
Would you please point out what's wrong with the code? Thank you very much!