I think non-ASCII bytes may be interfering with FTXUI's input handling; in some of my other tests, entering non-ASCII text caused control strings to appear in the interface or even crashed the program.
I tested this program in the Ubuntu terminal and successfully entered non-ASCII characters, so this appears to be a bug specific to the Windows platform.
BTW, I don't know if this is the cause: in a terminal under Windows, even if the codepage is set to UTF-8 (using SetConsoleCP and SetConsoleOutputCP), you can't read proper UTF-8 text through std::cin; you must use the ReadConsoleW API instead.
The following is an example of how I read a line of UTF-8 text from a Windows terminal:
auto input () -> std::string {
#if defined M_system_windows
    auto text_16 = std::array<char16_t, 0x1000>{};
    auto length = DWORD{};
    // Read raw UTF-16 from the console; std::cin would mangle non-ASCII input.
    auto state = ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), text_16.data(), static_cast<DWORD>(text_16.size()), &length, nullptr);
    assert_condition(state);
    // Drop the trailing "\r\n" (2 UTF-16 code units) before converting.
    auto text_8 = utf16_to_utf8(std::u16string_view{text_16.data(), length - 2});
    // utf16_to_utf8 returns a std::u8string; reinterpret it as a std::string.
    return std::string{std::move(reinterpret_cast<std::string &>(text_8))};
#endif
#if defined M_system_linux || defined M_system_macos || defined M_system_android || defined M_system_ios
    auto text = std::string{};
    std::getline(std::cin, text);
    return text;
#endif
}
Hello!
Sorry, support for Microsoft's terminal is not as good as support for regular terminals. I don't really have easy access to a computer running Microsoft's Windows OS, so it is kind of hard for me. Experimental support was kindly added by @mauve some time ago.
My understanding of the issue is that we do an incorrect cast here: https://github.com/ArthurSonzogni/FTXUI/blob/master/src/ftxui/component/screen_interactive.cpp#L106
The parser itself takes UTF-8 encoded chars. So I guess we need to convert Microsoft's WCHAR into a sequence of bytes to feed the parser.
Would you like to submit a PR to fix this? I believe you can try replacing that line with:
for (auto byte : utf8_conv.to_bytes(key_event.uChar.UnicodeChar)) {
    parser.Add(byte);
}
with utf8_conv defined outside the two loops as:
std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
Umm... I did what you said, and now I can enter non-ASCII characters! However, I found that I still couldn't receive non-BMP characters correctly: when I tried to enter the character 𐀀 (U+10000), I got the UTF-8 encodings of the two surrogates \uD800 and \uDC00 instead.
The Windows terminal's encoding is UTF-16, not UCS-2, so it looks like surrogate pairs need special handling.
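To make the arithmetic concrete (this is plain UTF-16 math, not FTXUI code; the assertions are only a sanity check): a code point above U+FFFF is reduced by 0x10000, the top 10 bits of the offset go into a high surrogate based at 0xD800, and the bottom 10 bits into a low surrogate based at 0xDC00. A minimal sketch:

#include <cassert>
#include <cstdint>

// Encode a non-BMP code point as a UTF-16 surrogate pair, then decode it back.
int main() {
    char32_t code_point = U'\U00010000'; // the character 𐀀 from the report
    std::uint32_t offset = static_cast<std::uint32_t>(code_point) - 0x10000;
    char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));   // top 10 bits
    char16_t low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    assert(high == u'\xD800' && low == u'\xDC00');
    // Decoding reverses the split. Encoding each half separately as UTF-8
    // (which is what codecvt_utf8<wchar_t> does) yields the broken output above.
    char32_t decoded = 0x10000 + ((static_cast<char32_t>(high - 0xD800) << 10) | static_cast<char32_t>(low - 0xDC00));
    assert(decoded == code_point);
    return 0;
}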
Okay, indeed. So it seems we need to convert from UTF-16 to UTF-8 instead.
I tried parsing UTF-16 in EventListener and seem to have successfully entered non-BMP characters, though I'm not sure whether there will be any side effects:
#ifdef _WIN32
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4996)  // codecvt_utf8 is deprecated
#endif
  // Masks for the fixed prefix bits and the 10 payload bits of a UTF-16 surrogate.
  constexpr auto utf16_surrogate_mask = static_cast<char16_t>(0b111111'0000000000);
  constexpr auto utf16_surrogate_value_mask = static_cast<char16_t>(0b000000'1111111111);
  auto utf8_ucs2_conv = std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t>{};
  auto utf8_ucs4_conv = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>{};
  auto utf16_character_of_first_surrogate = char16_t{0};
#endif
  for (const auto& r : records) {
    switch (r.EventType) {
      case KEY_EVENT: {
        auto key_event = r.Event.KeyEvent;
        // Ignore key-up events.
        if (key_event.bKeyDown == FALSE)
          continue;
#ifdef _WIN32
        // MSVC emits error C3850 if \unnnn is used instead of \xnnnn (C++23 only).
        auto utf16_character = static_cast<char16_t>(key_event.uChar.UnicodeChar);
        if ((utf16_character & utf16_surrogate_mask) == u'\xD800') {
          // High surrogate: remember it and wait for the low surrogate.
          if (utf16_character_of_first_surrogate != u'\x0000') {
            throw std::runtime_error{"invalid utf-16 character"};
          }
          utf16_character_of_first_surrogate = utf16_character;
        } else if ((utf16_character & utf16_surrogate_mask) == u'\xDC00') {
          // Low surrogate: combine both halves into a single code point.
          if (utf16_character_of_first_surrogate == u'\x0000') {
            throw std::runtime_error{"invalid utf-16 character"};
          }
          auto unicode_character = static_cast<char32_t>(
              0x10000 +
              ((static_cast<char32_t>(utf16_character_of_first_surrogate & utf16_surrogate_value_mask) << 10) |
               (utf16_character & utf16_surrogate_value_mask)));
          for (auto byte : utf8_ucs4_conv.to_bytes(unicode_character)) {
            parser.Add(byte);
          }
          utf16_character_of_first_surrogate = u'\x0000';
        } else {
          // BMP character: encode directly.
          if (utf16_character_of_first_surrogate != u'\x0000') {
            throw std::runtime_error{"invalid utf-16 character"};
          }
          for (auto byte : utf8_ucs2_conv.to_bytes(utf16_character)) {
            parser.Add(byte);
          }
        }
#else
        parser.Add((char)key_event.uChar.UnicodeChar);
#endif
      } break;
      case WINDOW_BUFFER_SIZE_EVENT:
        out->Send(Event::Special({0}));
        break;
      case MENU_EVENT:
      case FOCUS_EVENT:
      case MOUSE_EVENT:
        // TODO(mauve): Implement later.
        break;
    }
  }
#ifdef _WIN32
#ifdef _MSC_VER
#pragma warning(pop)
#endif
#endif
Pretty nice!
This is something we should merge at some point. I need to find a Windows environment to improve upon what you prototyped.
At the following address you can find VMs with preinstalled development tools, which you can of course also use for testing:
https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/
I made a fix: https://github.com/ArthurSonzogni/FTXUI/pull/538
This ended up being easier than expected, because I had previously implemented a UTF-16 to UTF-8 conversion for another reason.
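For reference, the core of such a conversion can be written by hand, avoiding the codecvt facets used above (std::wstring_convert and std::codecvt_utf8 have been deprecated since C++17). This is only a minimal sketch of the general technique, not the actual code from the PR; append_utf8 is just an illustrative name:

#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Assumes `code_point` is a valid scalar value (no lone surrogates).
void append_utf8(std::string& out, char32_t code_point) {
    std::uint32_t cp = static_cast<std::uint32_t>(code_point);
    if (cp < 0x80) {                                           // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {                                   // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {                                 // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                                                   // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

Combined with the surrogate-pair reassembly shown earlier in the thread, feeding these bytes to parser.Add would remove the dependency on the deprecated facets entirely.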
I don't know if this is expected behavior or a bug: FTXUI supports UTF-8 output, but it seems to only support ASCII input. I tried to enter non-ASCII characters (e.g. CJK characters, full-width commas, emoji) in the input component, but none of them were entered correctly. I also tried copying non-ASCII text from outside and pasting it into the input component, but it still could not be entered correctly.
https://user-images.githubusercontent.com/37923060/194156167-12c998f4-316b-46e1-9418-27336d29bcca.mp4