ArthurSonzogni / FTXUI

:computer: C++ Functional Terminal User Interface. :heart:
MIT License

can't input non-ascii character on Input component #495

Closed twinstar6980 closed 1 year ago

twinstar6980 commented 2 years ago

I don't know if this is expected behavior or a bug: FTXUI supports UTF-8 output, but it seems to support only ASCII input. I tried to enter non-ASCII characters (e.g. CJK characters, full-width commas, emoji) in the Input component, but none of them were entered correctly. I also tried copying non-ASCII text from outside and pasting it into the Input component, but it still could not be entered correctly.

https://user-images.githubusercontent.com/37923060/194156167-12c998f4-316b-46e1-9418-27336d29bcca.mp4

twinstar6980 commented 2 years ago

I think non-ASCII bytes may interfere with how FTXUI operates: in some of my other tests, entering non-ASCII text caused control strings to appear in the interface or even crashed the program.

twinstar6980 commented 2 years ago

I tested this program in the Ubuntu terminal and successfully entered non-ASCII characters, so this appears to be a bug on the Windows platform.

BTW, I don't know if this is the reason: in a terminal on Windows, even if the code page is set to UTF-8 (using SetConsoleCP and SetConsoleOutputCP), you can't get proper UTF-8 text through std::cin; you must use the ReadConsole API.

The following is an example of how I get a line of UTF-8 text from a Windows terminal:

auto input (
) -> std::string {
    #if defined M_system_windows
    auto text_16 = std::array<char16_t, 0x1000>{};
    auto length = DWORD{};
    auto state = ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), text_16.data(), static_cast<DWORD>(text_16.size()), &length, nullptr);
    assert_condition(state);
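    // length includes the trailing "\r\n", which is dropped here
    auto text_8 = utf16_to_utf8(std::u16string_view{text_16.data(), length - 2});
    // copy the bytes instead of casting the string object itself
    // (assumes utf16_to_utf8 returns a std::u8string-like object)
    return std::string{reinterpret_cast<char const *>(text_8.data()), text_8.size()};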
    #endif
    #if defined M_system_linux || defined M_system_macos || defined M_system_android || defined M_system_ios
    auto text = std::string{};
    std::getline(std::cin, text);
    return text;
    #endif
}

ArthurSonzogni commented 2 years ago

Hello!

Sorry, support for Microsoft's terminal is not as good as for regular terminals. I don't have easy access to a computer running Microsoft's Windows OS, so this is hard for me to work on. Experimental support was kindly added by @mauve some time ago.

My understanding of the issue is that we do an incorrect cast here: https://github.com/ArthurSonzogni/FTXUI/blob/master/src/ftxui/component/screen_interactive.cpp#L106

The parser itself takes UTF-8 encoded chars. So I guess we need to convert Microsoft's WCHAR into a sequence of bytes to feed the parser.

Would you like to submit a PR to fix this? I believe you can try replacing this line with:

    for (auto byte : utf8_conv.to_bytes(key_event.uChar.UnicodeChar)) {
      parser.Add(byte);
    }

With utf8_conv defined outside of the two loops as:

    std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
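For reference, the conversion the parser needs (one Unicode code point to its UTF-8 byte sequence) can also be written by hand, which avoids std::wstring_convert, deprecated since C++17. The block below is only a minimal sketch; EncodeUtf8 is a hypothetical helper and not part of FTXUI.

```cpp
#include <string>

// Hypothetical helper (not part of FTXUI): encode one Unicode code point
// as UTF-8. Surrogate values and values above U+10FFFF are rejected by
// returning an empty string.
std::string EncodeUtf8(char32_t cp) {
  std::string out;
  if (cp < 0x80) {
    // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      return out;  // a lone surrogate is not a valid code point
    }
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp <= 0x10FFFF) {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

Each byte of the returned string could then be passed to parser.Add one at a time, in the same way as the loop above.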

twinstar6980 commented 2 years ago

umm.. I did what you said and now I can enter non-ASCII characters! However, I found that I still can't receive non-BMP characters correctly: I tried to enter the character 𐀀 (U+10000) and got the UTF-8 bytes of \uD800 followed by the UTF-8 bytes of \uDC00.

The Windows terminal's encoding is UTF-16, not UCS-2. It looks like surrogate code units in Unicode need special handling?
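To make the surrogate issue concrete: in UTF-16, U+10000 is stored as the two code units 0xD800 (high surrogate) and 0xDC00 (low surrogate). Converting each unit on its own, as a UCS-2 conversion does, yields two separate 3-byte sequences instead of one 4-byte sequence, which matches the output reported above. The following sketch shows the arithmetic that joins a pair back into one code point; the helper names are hypothetical and chosen for illustration only.

```cpp
#include <cassert>

// Hypothetical helpers, named here for illustration only.
bool IsHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool IsLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Combine a valid (high, low) surrogate pair into one Unicode code point.
// Each surrogate carries 10 payload bits; the result is offset by 0x10000
// because surrogate pairs only encode code points above the BMP.
char32_t CombineSurrogates(char16_t high, char16_t low) {
  assert(IsHighSurrogate(high) && IsLowSurrogate(low));
  return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10) +
         (static_cast<char32_t>(low) - 0xDC00);
}
```

With this, CombineSurrogates(0xD800, 0xDC00) gives 0x10000, and the emoji U+1F600 comes back from the pair 0xD83D, 0xDE00.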

Snipaste_22-10-17_01-08-53

ArthurSonzogni commented 2 years ago

Okay, indeed. So it seems we need to convert from UTF-16 to UTF-8 instead.

twinstar6980 commented 2 years ago

I tried parsing UTF-16 in EventListener and it seems I have successfully entered a non-BMP character, but I'm not sure whether there will be any side effects.

Snipaste_22-10-17_02-08-59


#ifdef _WIN32

#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4996)  // std::wstring_convert and std::codecvt_utf8 are deprecated
#endif

    // Masks for UTF-16 surrogates: the top 6 bits identify the surrogate kind,
    // the low 10 bits carry the payload.
    constexpr auto utf16_surrogate_low_0 = static_cast<char16_t>(0b111111'0000000000);
    constexpr auto utf16_surrogate_low_1 = static_cast<char16_t>(0b000000'1111111111);
    auto utf8_ucs2_conv = std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t>{};
    auto utf8_ucs4_conv = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>{};
    auto utf16_character_of_first_surrogate = char16_t{0};

#endif

    for (const auto& r : records) {
      switch (r.EventType) {
        case KEY_EVENT: {
          auto key_event = r.Event.KeyEvent;
          // ignore UP key events
          if (key_event.bKeyDown == FALSE)
            continue;
#ifdef _WIN32
          // MSVC reports error C3850 if \unnnn is used instead of \xnnnn (C++23 only)
          auto utf16_character = static_cast<char16_t>(key_event.uChar.UnicodeChar);
          if ((utf16_character & utf16_surrogate_low_0) == u'\xD800') {
            if (utf16_character_of_first_surrogate != u'\x0000') {
              throw std::runtime_error{"invalid utf-16 character"};
            }
            utf16_character_of_first_surrogate = utf16_character;
          } else if ((utf16_character & utf16_surrogate_low_0) == u'\xDC00') {
            if (utf16_character_of_first_surrogate == u'\x0000') {
              throw std::runtime_error{"invalid utf-16 character"};
            }
            auto unicode_character = static_cast<char32_t>(0x10000 + ((static_cast<char32_t>(utf16_character_of_first_surrogate & utf16_surrogate_low_1) << 10) | (utf16_character & utf16_surrogate_low_1)));
            for (auto byte : utf8_ucs4_conv.to_bytes(unicode_character)) {
              parser.Add(byte);
            }
            utf16_character_of_first_surrogate = u'\x0000';
          } else {
            if (utf16_character_of_first_surrogate != u'\x0000') {
              throw std::runtime_error{"invalid utf-16 character"};
            }
            for (auto byte : utf8_ucs2_conv.to_bytes(utf16_character)) {
              parser.Add(byte);
            }
          }
#else
          parser.Add((char)key_event.uChar.UnicodeChar);
#endif
        } break;
        case WINDOW_BUFFER_SIZE_EVENT:
          out->Send(Event::Special({0}));
          break;
        case MENU_EVENT:
        case FOCUS_EVENT:
        case MOUSE_EVENT:
          // TODO(mauve): Implement later.
          break;
      }
    }

#ifdef _WIN32

#ifdef _MSC_VER
#pragma warning(pop)
#endif

#endif

ArthurSonzogni commented 2 years ago

Pretty nice!

This is something we should merge at some point. I need to find a Windows environment to improve upon what you prototyped.

mauve commented 2 years ago

At the following address you can find VMs with preinstalled development tools; you can of course also use these for testing.

https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/

ArthurSonzogni commented 1 year ago

I made a fix: https://github.com/ArthurSonzogni/FTXUI/pull/538

This ended up being easier, because I had previously implemented UTF-16 to UTF-8 conversion myself for another reason.
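As an illustration of what such a conversion can look like when code units arrive one key event at a time, here is a minimal sketch. It is not the code from the pull request; the class name Utf16ToUtf8 and its Feed method are hypothetical. Unlike the prototype earlier in the thread, which throws on an unpaired surrogate, this sketch emits U+FFFD (the replacement character) instead, which is one possible design choice.

```cpp
#include <string>

// Hypothetical class, not FTXUI's implementation: converts UTF-16 code
// units to UTF-8 one unit at a time, the way key events arrive.
class Utf16ToUtf8 {
 public:
  // Feed one UTF-16 code unit. Returns the UTF-8 bytes that became
  // available (empty while waiting for the second half of a pair).
  std::string Feed(char16_t unit) {
    std::string out;
    const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
    const bool is_low = unit >= 0xDC00 && unit <= 0xDFFF;
    if (pending_high_ != 0) {
      const char16_t high = pending_high_;
      pending_high_ = 0;
      if (is_low) {
        // Valid pair: join the two 10-bit payloads.
        Append(out, 0x10000 +
                        ((static_cast<char32_t>(high) - 0xD800) << 10) +
                        (static_cast<char32_t>(unit) - 0xDC00));
        return out;
      }
      Append(out, 0xFFFD);  // high surrogate without its partner
    }
    if (is_high) {
      pending_high_ = unit;  // wait for the low surrogate
    } else if (is_low) {
      Append(out, 0xFFFD);  // low surrogate with no preceding high
    } else {
      Append(out, unit);  // ordinary BMP character
    }
    return out;
  }

 private:
  // Append the UTF-8 encoding of a non-surrogate code point <= U+10FFFF.
  static void Append(std::string& out, char32_t cp) {
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }

  char16_t pending_high_ = 0;  // 0 means no high surrogate is pending
};
```

In an event loop like the one prototyped above, one converter instance would live outside the loop, and every byte returned by Feed(key_event.uChar.UnicodeChar) would be handed to the parser.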