About Support for Full width Characters in FTXUI

ArthurSonzogni / FTXUI

C++ Functional Terminal User Interface.

MIT License


About Support for Full width Characters in FTXUI #865

Open hcwanz opened 3 months ago

hcwanz commented 3 months ago

I noticed that FTXUI has functions that support wchar, such as wstring_width, but I haven't found any place where they are used. When I tried to print "测试" in examples/dom/border.cpp, only garbled text was printed. So I would like to ask what kind of support FTXUI has for full-width characters:

Is it already supported, and my usage is incorrect?
Will it be supported in the future?
Or do users need to make some modifications to FTXUI themselves?

ArthurSonzogni commented 3 months ago

Hello!

Full-width characters are supported.

What terminal are you using? Maybe your terminal doesn't support rendering them?

See my test on examples/dom/border.cpp

You can try the example examples/component/input.cpp demo on your terminal and input some full width characters.

hcwanz commented 3 months ago

It seems like it's really my problem here

ArthurSonzogni commented 3 months ago

Maybe ;-) What OS and terminal are you using?

hcwanz commented 3 months ago

Windows, with Windows Terminal. When using _getch normally, I can input and display full-width characters. But in input.cpp, only garbled characters are displayed when I type them.

hcwanz commented 3 months ago

Not all of them produce garbled output. "当饭森" only displays "森". "埏埴" displays correctly, and even some subsequent characters display correctly, but "，当" cannot be displayed. "测试" displays as garbled text.

hcwanz commented 3 months ago

When I output them individually using string, they display correctly.

hcwanz commented 3 months ago

Maybe ;-) What OS and terminal are you using?

It seems that the byte-length detection in EatCodePoint does not work correctly in my environment. As shown in the figure below, '，' in the string is '1010 0011 1010 1100', which is indeed 2 bytes, but this differs from what the function determines. Why is this the case?

ArthurSonzogni commented 3 months ago

In UTF-8, we consider ", " to be two 1-byte characters:

00101100 => ','
00100000 => ' '
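
To make the byte patterns above concrete, here is a minimal sketch of how the length of a UTF-8 sequence can be derived from its first byte. The helper name `Utf8SequenceLength` is hypothetical and is not taken from FTXUI; it only illustrates the general UTF-8 rule and is not a copy of what EatCodePoint does.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only (not part of FTXUI).
// Returns how many bytes the UTF-8 sequence starting with `lead` occupies,
// or 0 if `lead` cannot start a sequence.
std::size_t Utf8SequenceLength(unsigned char lead) {
  if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII, e.g. ',' or ' '
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: 2-byte sequence
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: 3-byte sequence
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: 4-byte sequence
  return 0;  // 10xxxxxx (continuation byte) or otherwise invalid as a lead
}
```

Under this rule, 00101100 and 00100000 each give a length of 1. The byte 1010 0011 reported earlier in the thread has the form 10xxxxxx, so it cannot start a UTF-8 sequence at all, whereas the full-width comma U+FF0C encoded as UTF-8 starts with 1110 1111 and gives a length of 3.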

Not all of them will output messy codes. "当饭森" will only display "森"; "埏埴" can be displayed normally, and even some subsequent characters can be displayed normally, but the "，当" cannot be displayed.

I checked locally on Linux:

It was working correctly.

I guess I should try on Microsoft Windows terminal. Thanks for your useful input!

hcwanz commented 3 months ago

... I found the error. Chinese characters should occupy 3 bytes in UTF-8, and the default encoding on Chinese-locale Windows is GBK. When a Chinese character occupies only two bytes, it means the text is not UTF-8.
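
One way to confirm this kind of diagnosis is to check whether the bytes are structurally valid UTF-8 before passing them on. The sketch below is only a heuristic, and the function name `LooksLikeUtf8` is hypothetical (it is not an FTXUI API). It checks lead bytes and continuation bytes only, and does not reject overlong encodings or surrogates. Some GBK byte pairs can pass this check by coincidence, so a `true` result is not proof that the text is UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not part of FTXUI).
// Returns true if every byte in `s` fits the UTF-8 lead/continuation layout.
bool LooksLikeUtf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    if ((lead & 0x80) == 0x00) len = 1;       // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) len = 2;  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) len = 3;  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) len = 4;  // 11110xxx
    else return false;                        // stray continuation byte

    if (i + len > s.size()) return false;     // sequence is truncated

    // Every following byte of the sequence must look like 10xxxxxx.
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80) return false;
    }
    i += len;
  }
  return true;
}
```

For example, "测试" encoded as UTF-8 is the six bytes E6 B5 8B E8 AF 95 and passes the check, while the same text encoded as GBK is the four bytes B2 E2 CA D4 and fails on the first byte. The two bytes A3 AC reported for '，' fail as well.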