dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses
Other
3.43k stars 113 forks source link

Add 'effective utf8' field to ncinput struct. #2696

Closed Spritetm closed 1 year ago

Spritetm commented 1 year ago

Proposed fix for https://github.com/dankamongmen/notcurses/issues/2695 . This code adds an utf8_eff field to the ncinput struct; it gets filled with the 'effective' keypress value, that is, the UTF8 that the keypress generates with all the meta keys taken into consideration.

Looking at the 'reader' widget, it uses the the ncinput.utf8 for this, and for non-Kitty-keyboard-disambiguation terminals, it effectively acts like this. However, the docs imply that this is non-intentional and simply a result of other terminals not returning the 'base' key, so notcurses cannot know what that was.

As such, I decided to add an extra member to the ncinput struct rather than re-using the ncinput.utf8 field. The extra advantage is that this is backwards compatible with any other program as the meaning of existing fields stays the same.

dankamongmen commented 1 year ago

so.

i'd like to just merge this straight through, trusting that your intelligent writeups and solid code implies a thorough understanding of both the problem and notcurses. i would do so for anything but an input problem. i've just been burned there too many times.

so.

if you can provide me with evidence that you tested this with the other major terminals, and ncinput still ran correctly in them for a variety of input, i'll merge it after a quick readthrough. this evidence could be nothing more than "i tested these terminals with this methodology". i don't need screenshots or anything; i trust you.

or.

you can wait for me to take a day and read back through the input code, reacquaint myself with all the crap surrounding it, and perform such testing myself. this might happen in january, but i honestly doubt it. feels more like a mid-february thing.

let me know how you'd like to move forward. i really appreciate the obvious quality here. in fact, if you wanted to talk about taking over maintainership, i'd be very interested in that =].

Spritetm commented 1 year ago

I think you have more trust in my code than it's worth, it was written as a sort of proof-of-concept to see if you had any major issues with the architecture. To illustrate my point: there's actually a blunder in the design due to me misunderstanding Kittys protocol. I'll fix and then I'll answer your main points.

Spritetm commented 1 year ago

Okay, first of all: my derp was that I thought Kitty returned the effective codepoint for a key as one or more UTF8 bytes. It does not do that; what it does is that it returns 1 to n codepoints as utf32 values; seemingly it's possible that a keypress with some meta keys pressed gives multiple codepoints. Yay for weird international keyboards I guess?

Anyway, I'm okay with having this code being committed as I'm decently sure of the low-level C implementation. I would like to run some architectural decisions by you, which also gives me a nice way of rubber-duck-reviewing this code. (For reference: I found the issue above by doing just that.)

Wrt testing: do you have any good targets wrt terminals? I tried Kitty, screen and xterm, and my testing procedure mostly consisted of a program I'm developing that effectively prints out the eff_text codepoints: I tried entering '123!@#abcABC' and in all cases it got printed out correctly. I also ran notcurses-input in the same terminals and entered the same string and it did the same thing it did before my patch (return 123!@#abcABC for xterm and screen, return 123123abcabc for kitty)

Spritetm commented 1 year ago

Also, not sure what the issue with the input automata is supposed to be? It looked like a fairly standard string matching thing, I got the needed matching rules correct the first time around.

dankamongmen commented 1 year ago

Also, not sure what the issue with the input automata is supposed to be? It looked like a fairly standard string matching thing, I got the needed matching rules correct the first time around.

i'm pleased to be talking to someone who considers string-matching automata fairly standard; they seem to blow a lot of folks' minds

Spritetm commented 1 year ago

Fyi, I also tried this on uxterm, roxterm, and wezterm; behavior is as expected. I also tried this using non-ascii-based keyboard layouts (Arabic in this case) and it seemed to... well, not break? I don't have an Arabic keyboard nor am I fluent in the language, so I can't say there aren't bugs. Still haven't found a keyboard layout that sends multiple codepoints when shifted, so I can't test that, but as the code for that is fairly trivial, I assume there's a somewhat small chance for bugs there.

If this is enough testing for you and the architecture isn't detestable, feel free to merge this.

dankamongmen commented 1 year ago

Fyi, I also tried this on uxterm, roxterm, and wezterm; behavior is as expected. I also tried this using non-ascii-based keyboard layouts (Arabic in this case) and it seemed to... well, not break? I don't have an Arabic keyboard nor am I fluent in the language, so I can't say there aren't bugs. Still haven't found a keyboard layout that sends multiple codepoints when shifted, so I can't test that, but as the code for that is fairly trivial, I assume there's a somewhat small chance for bugs there.

If this is enough testing for you and the architecture isn't detestable, feel free to merge this.

i think i shall merge it now. thanks a lot for the great work!