Wrong cursor position when CD contains non-printable characters

chrisant996 / clink

Bash's powerful command line editing in cmd.exe

https://chrisant996.github.io/clink/

GNU General Public License v3.0

3.44k stars 135 forks source link

Wrong cursor position when CD contains non-printable characters #664

Closed andry81 closed 3 weeks ago

andry81 commented 3 weeks ago

The same as this: https://github.com/chrisant996/clink/issues/64

But for the Current Directory path.

To repro:

mkdir "（test）"
cd "（test）"

version : 1.6.21.f97375 injected : clink_dll_x64.dll system : 6.3.09600.20778 codepage : 1251 keyboard langid : 1033 keyboard layout : 00000409

chrisant996 commented 3 weeks ago

You're using a font that doesn't have some of the characters you're trying to display. That's what the question mark inside the box means.

You need to either use a font that has the characters, or a terminal that can use font linking to pull missing characters from other fonts, or don't print those characters.

I copy/pasted the string from your post, and the characters show up fine on my machine in Windows Terminal:

The string you posted uses the Unicode characters U+FF08 and U+FF09, which are "Full Width Left Parenthesis" and "Full Width Right Parenthesis" in the Unicode definition.

They are defined as two cells wide in monochrome fonts. But you're using a font that can't display them. So programs will measure the characters, find that they require 2 cells, and then expect that they get drawn using 2 cells. But you're using a combination of a font and a terminal that cannot draw them, so they end up using only one cell. Programs can't predict how your terminal is configured.

This is a usage error, not a program error.

chrisant996 commented 3 weeks ago

P.S. the reason the characters work mostly ok in plain cmd.exe when Clink isn't injected is because normally cmd.exe doesn't need to reposition the cursor. But any program that needs to reposition the cursor will encounter problems with those characters when using a terminal and/or font that doesn't support the characters (e.g. bash, zsh, fish, Clink, Far and many other programs).

andry81 commented 3 weeks ago

You need to either use a font that has the characters

The font is not the matter. I am using TerminalVector

But you can try any other.

Raster Fonts	Lucida Console	Consolas

chrisant996 commented 3 weeks ago

I think you're using conhost, right? Conhost is rendering the characters as 1 cell even though they are defined as 2 cells. Windows Terminal renders them correctly.

This is a limitation of conhost. There isn't really a way to fully accurately work around the mismatch.

I may be able to invest a few days of detailed analysis to try to reverse engineer which Unicode codepoints conhost renders the wrong width, but it depends on many factors including code page. And it would be an unreliable hack which works in some cases but not others. And different terminals behave differently. So at a certain point, Clink has to call it a terminal limitation. Clink can't realistically detect and hack around different limitations in all the different terminals.

andry81 commented 3 weeks ago

And it would be an unreliable hack which works in some cases but not others.

I am not sure, but why then cmd.exe draws it correctly?

chrisant996 commented 3 weeks ago

I am not sure, but why then cmd.exe draws it correctly?

Cmd.exe prints output to stdout. Conhost.exe is what draws the output into the terminal window.

By definition, it's not drawing it correctly: the U+FF08 and U+FF09 characters are defined in Unicode as being "full width" i.e. in a terminal they're supposed to use 2 character cells.

You're using Windows 8.1, and the Windows 8.1 conhost.exe is both failing to draw the characters and failing to make them the correct width.

Here's what I see on Windows 8.1:

Here's what happens on Windows 11:

You're seeing an OS issue in Windows 8.1, which was fixed in Windows 10.

chrisant996 commented 3 weeks ago

And here is Win 8.1 using Lucida Console -- still wrong, failing to render and failing to make them the right width.

chrisant996 commented 3 weeks ago

I explained here why they appear to mostly work despite the widths being incorrect.

andry81 commented 3 weeks ago

I am not talking about characters. I am taking about cursor reposition. If there is only single cell characters, then why clink does repositioning when it should not even bother? Isn't cmd.exe already does place it as it should be?

If clink does move the cursor before the cmd.exe or instead of, then you can, for example, convert the prompt string into ANSI string and get the length. It should be equal to what the cmd.exe calculates.

chrisant996 commented 3 weeks ago

I am not talking about characters. I am taking about cursor reposition.

Yes, I know. That's very clear.

If there is only single cell characters, then why clink does repositioning when it should not even bother?

Why would you say Clink doesn't need to reposition the cursor? Of course it has to reposition the cursor, repeatedly. Just like the ReadConsoleW API has to.

Try typing abcdefg. Then press Home, and cursor repositioning is used to reposition the cursor to the beginning of the line. Then press End, and cursor repositioning is used again to reposition the cursor to the end of the line. To figure out where to position the cursor, it has to do math and measurements.

Cmd.exe calls the OS API ReadConsoleW to get a line of input from the user. ReadConsoleW and conhost.exe are built into the same version of the OS, and they share code in the OS. So, both end up using the same math and algorithm to determine the position.

Clink is not built into the OS, and can't predict which combinations of glyphs/fonts/codepages will be measured and/or rendered inaccurately by ReadConsoleW and conhost.exe. And each different terminal program behaves a little differently for certain combinations of glyphs/fonts/codepages (and even different versions of the same terminal program can have different bugs/fixes). Clink uses the Unicode spec which says what the measurements and characters should be. Clink cannot predict precisely how different versions of different terminal programs are going to actually draw glyphs, and what bugs each terminal program version might have.

I'm sorry that Clink's implementation doesn't meet your expectations. If you submit a PR of a proposal for how to redesign and rewrite it, I'll be happy to review it. A good place to start is display_readline.cpp.

Isn't cmd.exe already does place it as it should be?

Sort of. Clink could use a hack and get the cursor position at the end of the prompt, and use that instead of calculating where it should be. But any Full Width characters (such as U+FF08 and U+FF09) typed after the prompt will have the exact same kind of problems. Because they characters are not rendered correctly by conhost.exe on Windows 8.1. There is only so much Clink can reasonably do to work around limitations or bugs in specific OS versions or specific terminal programs.

You've called out performance as important to you, and I agree with that. The only way Clink could do better about predicting widths would be to never predict: instead, Clink would have to write each Unicode character one at a time, and use the GetConsoleScreenBufferInfo API to query the cursor position after each Unicode character, and remember all of the cursor positions. And whenever a character is unexpectedly too wide (which also happens), then Clink would have to erase it and detect+compensate for accidental unexpected line wrapping. That results in flicker, and sometimes in unexpected scrolling of the display. And in some terminal programs, such as ConEmu, even that brute force approach doesn't work because of bugs and limitations in the terminal program. Programs don't use that approach, because in practice it ends up being terrible -- it's very slow and it's unreliable.

If clink does move the cursor before the cmd.exe or instead of, then you can, for example, convert the prompt string into ANSI string and get the length. It should be equal to what the cmd.exe calculates.

You mean what conhost.exe calculates. Cmd.exe doesn't calculate widths and doesn't draw anything. The terminal program, such as conhost.exe, calculates widths and draws things. Read about terminal programs to understand more about the difference between a terminal program and a text mode application that is attached to a terminal.

Converting to ANSI is irrelevant. The OS and terminal are natively Unicode. This isn't about length of a string, this is about widths of glyphs and grapheme clusters used by the rendering APIs called by a terminal program. Windows 8.1 doesn't handle those correctly for some ranges of Unicode characters.

I'm sorry that my explanations aren't making sense or don't seem convincing. I'm trying to summarize the most important technical points, but I'm also leaving out a large amount of information. I'm not going to write a 20+ page technical whitepaper, which is what this topic really needs.

chrisant996 commented 3 weeks ago

P.S. If you want to talk about specific logic and algorithms in the code, we can do that. Review display_readline.cpp, and ask specific questions about the code.

andry81 commented 3 weeks ago

Clink is not built into the OS, and can't predict which combinations of glyphs/fonts/codepages will be measured and/or rendered inaccurately by ReadConsoleW and conhost.exe.

But you can read the cursor position after it.

Clink would have to write each Unicode character one at a time, and use the GetConsoleScreenBufferInfo API to query the cursor position after each Unicode character, and remember all of the cursor positions. ... That results in flicker, and sometimes in unexpected scrolling of the display.

You can do that into a hidden screen buffer to investigate the length of each character in a string.

You mean what conhost.exe calculates. Cmd.exe doesn't calculate widths and doesn't draw anything.

Than how it knows how far move the cursor?

The terminal program, such as conhost.exe, calculates widths and draws things.

I don't think so. Conhost only holds the terminal windows and its attributes like screen buffer, and transfers the ownership (fake ownership) into console process initially created (not attached) to the console window. All the rest is a task to Win32 API. You can create your own console screen buffer and directly read/write into it like does, for example, winpty.

Converting to ANSI is irrelevant. The OS and terminal are natively Unicode.

It is related because can give you the real quantity of character cells of a string in the screen buffer ignoring not drawable unicode characters, glyphs and sequences.

This isn't about length of a string, this is about widths of glyphs and grapheme clusters used by the rendering APIs called by a terminal program. Windows 8.1 doesn't handle those correctly for some ranges of Unicode characters.

But it handles them as a single character cell. Why you need to calculate a character width if it is constant? I don't get it.

chrisant996 commented 3 weeks ago

Response to the main point

This isn't about length of a string, this is about widths of glyphs and grapheme clusters used by the rendering APIs called by a terminal program. Windows 8.1 doesn't handle those correctly for some ranges of Unicode characters.

But it handles them as a single character cell. Why you need to calculate a character width if it is constant? I don't get it.

I think you're trying to say that you believe the Windows 8.1 console subsystem completely lacks support for full width characters, and if it completely lacks that then probably all full width characters will be rendered as 1 cell instead of 2 cells, and if that's true then maybe the code in wcwidth() can be changed to return 1 instead of 2 for all of them, but only on Windows 8.1 and lower.

Can you share a link to information which supports that?

Earlier I said:

"I may be able to invest a few days of detailed analysis to try to reverse engineer which Unicode codepoints conhost renders the wrong width, but it depends on many factors including code page. And it would be an unreliable hack which works in some cases but not others. And different terminals behave differently. So at a certain point, Clink has to call it a terminal limitation. Clink can't realistically detect and hack around different limitations in all the different terminals."

I might do that research at some point, but not soon.

Would you like to help by doing the research? Can you also research how it behaves in codepages such as 936? The research needs to cover all Unicode codepoints (not just U+FF08 and U+FF09).

Other responses

Clink is not built into the OS, and can't predict which combinations of glyphs/fonts/codepages will be measured and/or rendered inaccurately by ReadConsoleW and conhost.exe.

But you can read the cursor position after it.

I already said that, and I already said why that doesn't help much.

It also degrades the fix for terminal resize.

I appreciate that you want to help. Can you get familiar with the sources for display_readline.cpp and offer specific suggestions in the code?

Clink would have to write each Unicode character one at a time, and use the GetConsoleScreenBufferInfo API to query the cursor position after each Unicode character, and remember all of the cursor positions. ... That results in flicker, and sometimes in unexpected scrolling of the display.

You can do that into a hidden screen buffer to investigate the length of each character in a string.

The first sentence in the "Remarks" section of the AllocConsole documentation says:

"A process can be associated with only one console"

~To have a second hidden screen buffer, Clink would need to create another process, and use cross-process communication to pass everything back and forth between cmd.exe and the new process. It's an inefficient and unrealistic option.~

~From your note about winpty, I can see that you are indeed suggesting to use cross-process communication for all console output in Clink. That can probably compensate, but it's a large amount of work for an edge case only on Windows 8.1 and lower. It also comes at a significant performance cost. I'm not going to invest in pursuing that option.~

If you would like to try implementing a proof of concept and demonstrate its efficiency (or inefficiency), I would be happy to review your findings.

UPDATE: I forgot about CreateConsoleScreenBuffer; that would be better than using a separate process or a winpty-like technique. But it's still problematic at runtime, especially for performance. Simply using GetConsoleScreenBufferInfo between every character leads to other problems and inaccuracies, in addition to severe performance degradation.

You mean what conhost.exe calculates. Cmd.exe doesn't calculate widths and doesn't draw anything.

Than how it knows how far move the cursor?

Cmd.exe isn't what moves the cursor. Read about how the Windows console subsystem is designed.

The terminal program, such as conhost.exe, calculates widths and draws things.

I don't think so. Conhost only holds the terminal windows and its attributes like screen buffer, and transfers the ownership (fake ownership) into console process initially created (not attached) to the console window. All the rest is a task to Win32 API. You can create your own console screen buffer and directly read/write into it like does, for example, winpty.

Learn more about the console subsystem in Windows. The kernel implementation of ReadConsoleW forwards the API call into conhost.exe. Look at the architecture in src/host and src/server and e.g. ApiDispatchers::ServerReadConsole().

Converting to ANSI is irrelevant. The OS and terminal are natively Unicode.

It is related because can give you the real quantity of character cells of a string in the screen buffer ignoring not drawable unicode characters, glyphs and sequences.

Show me what you mean; show source code that implements what you're trying to describe, so that we can discuss specific technical points. Maybe I'm just misunderstanding what you mean.

chrisant996 commented 3 weeks ago

@andry81 If you want to try your idea about always treating full width (2 cell) codepoints as half width (1 cell), make a change in wcwidth.cpp, and build a local copy of Clink, and test it.

If it meets your needs, then that could be an option for you.

If it works reliably for your needs, maybe I could even add an optional Experimental mode in Clink. I wouldn't make it on by default unless it's been tested for all codepoints, and on codepages like 936 and other codepages where full width characters are common.

andry81 commented 3 weeks ago

The kernel implementation of ReadConsoleW forwards the API call into conhost.exe. Look at the architecture in src/host and src/server and e.g. ApiDispatchers::ServerReadConsole().

Not sure what you mean by "forwards", but I don't see any conhost.exe API there, only Win32 API calls.

andry81 commented 3 weeks ago

@andry81 If you want to try your idea about always treating full width (2 cell) codepoints as half width (1 cell), make a change in wcwidth.cpp, and build a local copy of Clink, and test it.

may be later

andry81 commented 3 weeks ago

The first sentence in the "Remarks" section of the AllocConsole documentation says:

"A process can be associated with only one console"

To have a second hidden screen buffer, Clink would need to create another process, and use cross-process communication to pass everything back and forth between cmd.exe and the new process. It's an inefficient and unrealistic option.

You already run clink.exe, why not create a hidden console there?

chrisant996 commented 3 weeks ago

The kernel implementation of ReadConsoleW forwards the API call into conhost.exe. Look at the architecture in src/host and src/server and e.g. ApiDispatchers::ServerReadConsole().

Not sure what you mean by "forwards", but I don't see any conhost.exe API there, only Win32 API calls.

https://devblogs.microsoft.com/commandline/windows-command-line-inside-the-windows-console/

chrisant996 commented 3 weeks ago

You already run clink.exe, why not create a hidden console there?

I already said multiple reasons why not.

I understand that you disagree and want me to invest a bunch of my time to work around this limitation in Windows 8.1.

I'm not going to use my time that way.

If you want to build a prototype and measure performance impact and make a PR, I'll be happy to review it. There's no reason to use it on Windows 10 or higher, so it would only be used on Windows 8.1 and lower.

chrisant996 commented 2 weeks ago

@andry81 I'm working on a test tool to examine widths of all Unicode codepoints. For the moment it's in a private repo, but eventually I expect to make it public.

The tool is still in an early stage, but it's already made some things very clear:

There is no way to reliably measure the visual width of a character.

Using GetConsoleScreenBufferInfo to query the cursor position after printing certain ranges of ambiguous characters is unreliable, even on Windows 11 which has significantly better terminal support than Windows 8.1 has.

Some characters are expected to be 0 width, but take up 1 cell (and in many cases get rendered as blank at the same time).
Some characters that are supposed to be combining characters are expected to be 0 width and combine with adjacent characters, but take up 1 cell and don't get combined. (I think that might be documented behavior in the Windows console layer, but I'm not sure.)
Some characters display as more than 1 cell ... but get reported as taking only 1 cell (including some that are expected to take 2 cells and render as 2 cells, but get reported as taking 1 cell).
- There are several different categories of sub-cases -- expected 2 render 1 gcsbi 1; expected 2, render 2 gcsbi 1; expected 1 render 2 gcsbi 1; expected 1 render 2 gcsbi 2; and so on.
Etc.

Also, the Unicode definitions are constantly evolving, and the expected width of certain characters can change over time. So depending on what version of the Unicode spec Clink is using + what version a terminal program is using + what version the OS is using (or different components within the OS), the results can easily disagree with each other.

But all of that usually isn't much of an issue in practice, because the problems typically happen for characters with ambiguous meaning or for emoji (especially complex sequences of emoji and ZWJ characters and selectors and so on) or in very rarely used characters (like the U+FF08 and U+FF09 that started this discussion). But of course in edge cases, these runtime disagreements may have no reasonable way to programmatically detect or resolve them. (E.g. doing hidden image processing on a bitmap of a terminal display is unreasonable, but also unreliable.)

I already knew all of that because I've experienced all of it before. Clink has already encountered issues where East Asian Ambiguous Width characters in codepage 936 (and a few others) render as one width, but GetConsoleScreenBufferInfo can imply another width, and ANSI escape codes can be processed assuming one width or the other depending on the combination of specific characters and specific terminals. I was trying to explain that, but I can understand why it can be hard to imagine without direct experience, and that was one of the motivations behind wanting to create this test tool.

The topic is very complex, and inherently has many edge cases. I maintain that it's unrealistic for Clink to attempt to dynamically analyze what the console subsystem seems to be doing, and try to infer what a terminal rendering subsystem is doing, and attempt to interpret and compensate for unexpected results. The performance for the 99.99% general cases will be impacted significantly, and the benefits in the edge cases cannot be reliable.

Further, Linux terminals and bash and etc don't necessarily fare much better. It's not hard to find cases where they have similar runtime disagreements about various characters, resulting in unexpected rendering and/or unexpected character positions. In bash it's sometimes a little harder to observe the mismatches because of the printing routines it has for optimizing minimum number of bytes printed over low speed modems, so sometimes it hopes that the cursor moved where it expected, but a user won't observe the mismatch unless the Readline library hits certain other optimized cases where it decides the most efficient technique is to use an ANSI escape code to reposition the cursor.

You can help. If you don't believe my statements, or if you want to tackle the challenge of learning and mitigating these issues, I would welcome your help in doing analysis and prototyping of different mitigation algorithms. The proposals suggested so far are overly optimistic, and in practice they do not work out as assumed. But with analysis, it might be possible to design improved proposals.

I apologize that the runtime results in Clink are undesirable, but compensating for quirks in the OS and in all the many different terminal programs isn't realistic at this time.

chrisant996 commented 2 weeks ago

Specifically: there are characters that render using 2 cells, but which GetConsoleScreenBufferInfo() implies only take 1 cell. The proposal of "get the cursor position" is unreliable and fails for some characters. For some characters it can improve the results, for other characters it fails to improve the results, and for other characters it even breaks the results.

In practice, the topic is more complex than just "get the cursor position" or "use a hidden console" (which is a subvariant of getting the cursor position).

chrisant996 commented 2 weeks ago

Here's a specific example of one of many kinds of examples where it's impossible to use GetConsoleScreenBufferInfo to measure how many cells a given codepoint or sequence of codepoints take up:

U+0300 is a Combining Grave Accent. It's meant to be combined with a preceding character to produce an alternative grapheme.

For example: Grapheme	Codepoints	Description	Cells
`e`	U+0065	Latin Small Letter E	1
`è`	U+0065, U+0300	Latin Small Letter E, Combining Grave Accent	2
`è`	U+00E8	Latin Small Letter E with Grave	1

The two ways (above) of representing a lowercase "e" with grave accent are visibly rendered the same in e.g. a browser ... except not quite, since they take up a different number of cells.

EDITED:...

It looks like the console subsystem treats the U+0065 and U+0300 as separate cells. Some terminals may choose to render the U+0065, U+0300 using a single grapheme for Latin Small Letter E With Grave. But the console screen buffer considers them two separate characters, each of which takes up 1 cell. Windows Terminal on Windows 11 renders the first cell as Latin Small Letter E With Grave ... and renders the second cell as an empty space.

The cursor is visibly in the third column, and the console APIs report that it's in the third column. This is because the console data structure design dates back to when only UCS2 existed, and the APIs simply aren't rich enough to represent certain complex concepts from Unicode, so the console makes certain trade-offs between rendering and backward compatibility.

HOWEVER...

The above has an interesting and useful implication:

From Clink's perspective, this probably means all Combining Marks can be counted as 1 cell wide, instead of the 0 cells that would be expected (and which is returned by wcwidth).

GetConsoleScreenBufferInfo and ANSI escape codes agree on their cursor position math (even though the graphical rendering is different than they claim to report). So treating the Combining Marks as 1 cell wide can likely compensate for the quirks, and result in more accurate cursor positioning.

ALSO...

There's a chance that the test program could potentially lead to some kind of system that can map out how the console subsystem represents graphemes for a given combination of OS version + Terminal program + Font + CodePage + System Locale.

Maybe it could eventually be possible for Clink to run a set of measurements in a hidden console screen buffer, and come up with more accurate measurements. Maybe it could store the results in a file that can be loaded when Clink starts, so that the measurement procedure doesn't have to be run every time Clink starts. Depending on the computer speed and how exhaustively the ranges of Unicode codepoints are tested, the measurements can take 10-60 seconds or more, so it would have to be something that's only performed when specifically instructed by the user (and also maybe in the background the first time Clink is used on a computer).

But the fact that the measurements are technically also dependent on the specific font being used is also troublesome. Because for example terminals like Windows Terminal and ConEmu render the text using a font that doesn't necessarily match the font set in the actual underlying console screen buffer. That has created inconsistencies in the past with East Asian Ambiguous Width characters. So, even such a fancy system that dynamically measures how many cells are taken by codepoints still cannot be fully reliable. The APIs simply are not rich enough.

Anyway, there's also a chance that if enough crowd-sourced testing can occur with different combinations of OS versions, terminal programs, codepages, etc -- then maybe a it could reveal certain measurements that can be made constant.

I.e. maybe this research could someday lead to a Windows-specific version of wcwidth that might be able to hard-code more accurate measurement rules.

chrisant996 commented 2 weeks ago

I've made the wcwidth-verifier repo public.

(By default it skips ideographs because they seem to have high rate of failure for column width verification, and I don't know if it's because they're not supported yet or because my system local isn't Chinese or etc.)

On Windows 8.1:

Running wcwv.exe finds 15833 codepoints that fail verification.
But running wcwv.exe --only-ucs2 --no-full-width on Windows 8.1 reports 0 failures.
- The additional flags attempt to predict how Windows 8.1 limitations will affect column width measurements.
- It looks like the limitations in Windows 8.1 are pervasive and consistent enough that just a couple of simple checks may be able to successfully compensate for its limitations.

On Windows 11:

Running wcwv.exe finds only 2057 codepoints that fail verification, and they're certain ranges of CJK codepoints.
- The majority of codepoints that fail verification render as the "missing codepoint" glyph on my computer.
- The Yijing Hexagram Symbols codepoints render as 2 cells but report as 1 cell, and wcwidth expects them to be 2 cells.
- Most of the Hangul Jamo Extended-A codepoints render as 2 cells and report as 2 cells, but wcwidth expects them to be 1 cell.

Since Windows 10 and 11 have vastly upgraded Unicode support in the console subsystem and in Windows Terminal, only a few Unicode blocks of CJK codepoints hit problems. I can't tell whether they're hitting problems because my machine doesn't use a CJK system locale, or if support is missing in the console subsystem, or what.

Future

Since the verifier tool seems to indicate that the Windows 8.1 UCS2 limitations are highly predictable, I'll make a change in an upcoming version of Clink to use alternative measurement rules on Windows 8.1.

But understand that Windows 8.1 has limitations and isn't handling the Unicode characters according to spec. There's only so much Clink can do to try to compensate for how terminals or the OS behaves.

If further issues are reported on Windows 11, I'll assess them on a case by case basis. But Windows 8.1 isn't a priority for me and will receive only minimal attention/time/support (if any). Others are welcome to contribute by doing analysis and sharing their work, and I'll be happy to review any pull requests.