dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses

the vast majority of our supra-BMP output on windows/msys is fubar #2117

Closed dankamongmen closed 2 years ago

dankamongmen commented 3 years ago

I was kinda hoping this would resolve itself without my intervention, but it appears not. Run some notcurses code. Note that most UTF-8 doesn't show up, or is otherwise fucked up. I refuse to believe that Microsoft Windows in 2021 cannot display most unicode. I mean, we don't even seem to have quadrants, and yet I think I've seen quadrants generated by other programs.

Figure out what we're doing wrong, and rectify it.

dankamongmen commented 3 years ago

@j4james, how would you describe Unicode capabilities in Windows Terminal?

j4james commented 3 years ago

Here's a screenshot from the notcurses uniblock demo running under WSL. Most of the BMP blocks should be showing content like this, so if most UTF-8 is not working for you, then something is seriously wrong.

image

Once you get to the higher planes things become a little more messy, mostly in terms of the width calculations being wrong, so you'll have text "leaking" out of the area you were expecting it to fit and screwing up the rest of the page. I think this is because Windows stores Unicode in 16-bit "wide characters", and code points from the supplementary planes require surrogate pairs, so they don't fit in a single cell. Don't quote me on that, though. I just know that it's a known issue that they're still working on.
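To make the surrogate pair point concrete, here is a minimal sketch of how a code point is split into 16-bit units. `utf16_encode` is a hypothetical helper written for illustration; it is not taken from notcurses or from the Windows console code.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written,
 * or 0 if cp is not a valid scalar value.
 * utf16_encode is a hypothetical helper, not part of notcurses or Windows. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    return 0;                      /* out of range, or a lone surrogate */
  }
  if (cp < 0x10000) {
    out[0] = (uint16_t)cp;         /* BMP: a single 16-bit unit */
    return 1;
  }
  cp -= 0x10000;                   /* 20 bits remain */
  out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
  out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
  return 2;
}
```

A BMP character such as U+2598 fits in one unit, while U+1FBF0 (a segmented digit) becomes the pair 0xD83E 0xDFF0. A buffer that assumes one 16-bit unit per cell would then need two cells for a character that should be one column wide.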

It's also worth mentioning that there are additional limitations in the old conhost console. It uses a GDI renderer that doesn't do font fallback so if your selected font doesn't support a particular code point, you're just going to get a � (or something of the sort). I think the GDI renderer also ignored anything outside the BMP, so none of the supplementary planes are likely to work. Both those issues have been worked on recently, but unless you're building from source, you won't have those fixes.

Then if you're testing on Mintty (or any other third-party terminal), there could be a number of other issues coming into play, so I'd suggest you stick to Windows Terminal to start with, until you've got something working reasonably.

dankamongmen commented 3 years ago

if you could run notcurses-info.exe for me, and let me know if it's mostly empty on your side, i'd appreciate it. that looks great, holy crap!

dankamongmen commented 3 years ago

i'm testing on both. right now windows terminal is getting close to usable. mintty i would not say so, due to input problems detailed in #2116. in neither do i get much in the way of good notcurses-info.exe output. here's a pretty ideal output (kitty):

2021-09-01-200830_1251x1417_scrot

j4james commented 3 years ago

OK, there are quite a few gaps on the notcurses-info page. So if that's what's worrying you, that's to be expected.

image

j4james commented 3 years ago

Just looking at a couple of the missing code points at random, they seem to be outside the BMP, but are meant to be narrow width, so this could be the surrogate pair issue I mentioned above. Otherwise it may just be that my system doesn't have an appropriate font for all of those code points, although then I would expect a bunch of � replacement characters instead. Whatever it is, I don't think it's your problem.

j4james commented 3 years ago

Now I'm beginning to have some doubts. I've just installed a font that is supposed to have the segmented digit characters. That didn't seem to help with font fallback, but if I select it as my primary font, I can display those characters in the shell, like this:

image

When I run notcurses-info, though, I'm still seeing those code points missing. I wouldn't have expected it to work perfectly, because I know the width calculations are going to be wrong, but I am surprised it's not showing anything at all.

dankamongmen commented 3 years ago

oh hey you seem to be seeing a good bit more than i am.

2021-09-01-214801_2019x1308_scrot

dankamongmen commented 3 years ago

2021-09-01-215020_2019x1308_scrot

dankamongmen commented 3 years ago

you have uniblock running in a Windows Terminal? it exits with failure immediately for me =[

dankamongmen commented 3 years ago

ohhhhhhhhhhhh you have WSL there not Windows Terminal. that's a whole different thing, no? that's using UNIX interfaces.

j4james commented 3 years ago

Yeah. I haven't tried getting the native Windows build running. I was just trying to show you what Unicode you should be able to see in Windows Terminal.

Also I can redirect the notcurses-info output to a file, and then type that file from a cmd.exe shell, and as long as I've set the UTF8 codepage first, I still get the same result (more or less). So a native Windows shell should definitely be doing a better job than what you're seeing.

dankamongmen commented 3 years ago

i am setting the code page in Windows Terminal ~for sure~ i'm pretty sure
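For reference, setting the code page from inside a program (the equivalent of running `chcp 65001` in cmd.exe) might look like the sketch below. The thread does not show how notcurses actually does this, so `console_use_utf8` is a hypothetical helper; `SetConsoleOutputCP`, `SetConsoleCP` and `CP_UTF8` are the real Win32 names.

```c
#include <stdbool.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Switch the console's input and output code pages to UTF-8 (65001).
 * Returns true on success. On non-Windows builds there is no code page
 * to set, so this is a no-op that reports success.
 * console_use_utf8 is a hypothetical helper name. */
static bool console_use_utf8(void) {
#ifdef _WIN32
  return SetConsoleOutputCP(CP_UTF8) && SetConsoleCP(CP_UTF8);
#else
  return true;
#endif
}
```

This only changes how the console interprets bytes. It does not change how the C runtime's own multibyte functions behave, which is a separate setting.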

j4james commented 3 years ago

Actually, that's probably an easy way for you to test the Windows build. Just redirect to a file and compare the output to what you're seeing from a Linux build. I would think they would be more or less the same if things were working correctly. Then you don't have to worry about whether or not the terminal is rendering it correctly.

dankamongmen commented 3 years ago

i wouldn't be shocked if this is due to 16-bit wchar_t...but i primarily deal with utf-8. it's possible though.

dankamongmen commented 3 years ago

yeah, i'm becoming pretty convinced that this has to do with being beyond the BMP and thus outside the range of wchar_t without surrogate pairs.

dankamongmen commented 3 years ago

_setmode(_O_U8TEXT) might be relevant, unsure.

> Set the console code page to cp65001 (UTF-8) doesn’t improve Unicode support, it is the opposite: non-ASCII are not rendered correctly and type non-ASCII characters (e.g. using the keyboard) doesn’t work correctly, especially using raster fonts.

hrmmm.

dankamongmen commented 3 years ago

> Set the console code page to cp65001 (UTF-8) doesn’t improve Unicode support, it is the opposite: non-ASCII are not rendered correctly and type non-ASCII characters (e.g. using the keyboard) doesn’t work correctly, especially using raster fonts.

removing this definitely did not help

j4james commented 3 years ago

FYI, I figured out why there were so many gaps in the notcurses-info output from my WSL build. It seems that on my system those characters aren't supported by wcwidth, so it returns -1. I wrote a little test app like this:

wchar_t wc = 0x1fbf0;
int cols = wcwidth(wc);
printf("%d\n", cols);

And the output I get is -1.

I don't unix, so I don't know whether that means I need to upgrade the compiler or the libraries, or there's something else I'm doing wrong, but at least it suggests it's not your problem, and it likely isn't a terminal problem either.

dankamongmen commented 3 years ago

i notice that Braille works just fine for us in notcurses-demo k, but it doesn't show up in notcurses-info. very interesting.

dankamongmen commented 3 years ago

alright, coming back around to take a look at this. i think the thing to do is to write a small testing tool that allows us to explore these characters in windows. they're definitely usable, we're just doing something wrong. i'll look into this this weekend.

dankamongmen commented 2 years ago

once we get this resolved, we're going to pretty much be right on windows, so let's put some effort in here soon.

j4james commented 2 years ago

Before you get too stressed about this, note that the narrow width characters in the astral planes are expected not to work correctly in Windows Terminal. They're going to take up twice as many cells as expected, which completely screws up the notcurses-info output. This is what it looks like at the moment when running from WSL:

image

It actually looked better before you fixed the wcwidth problem and the characters were just dropped, but I'm definitely not suggesting you revert that. This is something that Windows Terminal needs to fix.

dankamongmen commented 2 years ago

> It actually looked better before you fixed the wcwidth problem and the characters were just dropped, but I'm definitely not suggesting you revert that. This is something that Windows Terminal needs to fix.

do you know of any upstream bug i can track and/or comment on?

dankamongmen commented 2 years ago

ok, it's all clear now:

RAST 00000020 [ ] to 45/0 cols: 1 40ffffff40191970
RAST 000000e2 [▒] to 45/1 cols: 1 40ffffff40191b70
RAST 00000096 [▒] to 45/2 cols: 1 40ffffff40191e71
RAST 00000098 [▒] to 45/3 cols: 1 40ffffff40192071
RAST 000000e2 [▒] to 45/4 cols: 1 40ffffff40192372
RAST 00000096 [▒] to 45/5 cols: 1 40ffffff40192572
RAST 0000009d [▒] to 45/6 cols: 1 40ffffff40192872
RAST 000000e2 [▒] to 45/7 cols: 1 40ffffff40192a73
RAST 00000096 [▒] to 45/8 cols: 1 40ffffff40192c73
RAST 00000080 [▒] to 45/9 cols: 1 40ffffff40192f73
RAST 000000e2 [▒] to 45/10 cols: 1 40ffffff40193174
RAST 00000096 [▒] to 45/11 cols: 1 40ffffff40193474
RAST 00000096 [▒] to 45/12 cols: 1 40ffffff40193675
RAST 000000e2 [▒] to 45/13 cols: 1 40ffffff40193875
RAST 00000096 [▒] to 45/14 cols: 1 40ffffff40193b75
RAST 0000008c [▒] to 45/15 cols: 1 40ffffff40193d76
RAST 000000e2 [▒] to 45/16 cols: 1 40ffffff40194076
RAST 00000096 [▒] to 45/17 cols: 1 40ffffff40194276
RAST 0000009e [▒] to 45/18 cols: 1 40ffffff40194577

this is from the quadrants output in unicodedumper() from notcurses-info. look at e.g. 45/1--45/3. we're emitting 0xe2, 0x96, and 0x98 as three columns. that's the UTF8 for U+2598 QUADRANT UPPER LEFT, which is what we want to see. but we ought to be seeing all three bytes in a single cell.

also, right above this, we have:

▘▝▀▖▌▞▛▗▚▐▜▄▙▟█⎧ 49

output directly to stderr. so yeah, it's all a matter of our UTF8 being broken up into cells. find that, and we've got this resolved.
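The byte triple called out in the log can be checked by hand. The sketch below decodes a single UTF-8 sequence; `utf8_decode` is a hypothetical helper for illustration, not the routine notcurses uses, and it skips some validation (overlong forms, surrogates) for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (at most len bytes).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 on a truncated or malformed sequence. Minimal sketch: it does not
 * reject overlong encodings or surrogates.
 * utf8_decode is a hypothetical helper, not a notcurses function. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp) {
  if (len == 0) return 0;
  size_t need;
  uint32_t v;
  if (s[0] < 0x80)                { need = 1; v = s[0]; }
  else if ((s[0] & 0xE0) == 0xC0) { need = 2; v = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { need = 3; v = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { need = 4; v = s[0] & 0x07; }
  else return 0;                  /* stray continuation or invalid lead byte */
  if (need > len) return 0;       /* sequence is cut short */
  for (size_t i = 1; i < need; i++) {
    if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
    v = (v << 6) | (s[i] & 0x3F);
  }
  *cp = v;
  return need;
}
```

Given the bytes e2 96 98, this consumes three bytes and yields U+2598, so a correct rasterizer would log one cell there. A conversion step that hands back one byte at a time produces the three separate one-column cells seen in the RAST lines.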

dankamongmen commented 2 years ago

it looks like mbrtowc() is always returning 1?

dankamongmen commented 2 years ago

i think we have a ridiculously low MB_CUR_MAX when we compile...

dankamongmen commented 2 years ago

yep

dankamongmen commented 2 years ago

we're getting somewhere!

image

j4james commented 2 years ago

I can't comment on mbrtowc or anything that you may be doing right or wrong in notcurses. All I'm saying is that no matter how perfect your code is, the output is going to look broken in Windows Terminal (and the conhost console for that matter). Don't assume that broken output is your fault.

If you want an issue to track, the root of the problem is probably https://github.com/microsoft/terminal/issues/8000 - essentially the text buffer implementation needs to be rewritten. But if you want to comment on this specific manifestation, something like https://github.com/microsoft/terminal/issues/11694 might be more appropriate.

dankamongmen commented 2 years ago

aye, but we've just made massive progress! we now have quadrants!

dankamongmen commented 2 years ago

so it's not that mbrtowc() always returns 1, it's that you have to set the locale up properly for Windows. in UNIX land, we usually want a setlocale(LC_ALL, "") to pull from LANG. not so much on windows. furthermore, just setting the encoding to UTF8 doesn't get us all the way there; we appear to require a setlocale(LC_ALL, ".UTF8"). but at that point, we start getting real results from even lowly old mbrtowc(). yay! tally ho!
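A minimal sketch of that locale setup, assuming a build against the Windows UCRT (which accepts `".UTF8"` as a locale name) or a POSIX libc. The helper names `init_locale`, `locale_max_bytes` and `first_char_len` are hypothetical and not taken from the notcurses source.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Select a UTF-8 locale for the C runtime. Returns the locale name on
 * success or NULL on failure. On Windows an explicit ".UTF8" is requested;
 * elsewhere "" picks up LANG/LC_* from the environment.
 * init_locale is a hypothetical helper name. */
static const char *init_locale(void) {
#ifdef _WIN32
  return setlocale(LC_ALL, ".UTF8");
#else
  return setlocale(LC_ALL, "");
#endif
}

/* Largest multibyte character, in bytes, under the current locale. */
static size_t locale_max_bytes(void) {
  return MB_CUR_MAX;
}

/* Convert the first multibyte character of s using the current locale.
 * Returns the bytes consumed, 0 for an empty string, or (size_t)-1 if
 * the bytes are not valid in the current locale, as mbrtowc() does. */
static size_t first_char_len(const char *s, wchar_t *wc) {
  mbstate_t st;
  memset(&st, 0, sizeof st);       /* start in the initial shift state */
  /* +1 so the terminating NUL is visible to mbrtowc() */
  return mbrtowc(wc, s, strlen(s) + 1, &st);
}
```

Calling `locale_max_bytes()` before and after `init_locale()` is a quick way to see the low MB_CUR_MAX mentioned earlier: the default "C" locale typically reports 1, and a UTF-8 locale reports more. With the UTF-8 locale active, `first_char_len("\xe2\x96\x98", &wc)` should consume all three bytes rather than one.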

dankamongmen commented 2 years ago

highlighting covers the expected area, perhaps for the first time!

image

dankamongmen commented 2 years ago

looking good in actual Windows Terminal as opposed to MinTTY, too!

image

dankamongmen commented 2 years ago

the [luigi] demo now looks PERFECT, with none of the weird bugs we were seeing before, yay ya yay

image

dankamongmen commented 2 years ago

[intro] now looks PERFECT

image

dankamongmen commented 2 years ago

https://www.youtube.com/watch?v=lO6mNbJGWHI oh yeaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaah

dankamongmen commented 2 years ago

ahhh this is a glorious day indeed, w00t w00t w00t, all good things come to he who hacks

dankamongmen commented 2 years ago

as hoped, this has also fixed various demos which were failing, including [uniblock] and [normal]

dankamongmen commented 2 years ago

we actually have multiple demos working one-after-another now, with a working braille FPS plot, tremendous improvement.

image

dankamongmen commented 2 years ago

alright, whatever problems still exist, we no longer have the vast majority of our supra-BMP output on windows/msys fubar. we'll create focused issues for remaining problems, but this showstopper is resolved.

DHowett commented 1 year ago

microsoft/terminal#14640 and microsoft/terminal#13626 probably put a significant dent in this issue; they were just released as part of v1.17.1023.

The infrastructure they lay will be available in newer[1] versions of ConPTY and therefore other terminal emulators on Windows at some future point.

[1] we have plans about how we can update ConPTY outside of the Windows update cadence :)

dankamongmen commented 1 year ago

awesome! are you a fellow Friend of Redmond? feel free to hit me up at niblack on teams =]