contour-terminal / contour

Modern C++ Terminal Emulator
http://contour-terminal.org/
Apache License 2.0

Support for Indic scripts #1533

Closed shreevatsa closed 1 week ago

shreevatsa commented 1 week ago

Abstract

Support for non-Latin scripts, such as Indic scripts like Devanagari and Kannada, seems to be missing in Contour, at least on macOS. I just tried a bunch of terminals on macOS, and Contour is the worst of them (zero font support), which is surprising given the mention of Unicode Grapheme cluster support etc. I'm not sure whether I've done anything incorrectly (whether this is a bug report or feature request). This is what it looks like:

image

where sample.txt is:

ತನ್ನ ಒಂದು ಸತ್ಯಸಂಕಲ್ಪದಂತೆ ಸೃಷ್ಟಿಯಲ್ಲಿ ವ್ಯವಸ್ಥೆಯಿಲ್ಲದೆ ತನ್ನ

Motivation

Output from programs like diff, or for that matter ls, should work even when file contents or filenames contain Indic-script characters.

Specification

There is no specification; I believe no terminal renders these scripts correctly (though mlterm comes closest). Still, I'd hope that Contour could be at least as good as other terminals.

Yaraslaut commented 1 week ago

Hi @shreevatsa, here is what I see for your text on my system: image

You need to set up fonts accordingly in the Contour config file. Also, contour debug font.textshaping might give you some additional info.

christianparpart commented 1 week ago

I think you might have strict_spacing set to true. Try setting it to false :)

shreevatsa commented 1 week ago

Thank you @Yaraslaut and @christianparpart. It is heartening to know that some amount of support exists in principle.

For what it's worth, I was not able to get it to work on macOS:

➜  ~ contour debug font.textshaping
Warning: Could not find the Qt platform plugin "cocoa" in "" ((null):0, (null))
Fatal: This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
 ((null):0, (null))
[1]    73822 abort      contour debug font.textshaping

and

➜  ~ contour font-locator
[error] The configured text shaping engine CoreText does not yet support font feature settings. Ignoring.
Matching fonts using  : CoreText
Font description      : (family=monospace weight=Regular slant=Roman spacing=Proportional, strict_spacing=no)
Number of fonts found : 1
  path /System/Library/Fonts/Menlo.ttc

christianparpart commented 1 week ago

@shreevatsa I just checked. I get the same output as you on macOS, but it works flawlessly on non-macOS systems (same software version).

So the only difference is how we discover fonts on macOS. This is something I can investigate tonight (I'm out with family during the day). I'll keep you posted.

I'd like to note one very important thing though (having quickly scanned through your blog article): Unicode is not specified for terminals. Not at all. This is all undefined, implementation-dependent behaviour. We (as in, some terminal developers) are indeed trying to get to the current century when it comes to Unicode. Every TE has its own priorities there. We, for example, focus on complex grapheme cluster support, especially related to emoji, but also on ligature support. Both are very well supported in Contour. Languages like Hebrew, RTL, etc.

I don't like macOS falling behind in font fallback (this is the issue here). I'll look into it later.

christianparpart commented 1 week ago

@shreevatsa the PR referenced above actually implements proper font fallback on macOS. Thanks for reporting this. :)

> [ ... ] and Contour is the worst of them (zero font support), [ ... ]

I wanted to clarify something about the wording here.

"Zero font support" is impossible. Some font is always displayed, and font changes (bold, italic, bold/italic) all work, also on the latest stable release for macOS. What you maybe meant is font fallback support, which is what I addressed in #1536 (for macOS): apparently, when we switched from fontconfig to the native CoreText API on macOS, we implemented only basic font matching, not font fallback. Note, however, that #1536 requires macOS 13.1 or higher.

> which is surprising given the mention of Unicode Grapheme cluster support etc

Grapheme segmentation is something entirely different. It is part of UAX #29 and is implemented in libunicode. In grapheme cluster segmentation, one determines how many Unicode code points (UTF-32) form a single user-perceived character. This can range from 1 to many (e.g. 7), with zero-width joiners or even variation selectors included to alter the display. This is something most terminals don't get right. You can try a little test script which I once wrote just to check our own terminal (not sure why I created a separate repo for that, I was probably a little bit overly motivated :D).

For reference, I've put a small screenshot of the script's output here (this test script focuses solely on Unicode grapheme segmentation, shown by printing various emoji characters):

image
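
To make the "1 to many (e.g. 7)" point concrete, here is a minimal Python sketch that only counts and names the code points inside two well-known emoji sequences. It does not perform UAX #29 segmentation itself and is not the test script mentioned above; the choice of sample sequences is an assumption made for illustration.

```python
import unicodedata

# Family emoji: four person emoji joined by three ZERO WIDTH JOINERs.
# A terminal with grapheme cluster support should treat it as ONE
# user-perceived character, although it consists of seven code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# Heart with an emoji presentation selector: two code points, one character.
heart = "\u2764\uFE0F"

for label, text in (("family", family), ("heart", heart)):
    # len() on a Python str counts code points, not grapheme clusters.
    print(f"{label}: {len(text)} code points")
    for cp in text:
        print(f"  U+{ord(cp):04X} {unicodedata.name(cp)}")
```

A terminal that advances the cursor once per code point, rather than once per cluster, would misplace everything printed after such a sequence.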

shreevatsa commented 1 week ago

Thank you so much!

To clarify:

Back to this issue: building from source after #1536 (following the steps from this comment: https://github.com/contour-terminal/contour/issues/1510#issuecomment-2183374781), I can confirm that after the recent PR, there is some positive change as font fallback seems to be working:

image

(The rendering isn't great, with characters overlapping etc, but most other terminals have similar issues, and I understand that implementing something better, when there isn't even any specification yet, may not be within scope. In the meantime on a personal note, I was able to get my work done using eshell, which is not a terminal emulator (thankfully), and doesn't try to force text to a grid.)

shreevatsa commented 1 week ago

Just for completeness, some concrete numbers for an example (from another repo https://github.com/wez/wezterm/issues/1333#issuecomment-1006328144): in the example there, the text "বাংলা ভাষা" has, at a font size where the space character (and thus one "cell") is 8 pixels wide:

So ideally this would be 65 pixels = 8.125 cells wide, but if that's not possible, what I as a reader would prefer would be for cell-alignment to happen at word boundaries (so বাংলা = 17 + 14 pixels = 3.875 cells would be rounded up to 4 cells, then a space, then ভাষা = 26 pixels = 3.25 cells would be either squeezed to 3 or rounded up to 4 cells).
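
The arithmetic above can be checked with a short Python sketch. The pixel widths are the ones quoted in the comment; the variable names and the choice to always round up (rather than squeeze) are assumptions for illustration.

```python
import math

CELL_PX = 8  # width of the space character, i.e. one terminal cell

# Word widths in pixels, as quoted in the comment above.
word_px = {"বাংলা": 17 + 14, "ভাষা": 26}
space_px = CELL_PX

# Ideal (unconstrained) width of the whole text.
ideal_px = sum(word_px.values()) + space_px
ideal_cells = ideal_px / CELL_PX
print(ideal_px, ideal_cells)

# Cell alignment at word boundaries: round each word up to whole cells.
word_cells = {word: math.ceil(px / CELL_PX) for word, px in word_px.items()}
total_cells = sum(word_cells.values()) + 1  # plus one cell for the space
print(word_cells, total_cells)
```

With round-up at word boundaries the text occupies 9 cells; squeezing the second word into 3 cells instead would give 8.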

I think Contour tries to render the whole thing 5 or 6 cells wide, as there are 5 graphemes and 6 glyphs (copy-pasted input was echo বাংলা ভাষা | wc — note there's a space before the |):

image

while Terminal and iTerm2 use 10 cells, rounding up each grapheme or maybe even each glyph (বাং 2.125 -> 3 cells, লা 1.75 -> 2 cells, space 1 cell, ভা 1.75 -> 2 cells, ষা 1.5 -> 2 cells):

image image

(renders better or more readable, but cursor movement goes haywire).
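
The 10-cell figure follows from rounding each cluster up individually. A minimal sketch, using the per-cluster widths quoted above (whether Terminal and iTerm2 actually compute widths this way is not known; this only reproduces the numbers):

```python
import math

# Natural width of each cluster in cells, as quoted above.
cluster_cells = [
    ("বাং", 2.125),
    ("লা", 1.75),
    (" ", 1.0),
    ("ভা", 1.75),
    ("ষা", 1.5),
]

# Round every cluster up to a whole number of cells.
rounded = [math.ceil(width) for _, width in cluster_cells]

natural_total = sum(width for _, width in cluster_cells)
rounded_total = sum(rounded)
print(rounded, rounded_total, natural_total)
```

The difference between the two totals (10 versus 8.125 cells) is the extra whitespace that per-cluster rounding spreads through the text.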

I understand there is no specification here and it's a research problem how best to render these.


Edit: I understand that the equation of "one grapheme cluster = one terminal cell" can make sense for cursor movement (I wonder what's happening with wide emoji or wide East Asian characters?), but if that needs to be retained, I think one simple hack (that would make the text both readable and usable, at cost of some ugliness) would be to scale glyphs so that they don't exceed one cell's width. For the grapheme clusters in the example above:

etc. Kitty and wezterm seem to be attempting something like this, but half-heartedly (only for some glyphs).
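
The scaling hack could be sketched as follows. The pixel widths are derived from the per-cluster cell widths quoted earlier in the thread (for example 2.125 cells × 8 px = 17 px); the function name `fit_scale` is hypothetical and does not correspond to any terminal's API.

```python
CELL_PX = 8  # width of one terminal cell in pixels

# Cluster widths in pixels, derived from the cell widths quoted earlier.
cluster_px = {"বাং": 17, "লা": 14, "ভা": 14, "ষা": 12}


def fit_scale(width_px: float, cell_px: float = CELL_PX) -> float:
    """Horizontal scale factor that makes a cluster fit in one cell.

    Clusters that already fit are left unscaled (factor 1.0).
    """
    return min(1.0, cell_px / width_px)


for cluster, px in cluster_px.items():
    print(f"{cluster}: {px} px -> scale by {fit_scale(px):.3f}")
```

Under these assumptions the widest cluster would be drawn at a little under half its natural width, which illustrates the "cost of some ugliness" mentioned above.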