contour-terminal / terminal-unicode-core

Unicode Core specification for Terminal (grapheme clusters, character widths, ...)

Please consider adding the generated markdown directly in this repo #2

Open wez opened 11 months ago

wez commented 11 months ago

I don't read tex natively and it's super inconvenient to download and read the markdown outside of just allowing github to render it here. The PDF format doesn't respect my dark mode preference either!

I'm going to cheekily paste in the current version of the markdown here so that I can read the spec in the meantime!


History and current state

Historically, terminals supported only 7-bit characters with C0 control codes, and different languages were supported by selecting their respective code pages.

Later on this was extended to 8-bit characters along with C1 control codes.

With the introduction of Unicode there was no need for code pages anymore, but the Unicode spec was not explicitly designed to also cover terminals, except that the C0 and C1 codepoints were preserved.

With UTF-8 it became possible to at least pass Unicode characters to the terminal, but the rendering of some characters, as well as their respective cursor placement, is not defined in the Unicode standard.

Also, Unicode introduced codepoint sequences that map to a single user-perceived character, so-called grapheme clusters. Terminals have never attempted any formalization of how to deal with grapheme clusters, variation selectors, their East Asian Width, or emoji and emoji presentation handling.

This spec tries to address some of the problems terminals are suffering from with Unicode today.

Backwards Compatibility

The basic point is that everything is disabled by default, so legacy apps don't break more than they already do.

Backwards compatibility is retained by leaving everything as undefined as it is without this specification.

The application can test for the availability of this feature and has to explicitly enable it in order to get the set of properties as defined in this document guaranteed.

Future Compatibility and Stability

Unicode itself had a major breakage between versions 8 and 9, when some codepoints had their East Asian Width changed.

While this may happen again at any time, we do not expect it to happen soon enough or frequently enough to address future incompatibilities in this spec, and we leave this for a later point.

Feature and Mode State Detection

`CSI ? 2027 $ p` (DECRQM) can be used to test for the availability of this feature as well as the current mode the terminal is in with regard to this specification. The reply to `CSI ? 2027 $ p` indicates each state accurately enough that no new VT sequence needs to be introduced.
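To make the detection step concrete, here is a minimal Python sketch that builds the DECRQM request and decodes the reply. It assumes the terminal answers with the standard DECRPM form `CSI ? 2027 ; Ps $ y` and the usual DEC status values; the function names `decrqm_query` and `parse_decrpm` are made up for this example, and reading the reply from the tty is left out.

```python
import re

MODE_GRAPHEME_CLUSTERS = 2027  # mode number used by this spec

# Standard DECRPM status values (assumed to apply to mode 2027 as well).
DECRPM_STATES = {
    0: "not recognized",
    1: "set",
    2: "reset",
    3: "permanently set",
    4: "permanently reset",
}


def decrqm_query(mode: int = MODE_GRAPHEME_CLUSTERS) -> str:
    """Build the DECRQM request: CSI ? <mode> $ p."""
    return f"\x1b[?{mode}$p"


def parse_decrpm(reply: str, mode: int = MODE_GRAPHEME_CLUSTERS) -> str:
    """Decode a DECRPM reply of the form CSI ? <mode> ; <state> $ y."""
    match = re.fullmatch(r"\x1b\[\?(\d+);(\d+)\$y", reply)
    if match is None or int(match.group(1)) != mode:
        raise ValueError(f"not a DECRPM reply for mode {mode}: {reply!r}")
    return DECRPM_STATES.get(int(match.group(2)), "unknown")
```

A terminal that does not implement this spec would be expected to report "not recognized" (or not answer at all), which the application should treat as "feature unavailable".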

Mode Switching

Semantics

The following set of semantics MUST be adhered to if VT mode `2027` is enabled. If VT mode `2027` is not set, the behavior is as undefined as if this specification were not implemented at all, in order to retain the behavior of current terminals and their legacy applications.
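As an illustration of opting in, the sketch below wraps application output in the mode. The Mode Switching section is empty in this paste, so this assumes mode 2027 is switched like any other DEC private mode, with DECSET (`CSI ? 2027 h`) and DECRST (`CSI ? 2027 l`); `set_mode` and `grapheme_cluster_mode` are hypothetical helper names.

```python
import sys
from contextlib import contextmanager

MODE_GRAPHEME_CLUSTERS = 2027  # mode number used by this spec


def set_mode(enabled: bool, mode: int = MODE_GRAPHEME_CLUSTERS) -> str:
    """Return DECSET (h) or DECRST (l) for a DEC private mode (assumed form)."""
    return f"\x1b[?{mode}{'h' if enabled else 'l'}"


@contextmanager
def grapheme_cluster_mode(stream=sys.stdout):
    """Enable the mode for the duration of a block, then reset it."""
    stream.write(set_mode(True))
    stream.flush()
    try:
        yield
    finally:
        # Always restore the legacy (undefined) behavior on exit.
        stream.write(set_mode(False))
        stream.flush()
```

Resetting the mode on exit keeps the terminal in its default state for whatever legacy application runs next.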

Grapheme Cluster

With this mode enabled, the terminal MUST support grapheme clusters in conformance with the algorithm described in UTS 29.

This implies that consecutively written characters on the terminal stream that are non-breakable as per UTS 29 will always end up in the same terminal grid cell.

Therefore, extending a grapheme cluster with consecutively added codepoints will not move the cursor, except for variation selector 16 (VS16), which may change the width of the grapheme cluster to wide (2 grid cells).

When the cursor moves to a grid cell that contains a complete or incomplete grapheme cluster, this grid cell's contents will be erased and overwritten rather than textually concatenated.

Therefore cursor movement semantics of the terminal remain unchanged.
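The cell semantics above can be sketched with a toy one-line grid. This is only an illustration: `extends_cluster` is a rough stand-in for the UTS 29 rules (combining marks, ZWJ and variation selectors only), every cluster is treated as one cell wide, and the `Row` class is invented for this example.

```python
import unicodedata

ZWJ = "\u200d"
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}


def extends_cluster(previous: str, ch: str) -> bool:
    """Rough approximation of "no break before ch" (NOT full UTS 29)."""
    if not previous:
        return False
    if ch == ZWJ or ch in VARIATION_SELECTORS:
        return True
    if unicodedata.category(ch) in ("Mn", "Me", "Mc"):  # combining marks
        return True
    return previous.endswith(ZWJ)  # codepoint joined by a preceding ZWJ


class Row:
    """One terminal line; widths and wrapping are ignored for brevity."""

    def __init__(self, columns: int):
        self.cells = [""] * columns
        self.cursor = 0
        self.last = None  # index of the cell most recently written to

    def write(self, text: str) -> None:
        for ch in text:
            if self.last is not None and extends_cluster(self.cells[self.last], ch):
                self.cells[self.last] += ch  # same cell, cursor does not move
                continue
            self.cells[self.cursor] = ch  # erase and overwrite, never append
            self.last = self.cursor
            # Clamp at the last column instead of wrapping (simplification).
            self.cursor = min(self.cursor + 1, len(self.cells) - 1)

    def move_to(self, column: int) -> None:
        self.cursor = column
        self.last = None  # explicit cursor movement ends the cluster in progress
```

Writing `"e\u0301x"` fills two cells, and moving back to column 0 and writing `"a"` replaces the accented cluster rather than extending it.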

Emoji

Emoji symbols are always rendered in a square aspect ratio (as proposed by UTS 51), implying an East Asian Width of Wide, i.e. 2 grid cells.

ZWJ emoji are required to be displayed as a single image with a width of 2 grid cells.

The alternate display of ZWJ emoji as a decomposed sequence of sub-images must not be used as a fallback, as it would break cursor movement guarantees.

If a ZWJ emoji cannot be rendered, the display behavior is undefined - for example, a Unicode replacement character `U+FFFD` could be displayed instead.

In emoji presentation, the cursor will always move by 2 grid cells.

SGR attributes applied to a grid cell containing an emoji symbol are not strictly defined and it is left to the terminal emulator to have sensible meaningful semantics with regards to emoji symbols.

Variation Selector 16

VS16 promotes the grapheme cluster to emoji presentation, which forces the grapheme cluster's width to 2. This may cause that symbol to reflow to the next line if it is at the right margin and AutoWrap mode is set.

Variation Selector 15

VS15 forces the grapheme cluster to text presentation. This will NOT change the underlying width; it only changes the display to prefer a textual, non-colored presentation.

This matches the behavior of today's web browsers and should thus feel most intuitive to users.

The cursor will move by 2 columns if the symbol has the default presentation of emoji.
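The width rules for emoji, VS16 and VS15 can be summarized in a small Python sketch. It is an approximation under stated assumptions: Python's standard library does not expose the emoji presentation property, so the East Asian Width of the first codepoint stands in for "default emoji presentation", and `cluster_width` is a hypothetical name.

```python
import unicodedata

VS15 = "\ufe0e"  # text presentation selector
VS16 = "\ufe0f"  # emoji presentation selector


def cluster_width(cluster: str) -> int:
    """Width in grid cells of one non-empty grapheme cluster (sketch)."""
    if VS16 in cluster:
        return 2  # VS16 forces emoji presentation, hence width 2
    # VS15 only affects how the glyph is drawn, so it is ignored here:
    # the width stays whatever the base codepoint implies.
    base = cluster[0]
    wide = unicodedata.east_asian_width(base) in ("W", "F")
    return 2 if wide else 1
```

For example, `cluster_width("\u2764")` is narrow, while appending VS16 to the same codepoint makes it 2 cells wide; appending VS15 to an emoji that is wide by default leaves it at 2.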

Margins and AutoWrap with Emoji

Emoji written at the right margin with AutoWrap mode disabled may be rendered in half or not be displayed at all. This behavior is left undefined to ease implementation and adoption of this specification.
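A minimal sketch of the margin rule, assuming 0-based columns, a left margin at column 0 and no custom horizontal margins; `place_cluster` is a hypothetical helper, and `None` is used to mark the case this spec leaves undefined.

```python
from typing import Optional, Tuple


def place_cluster(cursor_col: int, width: int, columns: int,
                  autowrap: bool) -> Optional[Tuple[int, int]]:
    """Return (line_offset, start_column) for a cluster, or None if undefined."""
    if cursor_col + width <= columns:
        return (0, cursor_col)  # fits on the current line
    if autowrap:
        return (1, 0)  # reflow the whole cluster to the next line
    return None  # AutoWrap disabled: behavior is left undefined
```

On an 80-column line, a 2-cell emoji written at column 79 with AutoWrap set moves to the start of the next line rather than being split across lines.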

References

christianparpart commented 11 months ago

Hey @wez,

all good. I now just don't know what to do with this ticket. For certain tasks I think LaTeX is just better suited, especially since it can render to Markdown as well.

I am always open to improving the publishing format, and therefore I am absolutely open to suggestions.

But apart from that, feedback on the spec is also very welcome. There are (in the end) not yet many terminals that do in fact try to move forward on the grapheme cluster end, and I think TUI/CLI apps need this kind of discoverability to gain trust and also start relying on the modern way of laying out complex graphemes in the terminal.

I think we can have the markdown uploaded to a github.io page upon push/merge to master branch, such that it is easy to read from there as well (should suit you for sure)

wez commented 11 months ago

Hi @christianparpart!

re: the spec, I think it sounds fine. FWIW, wezterm reports permanently-enabled for this setting and doesn't allow disabling it.

https://github.com/wez/wezterm/issues/4223 was a request to offer application level control, but as part of looking into it, I decided that it was a lot of effort to undo what I was already doing :-p

re: the markdown and this issue, I don't have a strong preference on the implementation details, but I think the goal should be to make it as quick and easy as possible for someone to view it, without having extra steps to download or open a helper application. Personally, I would probably just check it in directly, but deploying it to GH pages is also OK.