emilk / egui

egui: an easy-to-use immediate mode GUI in Rust that runs on both web and native
https://www.egui.rs/
Apache License 2.0

Incorrect rendering of more complex Unicode text #2517

Open Erutuon opened 1 year ago

Erutuon commented 1 year ago

Describe the bug: Right-to-left code points, cursive code points, and some grapheme clusters are rendered incorrectly on the Windows native platform.

I made a little Rust project in Windows that registers a hotkey to pop up a window to let me enter and identify Unicode characters. I work on Wiktionary, and egui seemed like the easiest GUI library to use for this project because I don't have to figure out keyboard event handlers and it has good defaults for the UI elements. egui handles Unicode data just fine, but has problems rendering text in which several code points map to a single glyph or in which a glyph's shape depends on its context. This isn't a dealbreaker for me because I'm just identifying characters, but it makes egui unusable for people writing GUIs in certain languages, like Arabic or Hindi.

The screenshot at the bottom shows four samples: a stack of two combining diacritics in the Gentium Plus font, three Arabic letters in Scheherazade, the word "Hangeul" in Noto Sans CJK KR, and a random Hindi word from Wiktionary in Siddhanta: ế ابج 한글 अत्यधिक. The combining diacritics render on top of each other, the Arabic letters render left-to-right as disconnected letters, the Hangul letters render separately, the consonant cluster त्य is rendered as three glyphs (a letter, a diacritic under the letter, and another letter, looking like त्‌य), and in the consonant-vowel combination धि the vowel ि is placed after the consonant (like ध ि but without the space).

In my Firefox, which renders these correctly, the second combining diacritic will be above or to the right of the first, the Arabic letters are rendered cursive and right-to-left, the Hangul letters are rendered as their syllable block versions 한글, त्य is a single glyph, and धि has the vowel positioned before the consonant.

It looks like it basically renders the code points separately and then overlays them if they are in a grapheme cluster, whereas I guess proper rendering of the diacritics requires breaking the text into graphemes and then rendering each grapheme with any left or right joining behavior taken into account based on the neighboring characters.
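To illustrate why drawing one glyph per code point cannot give the expected result, here is a minimal standard-library sketch that lists the code points behind three of the samples. The helper names `code_points` and `describe` are made up for this illustration and are not part of egui. The strings are written as escapes so the exact code points are unambiguous.

```rust
/// Hypothetical helper: the Unicode scalar values of `text`, in logical
/// (storage) order, which is not necessarily the visual order.
fn code_points(text: &str) -> Vec<u32> {
    text.chars().map(|c| c as u32).collect()
}

/// Hypothetical helper: formats each code point as "U+XXXX".
fn describe(text: &str) -> Vec<String> {
    code_points(text)
        .iter()
        .map(|cp| format!("U+{:04X}", cp))
        .collect()
}

fn main() {
    // DEVANAGARI TA + VIRAMA + YA: three code points that a shaping
    // engine is expected to combine into one conjunct glyph.
    let conjunct = "\u{0924}\u{094D}\u{092F}";
    // DEVANAGARI DHA + VOWEL SIGN I: the vowel sign is stored after the
    // consonant but is displayed before it.
    let dhi = "\u{0927}\u{093F}";
    // ARABIC ALEF, BEH, JEEM: stored in logical order, displayed
    // right to left with joined forms.
    let arabic = "\u{0627}\u{0628}\u{062C}";

    for (label, sample) in [("conjunct", conjunct), ("dhi", dhi), ("arabic", arabic)] {
        println!("{}: {}", label, describe(sample).join(" "));
    }
}
```

Iterating with `chars()` yields only these individual code points in storage order. Grouping them into grapheme clusters, reordering them, and choosing joined or conjunct glyphs all have to happen in a separate segmentation and shaping step, which the standard library does not provide.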

This at least affects the Windows native renderer, and maybe others depending on how much of the text rendering is shared among the different platforms. Replicating is probably as simple as pasting the text into a UI element, but I can put my code up on GitHub if you want.

Desktop (please complete the following information):

(Screenshot: the sample text as rendered by egui.)

parasyte commented 1 year ago

Probably related, FWIW: #56