bczsalba / pytermgui

Python TUI framework with mouse support, modular widget system, customizable and rapid terminal markup language and more!
https://ptg.bczsalba.com
MIT License

[BUG] Long unicode (emojis) get wrong length calculated #126

Closed manuelF closed 4 months ago

manuelF commented 1 year ago

Describe the bug: Using emojis (on terminals that support them) misaligns windows, because the computed length counts characters rather than the terminal columns they occupy.

To Reproduce

Taking emojis from https://en.wikipedia.org/wiki/X_mark:

import pytermgui as ptg

with ptg.WindowManager() as manager:
    manager.add(ptg.Window(
        ptg.Label("NormalLabel"),
        ptg.Label("1x Emoji Label: ❌"),
        ptg.Label("2x Emoji Label: ❌❌"),
        ptg.Label("3x Emoji Label: ❌❌❌"),
        ptg.Label("1x Normal Label: X"),
        ptg.Label("2x Normal Label: X X"),
        ptg.Label("3x Normal Label: X X X"),
    ))

Expected behavior: A normal outer box.

Seen behaviour: The right border is offset on lines containing emojis, because those characters are printed wider than their computed length.

╔══════════════════════════════════════╗
║              NormalLabel             ║
║           1x Emoji Label: ❌          ║
║          2x Emoji Label: ❌❌          ║
║          3x Emoji Label: ❌❌❌         ║
║          1x Normal Label: X          ║
║         2x Normal Label: X X         ║
║        3x Normal Label: X X X        ║
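The offset in the output above can be reproduced outside PyTermGUI. The following is a minimal sketch, not PyTermGUI's code: `str.center` stands in for any padding based on `len()`, `WIDTH` is taken from the box in this report, and the column count is approximated with the standard library's `unicodedata.east_asian_width`.

```python
import unicodedata

WIDTH = 38  # inner width of the box shown in this report

for label in ("1x Normal Label: X", "1x Emoji Label: ❌"):
    # str.center pads by len(), which counts code points, not terminal columns.
    padded = label.center(WIDTH)
    # Each East Asian Wide/Fullwidth character takes one extra column.
    wide = sum(unicodedata.east_asian_width(ch) in ("W", "F") for ch in padded)
    print(f"len={len(padded)} columns={len(padded) + wide}  ║{padded}║")
```

Both padded strings have the same `len()`, but the emoji line occupies one extra column per emoji, which pushes the right border out.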

System information

$ ptg --version

PyTermGUI version 7.4.0

System details:
    Python version: 3.8.10
    $TERM:          xterm-256color
    $COLORTERM:     None
    Color support:  ColorSystem.EIGHT_BIT
    OS Platform:    Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.29

Possible cause: real_length may be computed incorrectly for wide characters.

https://github.com/bczsalba/pytermgui/blob/56b2cc1dc74ada438088719d2ebe95e21d509ad6/pytermgui/widgets/base.py#L781

Possible solution: See alternatives such as https://stackoverflow.com/a/30775818
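One way to make padding width-aware is sketched below. It is not taken from the linked answer or from PyTermGUI: `cell_width` and `center_cells` are hypothetical helper names, and the width rule (two columns for East Asian Wide/Fullwidth characters, one otherwise) is an approximation that ignores combining marks.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Approximate the number of terminal columns that text occupies."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def center_cells(text: str, width: int, fill: str = " ") -> str:
    """Center text in width columns, padding by columns rather than len()."""
    missing = max(width - cell_width(text), 0)
    left = missing // 2
    return fill * left + text + fill * (missing - left)


rows = ["NormalLabel", "1x Emoji Label: ❌", "3x Emoji Label: ❌❌❌"]
for row in rows:
    print("║" + center_cells(row, 38) + "║")
```

With this padding every row occupies the same number of columns, so the right border should line up on terminals that render these emoji two columns wide.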

Maybe?

Thanks!

manuelF commented 1 year ago

Note that the fix from PR #118, once modified to cover both sets of characters, displays correctly: https://github.com/bczsalba/pytermgui/pull/118

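import re
from functools import lru_cache

from wcwidth import wcswidth  # third-party package, assumed source of wcswidth

# CJK Unified Ideographs
RE_CHINESE = re.compile(r"[\u4e00-\u9fff]")
# Symbol blocks U+2000 to U+2FFF; includes ❌ (U+274C) but no emoji above U+FFFF
RE_EMOJI = re.compile(r"[\u2000-\u2fff]")

[...]

@lru_cache(maxsize=None)
def real_length(text: str) -> int:
    # Wide characters may be present: sum the per-character display widths.
    if RE_CHINESE.search(text) or RE_EMOJI.search(text):
        return sum(wcswidth(c) for c in strip_ansi(text))
    # Otherwise assume one column per character.
    return len(strip_ansi(text))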
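The two regex ranges above only cover the Basic Multilingual Plane, so an emoji such as U+1F600 would still fall through to `len()`. A possible variant that drops the prefilter is sketched below using only the standard library. The names `real_length_sketch`, `_char_cells` and `_SGR` are hypothetical; `_SGR` is a simplified stand-in for `strip_ansi` that removes only SGR color/style sequences, and the width rule is an approximation rather than what `wcwidth` computes.

```python
import re
import unicodedata
from functools import lru_cache

# Hypothetical stand-in for strip_ansi: removes only SGR sequences (ESC [ ... m).
_SGR = re.compile(r"\x1b\[[0-9;]*m")

# Range quoted in the comment above, kept here to show what it misses.
RE_EMOJI = re.compile(r"[\u2000-\u2fff]")


def _char_cells(ch: str) -> int:
    """Approximate column width of a single character."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks, variation selectors, zero-width joiners
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


@lru_cache(maxsize=None)
def real_length_sketch(text: str) -> int:
    """Display width of text after removing SGR sequences."""
    return sum(_char_cells(ch) for ch in _SGR.sub("", text))


print(RE_EMOJI.search("\U0001F600"))  # the BMP-only range does not match
print(real_length_sketch("\x1b[1m\U0001F600\x1b[0m"))
```

Computing the width for every character avoids maintaining a list of ranges, at the cost of a slower first call per string; the `lru_cache` keeps repeated calls cheap.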