Unicode hourglass is rendered with wrong char width

elves / elvish

Powerful scripting language & versatile interactive shell

https://elv.sh/

BSD 2-Clause "Simplified" License


Unicode hourglass is rendered with wrong char width #1705

Open notramo opened 1 year ago

notramo commented 1 year ago

U+231B (⌛) is rendered as 1 cell wide, but it's actually 2 cells wide. When used in the right prompt, this clips the last character of the prompt. (I haven't tested with the left prompt.)

xiaq commented 1 year ago

Hmm this line isn't nearly sufficient to cover all the emojis

https://github.com/elves/elvish/blob/aa24cd2851f33fb36907c0f12959b01e0b743ab2/pkg/wcwidth/wcwidth.go#L98

Looking at https://en.wikipedia.org/wiki/Emoji#In_Unicode, there are a lot of codepoints before U+1F300 that should be considered emojis now; those should all be added.

Before this is fixed you can add -override-wcwidth ⌛ 2 to your rc.elv as a workaround.
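For illustration, here is a minimal sketch of what covering a few ranges below U+1F300 might look like. It is not the actual Elvish code: `runeWidth` and `wideBeforeEmojiBlock` are hypothetical names, the table is deliberately incomplete, and the upper bound of the existing emoji range (U+1F64F) is an assumption. The listed ranges are ones that the Unicode East Asian Width data marks as Wide.

```go
package main

import (
	"fmt"
	"unicode"
)

// wideBeforeEmojiBlock lists a few ranges below U+1F300 that the Unicode
// East Asian Width data marks as Wide. It is deliberately incomplete; a
// real table should be generated from the Unicode data files.
var wideBeforeEmojiBlock = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 0x231A, Hi: 0x231B, Stride: 1}, // watch, hourglass
		{Lo: 0x23E9, Hi: 0x23EC, Stride: 1}, // fast-forward .. fast-down buttons
		{Lo: 0x23F0, Hi: 0x23F0, Stride: 1}, // alarm clock
		{Lo: 0x23F3, Hi: 0x23F3, Stride: 1}, // hourglass with flowing sand
	},
}

// runeWidth is a hypothetical stand-in for a wcwidth-style lookup.
// The 0x1F300..0x1F64F range is an assumption made for this example.
func runeWidth(r rune) int {
	if unicode.Is(wideBeforeEmojiBlock, r) || (r >= 0x1F300 && r <= 0x1F64F) {
		return 2
	}
	return 1
}

func main() {
	for _, r := range []rune{'\u231B', '\u00AE', '\u23F3'} {
		fmt.Printf("U+%04X width %d\n", r, runeWidth(r))
	}
}
```

A hand-maintained table like this goes stale with every Unicode release, which is one argument for generating it or using a maintained package.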

xiaq commented 1 year ago

Hmm apparently it's not as simple as adding a range of emoji characters. Some characters are only emoji when followed by U+FE0F. ® (U+00AE) is not an emoji but ®️ (U+00AE U+FE0F) is an emoji. Time to read up...
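A minimal sketch of why a per-rune lookup cannot express this: the width of ® depends on whether U+FE0F follows it, so the computation has to look at the next rune. Everything here is hypothetical (`baseWidth`, `stringWidth`), and the sketch assumes the terminal draws every "base + U+FE0F" sequence 2 cells wide, which not all terminals do. It also applies that rule to any base rune, whereas a real implementation would restrict it to the emoji variation sequences Unicode defines.

```go
package main

import "fmt"

// vs16 is VARIATION SELECTOR-16, which requests emoji presentation.
const vs16 = '\uFE0F'

// baseWidth is a hypothetical per-rune width lookup; only the cases
// needed for this example are covered.
func baseWidth(r rune) int {
	switch {
	case r == vs16:
		return 0
	case r == 0x231A || r == 0x231B:
		return 2
	default:
		return 1
	}
}

// stringWidth sums rune widths, but counts a rune followed by U+FE0F
// as 2 cells. Assumption: the terminal renders such sequences wide.
func stringWidth(s string) int {
	runes := []rune(s)
	width := 0
	for i, r := range runes {
		if r == vs16 {
			continue // already accounted for with the preceding rune
		}
		w := baseWidth(r)
		if i+1 < len(runes) && runes[i+1] == vs16 {
			w = 2
		}
		width += w
	}
	return width
}

func main() {
	// prints: 1 2 2
	fmt.Println(stringWidth("\u00AE"), stringWidth("\u00AE\uFE0F"), stringWidth("\u231B"))
}
```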

krader1961 commented 11 months ago

The change I have pending (soon to be a pull-request) to have pprint vertically align the values of maps adds a new dependency on github.com/mattn/go-runewidth. I'll try to find a minute to see if it correctly handles this case. I think we definitely want to use a good third-party package (assuming nothing suitable exists in the Go stdlib or golang.org/x/...) for answering this type of question rather than reinventing the wheel.

krader1961 commented 11 months ago

So I am not sure how to interpret the result of running the following program. On my platform, iTerm on macOS, I see the following output:

⌛ 2
® 1
® 1
®️ 1
®️ 1

The last two lines actually show a large registered trademark symbol in my terminal (unlike what I see when viewing this web page), while the previous two show a small symbol. But both forms have a width of one according to the github.com/mattn/go-runewidth package, whereas the U+231B character has the expected width of two.

package main

import (
        "fmt"

        gr "github.com/mattn/go-runewidth"
)

func main() {
        var s = "⌛"
        fmt.Println(s, gr.StringWidth(s))
        s = "\u00AE"
        fmt.Println(s, gr.StringWidth(s))
        s = "®"
        fmt.Println(s, gr.StringWidth(s))
        s = "\u00AE\uFE0F"
        fmt.Println(s, gr.StringWidth(s))
        s = "®️"
        fmt.Println(s, gr.StringWidth(s))
}

krader1961 commented 5 months ago

FWIW, I wrote a small program to compare the display width reported by the Elvish wcwidth package and the github.com/mattn/go-runewidth package. The latter returns two for codepoint U+231B while the former returns one. While I have found a couple of codepoints where go-runewidth seems to return the wrong display width, it is correct far more often than wcwidth. As I pointed out in my pull-request, I don't think this is a wheel we should be reinventing. We should be using go-runewidth, or an actively maintained package similar to it, rather than rolling our own incomplete and flawed solution to the problem.
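The comparison program itself is not shown in the thread. A rough sketch of how such a comparison could be structured follows; it is not the program referred to above. `diffWidths`, `widthFunc`, and `mismatch` are hypothetical names, and the two width functions are toy stand-ins so that the example runs with the standard library alone.

```go
package main

import "fmt"

// widthFunc reports the display width of a single rune.
type widthFunc func(rune) int

// mismatch records one codepoint on which two implementations disagree.
type mismatch struct {
	r    rune
	a, b int
}

// diffWidths returns every rune in [lo, hi] for which a and b disagree.
func diffWidths(a, b widthFunc, lo, hi rune) []mismatch {
	var out []mismatch
	for r := lo; r <= hi; r++ {
		if wa, wb := a(r), b(r); wa != wb {
			out = append(out, mismatch{r, wa, wb})
		}
	}
	return out
}

// narrowEverywhere is a toy stand-in that treats every rune as 1 cell.
func narrowEverywhere(rune) int { return 1 }

// knowsHourglass is a toy stand-in that knows U+231A and U+231B are wide.
func knowsHourglass(r rune) int {
	if r == 0x231A || r == 0x231B {
		return 2
	}
	return 1
}

func main() {
	// Scan the Miscellaneous Technical block and report disagreements.
	for _, m := range diffWidths(narrowEverywhere, knowsHourglass, 0x2300, 0x23FF) {
		fmt.Printf("U+%04X %q: %d vs %d\n", m.r, m.r, m.a, m.b)
	}
}
```

To compare real implementations, one would presumably replace the stand-ins with `runewidth.RuneWidth` from github.com/mattn/go-runewidth and the per-rune function of the Elvish wcwidth package, and widen the scanned range.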