golang / go

x/text: iterator does not handle zero width joiner #60522

Open l0n3star opened 1 year ago

l0n3star commented 1 year ago

What version of Go are you using (go version)?

$ go version
1.20

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

I'm using the go playground

What did you do?

package main

import (
    "fmt"

    "golang.org/x/text/unicode/norm"
)

func main() {
    var ia norm.Iter
    s := "👩🏾‍🦰👱🏾🧑🏾‍⚖️"

    ia.InitString(norm.NFD, s)
    rev := ""

    for !ia.Done() {
        n := ia.Next()
        rev = string(n) + rev
    }

    fmt.Println(rev)
}

Playground: https://go.dev/play/p/Jq9CnUd2zA8

What did you expect to see?

It should print 🧑🏾‍⚖️👱🏾👩🏾‍🦰

I think it has to do with the library not handling zero width joiners. If I use this string: "😀😂" it works as expected.

What did you see instead?

️⚖‍🏾🧑🏾👱🦰‍🏾👩
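
To make the output above easier to read, here is a minimal sketch that lists the code points of the first emoji in the input. The helper name codePoints is made up for illustration, and the string is written with escapes so the invisible zero width joiner (U+200D) is visible in the source.

```go
package main

import "fmt"

// codePoints is a hypothetical helper that formats every rune of s
// in U+XXXX notation.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// Woman, medium-dark skin tone, zero width joiner, red hair.
	s := "\U0001F469\U0001F3FE\u200D\U0001F9B0"
	fmt.Println(codePoints(s)) // [U+1F469 U+1F3FE U+200D U+1F9B0]
}
```

A single visible emoji here is four code points, so reversing code point by code point rearranges the pieces inside each emoji instead of reversing the emoji themselves.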

mknyszek commented 1 year ago

CC @mpvl via https://dev.golang.org/owners

elliotwutingfeng commented 1 year ago

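The underlying issue is that x/text doesn't handle multi-rune Unicode grapheme clusters. norm.Iter advances by normalization boundaries, not grapheme cluster boundaries, so the skin-tone modifiers, zero width joiners, and variation selector in the string above each come back as a separate segment.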

In the meantime, you could use this third-party module uniseg to iterate through grapheme clusters.
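
For illustration only, here is a standard-library sketch of what cluster-aware reversal could look like for the string in this issue. It is not the UAX #29 algorithm and not what uniseg does internally. It assumes a much simpler rule: a rune stays attached to the previous one if it is a zero width joiner, follows a zero width joiner, or is a variation selector, skin-tone modifier, or combining diacritic. The names isExtender, clusters, and reverseClusters are hypothetical.

```go
package main

import "fmt"

const zwj = '\u200D' // zero width joiner

// isExtender reports whether r attaches to the preceding rune in this
// simplified model. Real grapheme segmentation (UAX #29) covers far more.
func isExtender(r rune) bool {
	switch {
	case r >= 0xFE00 && r <= 0xFE0F: // variation selectors
		return true
	case r >= 0x1F3FB && r <= 0x1F3FF: // emoji skin-tone modifiers
		return true
	case r >= 0x0300 && r <= 0x036F: // combining diacritical marks
		return true
	}
	return false
}

// clusters splits s into approximate grapheme clusters under the
// simplified rule described above.
func clusters(s string) []string {
	var out []string
	var cur []rune
	afterZWJ := false // true when the previous rune was a ZWJ
	for _, r := range s {
		// Start a new cluster only if r does not attach to the current one.
		if len(cur) > 0 && !afterZWJ && r != zwj && !isExtender(r) {
			out = append(out, string(cur))
			cur = cur[:0]
		}
		cur = append(cur, r)
		afterZWJ = r == zwj
	}
	if len(cur) > 0 {
		out = append(out, string(cur))
	}
	return out
}

// reverseClusters reverses s cluster by cluster, keeping each cluster intact.
func reverseClusters(s string) string {
	cs := clusters(s)
	rev := ""
	for i := len(cs) - 1; i >= 0; i-- {
		rev += cs[i]
	}
	return rev
}

func main() {
	// The input from the issue, written with escapes.
	s := "\U0001F469\U0001F3FE\u200D\U0001F9B0" + // woman, skin tone, ZWJ, red hair
		"\U0001F471\U0001F3FE" + // person with blond hair, skin tone
		"\U0001F9D1\U0001F3FE\u200D\u2696\uFE0F" // person, skin tone, ZWJ, scales, VS16
	fmt.Println(len(clusters(s))) // 3
	fmt.Println(reverseClusters(s))
}
```

For anything beyond a demonstration, a library that implements the full Unicode segmentation rules is the safer choice, since this sketch ignores cases such as regional indicator pairs and Hangul syllable sequences.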