UTF-8 encoding rules

1st Byte	2nd Byte	3rd Byte	4th Byte	Number of Free Bits	Maximum Expressible Unicode Value
0xxxxxxx				7	007F hex (127)
110xxxxx	10xxxxxx			(5+6)=11	07FF hex (2047)
1110xxxx	10xxxxxx	10xxxxxx		(4+6+6)=16	FFFF hex (65535)
11110xxx	10xxxxxx	10xxxxxx	10xxxxxx	(3+6+6+6)=21	10FFFF hex (1,114,111)

UTF-8 Encoding

Bear plus snowflake equals polar bear

https://andysalerno.com/posts/weird-emojis/#

👩🏾 + ❤ + 💋 + 👩🏻 = 👩🏾‍❤️‍💋‍👩🏻 (kiss: woman, woman, medium-dark skin tone, light skin tone)

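🐻 (bear; U+1F43B) + ❄ (snowflake; U+2744) = 🐻‍❄️ (polar bear; U+1F43B U+200D U+2744 U+FE0F). U+200D is the ZERO WIDTH JOINER that glues the two emoji together, and U+FE0F is VARIATION SELECTOR-16, which requests the emoji presentation of the snowflake.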
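A ZWJ sequence is just ordinary code points placed side by side with U+200D between them. The Go sketch below builds the polar bear that way. `zwjJoin` is a hypothetical helper for illustration only. Whether the result renders as a single glyph depends on the font and platform; without support you simply see a bear followed by a snowflake.

```go
package main

import (
	"fmt"
	"strings"
)

// zwjJoin is a hypothetical helper: it glues emoji together with
// U+200D ZERO WIDTH JOINER, which is how ZWJ sequences are formed.
func zwjJoin(parts ...string) string {
	return strings.Join(parts, "\u200D")
}

func main() {
	bear := "\U0001F43B"        // U+1F43B BEAR FACE
	snowflake := "\u2744\uFE0F" // U+2744 SNOWFLAKE + U+FE0F (emoji presentation)
	polarBear := zwjJoin(bear, snowflake)

	// %U applied to a []rune prints every code point in U+XXXX form.
	fmt.Printf("%s %U\n", polarBear, []rune(polarBear))
}
```

The printed code points should match the list given above for the polar bear.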

So a character, as the user sees it, can take multiple bytes in UTF-8, and it can also be built from several Unicode code points. Such characters can get quite large: the kiss emoji above is 10 code points and 35 bytes.

package main

import "fmt"

func main() {
    // Each sample looks like a single character, but may contain several runes (code points).
    samples := []string{"🙂", "👩🏿", "👩‍🚀️", "👩🏾‍❤️‍💋‍👩🏻"}
    for _, s := range samples {
        runes := []rune(s)
        fmt.Printf("%s is %d bytes and %d runes: %q\n", s, len(s), len(runes), runesAsStrings(runes))
    }

    // Creating runes: a rune literal is an int32 holding a Unicode code point.
    rune1 := 'B'
    rune2 := 'g'
    rune3 := '\a' // U+0007 BELL, a non-printing control character

    // Display each rune (quoted with %q so '\a' stays visible), its code point
    // in binary and U+ notation, and its type. %T prints int32 because rune is
    // an alias for int32.
    for i, r := range []rune{rune1, rune2, rune3} {
        fmt.Printf("Rune %d: %q; binary: %08b; Unicode: %U; type: %T\n", i+1, r, r, r, r)
    }
}

// runesAsStrings turns each rune into its own string so the individual code
// points (base emoji, skin-tone modifiers, zero-width joiners) can be inspected.
func runesAsStrings(runes []rune) []string {
    s := make([]string, 0, len(runes))
    for _, r := range runes {
        s = append(s, string(r))
    }
    return s
}

That's why it's called a rune (a code point), and not a grapheme cluster ;)
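Go's standard library has no grapheme cluster segmentation; the full rules are specified in Unicode Standard Annex #29. The sketch below is a deliberately naive approximation for illustration only. `naiveClusters` and `isExtender` are hypothetical helpers. A rune is attached to the previous cluster if it is a ZWJ, follows a ZWJ, or is a variation selector, skin-tone modifier, or combining diacritical mark. Everything else (regional-indicator flags, Hangul, many other marks) is ignored.

```go
package main

import "fmt"

const zwj = '\u200D' // ZERO WIDTH JOINER

// isExtender is a hypothetical helper for this sketch: it reports whether r
// attaches to the rune before it. Only three ranges are covered.
func isExtender(r rune) bool {
	switch {
	case r >= 0xFE00 && r <= 0xFE0F: // variation selectors
		return true
	case r >= 0x1F3FB && r <= 0x1F3FF: // emoji skin-tone modifiers
		return true
	case r >= 0x0300 && r <= 0x036F: // combining diacritical marks
		return true
	}
	return false
}

// naiveClusters approximates grapheme clusters: a rune starts a new cluster
// unless it is a ZWJ, directly follows a ZWJ, or is an extender.
func naiveClusters(s string) []string {
	var clusters []string
	var cur []rune
	afterZWJ := false
	for _, r := range s {
		if len(cur) > 0 && (r == zwj || afterZWJ || isExtender(r)) {
			cur = append(cur, r)
		} else {
			if len(cur) > 0 {
				clusters = append(clusters, string(cur))
			}
			cur = []rune{r}
		}
		afterZWJ = r == zwj
	}
	if len(cur) > 0 {
		clusters = append(clusters, string(cur))
	}
	return clusters
}

func main() {
	s := "Hi \U0001F43B\u200D\u2744\uFE0F!"
	fmt.Println(len([]rune(s)), "runes,", len(naiveClusters(s)), "clusters")
	fmt.Printf("%q\n", naiveClusters(s))
}
```

For "Hi 🐻‍❄️!" this reports 8 runes but 5 clusters: the four runes of the polar bear end up in one cluster.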


https://www.reddit.com/r/golang/comments/o1o5hr/fyi_a_single_go_rune_is_not_the_same_as_a_single

String length (in bytes) is not always rune count.
Rune count is not always display width (for example, in a monospace font).
Unicode is hard.

bingoohuang / blog

UTF-8 #203
