arp242 / uni

Query the Unicode database from the commandline, with good support for emojis
MIT License

MELTING FACE emoji is not visible in `uni i` #32

Closed esdnm closed 2 years ago

esdnm commented 2 years ago

Screenshot from 2022-04-30 13-01-23

esdnm commented 2 years ago
[ark@kiln ~]$  uni i 🫠
     cpoint  dec    utf8        html       name (cat)
'�' U+1FAE0 129760 f0 9f ab a0 🫠  MELTING FACE (Other_Symbol)
arp242 commented 2 years ago

The problem is that Go's unicode.IsPrint() function returns false for this codepoint when it should return true. uni avoids printing control characters and the like, so the codepoint gets replaced by U+FFFD (the "replacement character" you're seeing).

It should work if you use -raw or -r, which disables this behaviour and outputs control characters as-is. uni emoji works fine as well, as it doesn't contain this logic.

This happens because these codepoints were added in Unicode 14 (released in September 2021), while Go still seems to be using Unicode 13 (the 1.16 release notes mention the update to 13, and the 1.17 and 1.18 release notes don't mention a newer version).

It can and should be fixed by no longer using the Go stdlib unicode package, which is only used for this one function. We have our own Unicode database already, so we should use that instead. It doesn't expose a good API for this at the moment though.
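
As a rough sketch of what such an API could look like: a printable check driven by a general-category lookup in our own table instead of the stdlib. Everything here is an assumption for illustration; the `catRange` type, the `table` excerpt, and the `category`/`isPrint` names are hypothetical and do not reflect how uni's database is actually laid out. The category test is also simplified (letters, marks, numbers, punctuation, and symbols count as printable; space handling is left out).

```go
package main

import (
	"fmt"
	"sort"
)

// catRange maps an inclusive codepoint range to a general category.
// Hypothetical layout, not uni's real data structure.
type catRange struct {
	lo, hi rune
	cat    string
}

// table is a tiny hand-written excerpt, sorted by lo and non-overlapping.
// A real table would be generated from the Unicode data files.
var table = []catRange{
	{0x0000, 0x001F, "Cc"},   // C0 control characters
	{0x0041, 0x005A, "Lu"},   // A-Z
	{0x0061, 0x007A, "Ll"},   // a-z
	{0x1FAE0, 0x1FAE0, "So"}, // MELTING FACE
}

// category returns the general category of r, or false if r is not in the table.
func category(r rune) (string, bool) {
	// Binary search for the first range whose upper bound is >= r.
	i := sort.Search(len(table), func(i int) bool { return table[i].hi >= r })
	if i < len(table) && table[i].lo <= r {
		return table[i].cat, true
	}
	return "", false
}

// isPrint reports whether r is in a letter, mark, number, punctuation, or
// symbol category. Codepoints missing from the table are not printable.
func isPrint(r rune) bool {
	cat, ok := category(r)
	if !ok {
		return false
	}
	switch cat[0] {
	case 'L', 'M', 'N', 'P', 'S':
		return true
	}
	return false
}

func main() {
	for _, r := range []rune{'A', '\a', 0x1FAE0} {
		fmt.Printf("%U printable=%t\n", r, isPrint(r))
	}
}
```

The point of the design is that the result then tracks whatever Unicode version our own database was generated from, independent of the Go release used to build the binary.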