Closed: esdnm closed this issue 2 years ago.
```
[ark@kiln ~]$ uni i 🫠
     cpoint  dec    utf8        html       name (cat)
'�'  U+1FAE0 129760 f0 9f ab a0 &#x1fae0;  MELTING FACE (Other_Symbol)
```
The problem is that Go's `unicode.IsPrint()` function returns false for this codepoint rather than true. uni avoids printing control characters and the like, so the character gets replaced by U+FFFD (the "replacement character" you're seeing).
It should work if you use `-raw` or `-r`, which disables this behaviour and outputs control characters as-is. `uni emoji` works fine as well, as it doesn't contain this logic.
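A minimal sketch of the behaviour described above uses a hypothetical `displayString` helper, which is not uni's actual code. When raw output is off, every rune that `unicode.IsPrint` rejects is replaced with U+FFFD. When raw output is on, the string passes through unchanged. The set of replaced runes depends on the Unicode version compiled into the Go toolchain, so the example uses a control character (BEL) rather than U+1FAE0.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// displayString is a hypothetical helper. When raw is false, every rune the
// standard library considers non-printable is replaced with U+FFFD
// (utf8.RuneError). When raw is true, the string is returned unchanged.
func displayString(s string, raw bool) string {
	if raw {
		return s
	}
	return strings.Map(func(r rune) rune {
		if unicode.IsPrint(r) {
			return r
		}
		return utf8.RuneError // U+FFFD REPLACEMENT CHARACTER
	}, s)
}

func main() {
	s := "bell:\a"
	fmt.Printf("%q\n", displayString(s, false)) // BEL replaced by U+FFFD
	fmt.Printf("%q\n", displayString(s, true))  // BEL kept as-is
}
```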
This happens because these codepoints were added in Unicode 14 (released last September), and Go seems to be using Unicode 13. The Go 1.16 release notes mention the update to Unicode 13, but there's nothing about it in the 1.17 or 1.18 release notes.
It can/should be fixed by not using the Go stdlib `unicode` package (it's only used for this function). We already have our own Unicode database, so we should use that. It doesn't expose a good API for this at the moment, though.
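If the project's own database can map a codepoint to its general category, a printability check like `unicode.IsPrint` is straightforward to write. The following is a minimal sketch under an assumption: `categoryDB` is a hypothetical map from rune to two-letter category abbreviation, not uni's real database API. The check mirrors the rule Go documents for `unicode.IsPrint`. That rule accepts letters, marks, numbers, punctuation, symbols and the ASCII space, so other spaces such as U+00A0 are rejected.

```go
package main

import "fmt"

// categoryDB is a hypothetical stand-in for a Unicode database mapping each
// assigned codepoint to its two-letter general category (e.g. "Lu", "So").
type categoryDB map[rune]string

// isPrint follows the same rule as Go's unicode.IsPrint: categories L, M, N,
// P and S, plus ASCII space. Codepoints missing from the database are
// treated as unassigned and therefore not printable.
func (db categoryDB) isPrint(r rune) bool {
	if r == ' ' {
		return true
	}
	cat, ok := db[r]
	if !ok || cat == "" {
		return false
	}
	switch cat[0] {
	case 'L', 'M', 'N', 'P', 'S':
		return true
	}
	return false
}

func main() {
	// A tiny sample; a real database would cover every codepoint in the
	// Unicode version it was generated from.
	db := categoryDB{
		'a':     "Ll",
		0x07:    "Cc",
		0x00A0:  "Zs",
		0x1FAE0: "So", // MELTING FACE, added in Unicode 14
	}
	for _, r := range []rune{'a', 0x07, 0x00A0, 0x1FAE0, ' '} {
		fmt.Printf("U+%04X printable: %v\n", r, db.isPrint(r))
	}
}
```

Because the categories come from the project's own data, the result tracks whichever Unicode version that data was generated from, independent of the Go release.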