kbd closed this 3 years ago
This is pretty much what uni identify is:
[~]% uni identify asd
cpoint dec utf8 html name (cat)
'a' U+0061 97 61 a LATIN SMALL LETTE… (Lowercase_Lett…)
's' U+0073 115 73 s LATIN SMALL LETTE… (Lowercase_Lett…)
'd' U+0064 100 64 d LATIN SMALL LETTE… (Lowercase_Lett…)
Essentially it's a "UTF-8 hexdump".
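To make the "UTF-8 hexdump" idea concrete, here's a rough Python sketch that builds the same kind of row for each character. This is not how uni does it internally; `identify` is a made-up helper name, and it leans on Python's `unicodedata` module instead of uni's own tables.

```python
import unicodedata

def identify(text):
    """Return one row per character:
    (char, code point, decimal, UTF-8 bytes as hex, name, general category)."""
    rows = []
    for ch in text:
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            ord(ch),
            ch.encode("utf-8").hex(" "),        # e.g. "e2 80 8b"; needs Python 3.8+
            unicodedata.name(ch, "<unnamed>"),  # fallback for characters without a name
            unicodedata.category(ch),           # short code such as "Ll" or "Cf"
        ))
    return rows

for row in identify("asd"):
    print(*row)
```

Note that the category comes back as the two-letter code rather than the long name uni prints.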
I'm talking about something like:
$ uni identify --utf8 "e2 80 8b"
cpoint dec utf8 html name (cat)
'�' U+200B 8203 e2 80 8b ​ ZERO WIDTH SPACE (Format)
You could just copy the code from Slack to uni, right? That's how I use it anyway.
I suppose some syntax could be added to print; I'm not sure if it's a common use case, and I'm not likely to work on it any time soon, but I'll happily review and merge patches, or I'll probably take it up eventually.
Personally I'd just pipe it to grep (uni p all | grep 'e2 80 8b') in the rare case I'd want it, which is a wee bit slow, but works well enough.
$ uni p all | rg 'e2 80 8b'
'�' U+200B 8203 e2 80 8b ​ ZERO WIDTH SPACE (Format)
oh, that'll work most of the time, thanks.
You can now use uni p 'utf8:e2 80 8b', and a few variants thereof:
$ uni p 'utf8:e2 80 8b' 'utf8:e2808b' 'utf8:0xe2 0x80 0x8b' 'utf8:e2-80-8b'
cpoint dec utf8 html name (cat)
'�' U+200B 8203 e2 80 8b ​ ZERO WIDTH SPACE (Format)
'�' U+200B 8203 e2 80 8b ​ ZERO WIDTH SPACE (Format)
'�' U+200B 8203 e2 80 8b ​ ZERO WIDTH SPACE (Format)
'�' U+200B 8203 e2 80 8b ​ ZERO WIDTH SPACE (Format)
I think that should cover all the common syntaxes; the utf8: prefix is needed to disambiguate from codepoints, since uni p 0x200B or just uni p 200B without a leading U+ will print the codepoint already.
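For anyone curious what accepting those four spellings involves, here's a minimal Python sketch of the normalization step. It's only an illustration under my own assumptions, not uni's actual parser (uni isn't written in Python), and `parse_utf8_arg` is a hypothetical name.

```python
import re

def parse_utf8_arg(arg):
    """Turn 'utf8:e2 80 8b' and similar spellings into a list of 'U+XXXX' strings."""
    prefix = "utf8:"
    if not arg.startswith(prefix):
        raise ValueError("expected a utf8: prefix")
    body = arg[len(prefix):]
    # Drop optional 0x prefixes, then the spaces or dashes used as separators.
    body = re.sub(r"0[xX]", "", body)
    body = re.sub(r"[\s-]", "", body)
    raw = bytes.fromhex(body)    # ValueError on non-hex input or an odd digit count
    text = raw.decode("utf-8")   # UnicodeDecodeError if the bytes aren't valid UTF-8
    return [f"U+{ord(ch):04X}" for ch in text]

for arg in ("utf8:e2 80 8b", "utf8:e2808b", "utf8:0xe2 0x80 0x8b", "utf8:e2-80-8b"):
    print(arg, "->", parse_utf8_arg(arg))
```

Requiring the prefix up front is what keeps a bare 200B free to mean the codepoint.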
Recently had a problem with some code I copied from a coworker's message in Slack. For some reason, lines showed up as having been changed in git even though I couldn't see what was different. Put it through a hex editor and saw e2 80 8b. Went to my usual tool for this type of thing, FileFormat.info, typed that in, and it came up with the right answer: zero-width spaces had been inserted. I'd like to be able to use uni to search by UTF-8 bytes like that.
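As a stopgap for the git situation described here, something along these lines can point at where the invisible characters sit. It's a small Python sketch that has nothing to do with uni itself; `find_zwsp` is a made-up helper, and it assumes the input is valid UTF-8.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, encoded as e2 80 8b in UTF-8

def find_zwsp(data: bytes):
    """Return (line, column) pairs, both 1-based, for every zero-width space."""
    hits = []
    for lineno, line in enumerate(data.decode("utf-8").splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == ZWSP:
                hits.append((lineno, col))
    return hits

sample = b"x = 1\ny\xe2\x80\x8b = 2\n"
print(find_zwsp(sample))
```

Columns are counted in characters, not bytes, so they should line up with what an editor shows.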