boinkor-net / chars

cha(rs) is a commandline tool to display information about unicode characters
https://github.com/boinkor-net/chars
MIT License

Allow effective searching for flags and other zwj-joined symbols #36

Open antifuchs opened 4 years ago

antifuchs commented 4 years ago

Turns out we can't find, e.g., the transgender flag (new in Unicode 13!). Its codepoints are:

U+1F3F3
U+FE0F
U+200D
U+26A7
U+FE0F

...meaning we can only find the constituent codepoints, but not the whole. That's a problem for all kinds of flags, family configurations, and other glyphs composed of multiple codepoints.
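To make the problem concrete, here is a minimal sketch (not code from chars; the names `TRANSGENDER_FLAG`, `string_from_codepoints` and `codepoint_labels` are made up for illustration) showing that the flag is a single string made of five separate `char`s. A lookup that works one `char` at a time can only ever see the five parts:

```rust
/// Code points of the transgender flag, as listed above.
const TRANSGENDER_FLAG: [u32; 5] = [0x1F3F3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F];

/// Build a String from raw code point values. Returns None if any value
/// is not a valid Unicode scalar value (e.g. a surrogate).
fn string_from_codepoints(cps: &[u32]) -> Option<String> {
    cps.iter().map(|&cp| char::from_u32(cp)).collect()
}

/// Format every code point of `s` in U+XXXX notation, one entry per `char`.
fn codepoint_labels(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}
```

Calling `codepoint_labels` on the string built from `TRANSGENDER_FLAG` yields five labels, one per constituent codepoint, and nothing that identifies the sequence as a whole.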

The sequences have names, so we ought to be able to retrieve them.

antifuchs commented 4 years ago

There are lists of Unicode emoji sequences and emoji ZWJ sequences here:

https://www.unicode.org/Public/emoji/13.0/emoji-sequences.txt and https://www.unicode.org/Public/emoji/13.0/emoji-zwj-sequences.txt - I imagine we have to integrate these into chars_data as a separate data set.
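A rough sketch of what reading those files could look like, assuming each data line has the shape `code points ; type ; name # comment` (check this against the files themselves before relying on it). `NamedSequence` and `parse_sequence_line` are hypothetical names, not part of chars_data, and lines that use a `XXXX..YYYY` code point range are simply skipped here:

```rust
/// One named sequence from the emoji data files (hypothetical type).
#[derive(Debug, PartialEq)]
struct NamedSequence {
    codepoints: Vec<u32>,
    kind: String,
    name: String,
}

/// Parse a single line; returns None for comments, blank lines,
/// malformed lines, and code point ranges (not handled in this sketch).
fn parse_sequence_line(line: &str) -> Option<NamedSequence> {
    // Everything after '#' is a comment.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    // Assumed layout: three ';'-separated fields.
    let mut fields = data.split(';').map(str::trim);
    let cps = fields.next()?;
    let kind = fields.next()?;
    let name = fields.next()?;
    // Space-separated hex values; a range like "231A..231B" fails to
    // parse as hex and makes the whole line return None.
    let codepoints = cps
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok())
        .collect::<Option<Vec<u32>>>()?;
    if codepoints.is_empty() {
        return None;
    }
    Some(NamedSequence {
        codepoints,
        kind: kind.to_string(),
        name: name.to_string(),
    })
}
```

Keeping the result as a plain list of code points plus a name leaves it open whether chars_data stores the sequences in a generated table or in some other form.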

I think the internal character representation will have to grow into an enum (or get another variant with the sequence representation) - with adjusted display functions to go with it.
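Something along these lines, as a sketch only: `Describable` and its variants are invented names, not the current chars types, and the output format is just one possibility.

```rust
use std::fmt;

/// Hypothetical representation: either one code point or a named sequence.
#[derive(Debug, Clone, PartialEq)]
enum Describable {
    Single(char),
    Sequence { chars: Vec<char>, name: String },
}

impl fmt::Display for Describable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // A lone character: its code point, then the character itself.
            Describable::Single(c) => write!(f, "U+{:04X} {}", *c as u32, c),
            // A sequence: its name, all code points, then the joined glyph.
            Describable::Sequence { chars, name } => {
                let cps: Vec<String> = chars
                    .iter()
                    .map(|c| format!("U+{:04X}", *c as u32))
                    .collect();
                let glyph: String = chars.iter().collect();
                write!(f, "{} ({}) {}", name, cps.join(" "), glyph)
            }
        }
    }
}
```

With a separate variant, the existing single-character display path stays untouched, and only the sequence arm needs new formatting.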