arp242 / uni

Query the Unicode database from the commandline, with good support for emojis
MIT License

FR: use CLDR Character Annotation “Keywords” too when searching characters/emoji #11

Closed by blueset 3 years ago

blueset commented 4 years ago

Unicode CLDR Character Annotations provide a list of keywords for some characters (especially emoji), intended to improve the search experience for them.

The remaining phrases are keywords (labels), separated by “|”. The keywords plus the words in the short name are typically used for search and predictive typing.
— CLDR Character Annotations description

I would like to suggest including these keywords as well when searching for both Unicode characters and emojis.

A list of these annotations can be found here: https://www.unicode.org/cldr/charts/36/annotations/romance.html

Computer-friendly character annotation data in XML for each language can be found here: https://github.com/unicode-org/cldr/tree/master/common/annotations
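
For illustration, here is a rough sketch of how the keywords could be pulled out of one of those XML files. It assumes the usual layout of the annotation files, where each `<annotation cp="…">` element holds the "|"-separated keywords and the element with `type="tts"` holds the short name. The inline `SAMPLE` document and the `parse_annotations` helper are made up for this example and are not part of uni.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the assumed shape of a CLDR annotation file.
SAMPLE = """<ldml>
  <annotations>
    <annotation cp="😀">face | grin | grinning face</annotation>
    <annotation cp="😀" type="tts">grinning face</annotation>
  </annotations>
</ldml>"""


def parse_annotations(xml_text):
    """Return {character: {"name": str or None, "keywords": [str, ...]}}."""
    result = {}
    root = ET.fromstring(xml_text)
    for node in root.iter("annotation"):
        entry = result.setdefault(node.get("cp"), {"name": None, "keywords": []})
        text = (node.text or "").strip()
        if node.get("type") == "tts":
            # The "tts" entry carries the short name.
            entry["name"] = text
        else:
            # The remaining entry carries the keywords, separated by "|".
            entry["keywords"] = [k.strip() for k in text.split("|") if k.strip()]
    return result


annotations = parse_annotations(SAMPLE)
```

A search could then match a query against both `name` and `keywords` for each character.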

arp242 commented 4 years ago

Yeah it's in the TODO: https://github.com/arp242/uni/blob/master/TODO#L5

It's not as easy as "just include CLDR data", since a lot of it is kinda junky IMHO. Many of the basic smileys contain keywords such as "mouth", "eye", etc., so the list needs some filtering. Maybe there's a better list of keywords somewhere; I don't know what GitHub uses for their :emoji-style emojis (pretty sure I saw a list for that somewhere at some point).

I probably won't work on this any time soon, but will happily review and merge PRs if anyone contributes.

arp242 commented 3 years ago

It now includes the CLDR data in the default output, with duplicate words omitted (i.e. no point in adding "face" if the emoji's name is already "grinning face"), but you can add %(cldr_full) if you want it anyway. This is also searched by default now with uni e smile; use uni e name:smile to search in the name specifically.
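
The "duplicate words omitted" idea can be sketched roughly as below. This is only an approximation for illustration, not uni's actual implementation: the `drop_redundant` helper is hypothetical, and it assumes a keyword counts as redundant when every word in it already appears in the emoji's name.

```python
def drop_redundant(name, keywords):
    """Keep only keywords that add at least one word not already in the name."""
    name_words = set(name.lower().split())
    kept = []
    for keyword in keywords:
        words = keyword.lower().split()
        if not words:
            continue  # skip empty keywords
        if all(word in name_words for word in words):
            continue  # nothing new compared to the name, e.g. "face"
        kept.append(keyword)
    return kept


extra = drop_redundant("grinning face", ["face", "grin", "grinning face"])
```

With the "grinning face" example from above, "face" and "grinning face" are dropped and only "grin" is left as an extra search term.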