Update to Unicode 13.0.0 standard

FelipeFTN / Emoji-Copy

😄 Emoji copy is a versatile extension designed to simplify emoji selection and clipboard management.

https://extensions.gnome.org/extension/6242/emoji-copy/

GNU General Public License v3.0


Update to Unicode 13.0.0 standard #21

Closed NatVIII closed 6 months ago

NatVIII commented 10 months ago

Current State

emoji-copy's database is currently based on Unicode 12.1.0, thanks to @helena-dev (🏳️‍⚧️ solidarity)

Desired Goal

Bringing emoji-copy in line with the 13.0.0 standard will let us then update it to 14.0.0, and after that to 15.0.0. My ultimate wish here is to set achievable goals on the way to modernizing the available emojis.

pavinjosdev commented 7 months ago

I was wondering where these guys were (copied from emojipedia): 🥹 🥲

What needs to be done to update to Unicode 15.1? Would a PR that updates the following files, keeping the same format but adding the new emojis, be enough?

emoji-copy@felipeftn/data/emojisCharacters.js
emoji-copy@felipeftn/data/emojisKeywords.js

I'm thinking maybe I could write a Python parser that takes this and spits out the arrays needed for Emoji-Copy.

pavinjosdev commented 7 months ago

@NatVIII @FelipeFTN Could you have a look at my proposed solution?

FelipeFTN commented 7 months ago

Heyy @pavinjosdev!! I'm sorry for the late response :sweat_smile: This is actually an amazing idea! Should work perfectly! :tada: I couldn't think of a better solution!

Building a script that parses the latest Unicode emojis into something our extension can read should work nicely! I'm very excited to see this working, @pavinjosdev :eyes: Go on, feel free to open a Pull Request with your changes, I will take a careful look at it! :100:

I see two ways to achieve this:

1. Build a script that updates emojisKeywords.js and emojisCharacters.js (keeping them as JavaScript files).
2. Build a script that parses the emojis and updates a JSON file, and stop using the JS files. This solution also requires updating the extension code to read from the emoji JSON file.

Both should work fine! What do you think? Do you have another solution in mind, or would you go with one of these?

Thank you so much for your contribution with this issue, @pavinjosdev!

pavinjosdev commented 7 months ago

@FelipeFTN Thank you for the update. I think option [2], using the JSON file, is better: JSON is a common format for storing data, and we can write the parser in any language that natively supports JSON. I will submit the PR soon 🙂
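For readers following along, option 2 could be sketched roughly as below. This is only an illustration in Python: the record layout (`char`, `keywords`) and the helper name `build_emoji_json` are assumptions made for the example, not the format the extension actually uses.

```python
import json


def build_emoji_json(entries):
    """Serialize (character, keywords) pairs into a JSON string.

    The "char"/"keywords" field names are hypothetical, chosen for this sketch.
    """
    records = [
        {"char": char, "keywords": list(keywords)}
        for char, keywords in entries
    ]
    # ensure_ascii=False keeps the emojis readable instead of \u escapes.
    return json.dumps(records, ensure_ascii=False, indent=2)


# The two emojis mentioned earlier in the thread, with their Unicode names.
sample = [
    ("🥹", ["face holding back tears"]),
    ("🥲", ["smiling face with tear"]),
]

text = build_emoji_json(sample)
loaded = json.loads(text)  # what a consumer of the file would get back
```

Writing `text` to a file would give the extension a plain data file to load, so the parser and the extension only need to agree on the JSON layout.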

NatVIII commented 7 months ago

Actually, I know it's quite late, but I wanted to point out that other, more featureful datasets are available which also include the needed keywords. One possible example is pulling from the muan/emojilib repo, with its JSON available here. There were some issues with that implementation, though, that we may run into when implementing this as well:

No categorization

The current JS file relies on the emojis being sorted into the little categories shown at the top of the screen, and I can't figure out a way to keep that consistently available if we re-create the database on every Unicode update to stay compliant with the latest standard.

ZWJ Integration

I don't see any way in the code that we can use ZWJ sequences yet. They matter more and more for the emojis added up through 15.1, and I just never got a chance to figure that out.

Just my two cents, sorry I was never able to get around to this so far!
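As background on the ZWJ point: a ZWJ sequence is several code points joined by U+200D (ZERO WIDTH JOINER), so an emoji has to be handled as a whole string rather than as a single character. A minimal Python sketch, with hypothetical helper names, using the transgender flag from the first comment:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_zwj_sequence(emoji):
    """Return True if the emoji string contains a zero width joiner."""
    return ZWJ in emoji


def zwj_components(emoji):
    """Split a ZWJ sequence into the emojis it is built from."""
    return emoji.split(ZWJ)


# Transgender flag: white flag + VS16, ZWJ, transgender symbol + VS16.
flag = "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F"
parts = zwj_components(flag)
```

Here `flag` is five code points long, which is why storage or lookup code that assumes one code point per emoji would break on it.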

pavinjosdev commented 7 months ago

@NatVIII The current parser as implemented by @FelipeFTN uses the categories from the Unicode test file to automatically categorize emojis into groups and subgroups such as Smileys & Emotion, People & Body, etc. I believe ZWJ emojis are included in there.
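To illustrate the idea (this is not the project's actual parser), here is a minimal Python sketch that assumes the test file is Unicode's emoji-test.txt, where `# group:` and `# subgroup:` comment lines precede the data lines. The function name, the returned field names, and the choice to keep only `fully-qualified` entries are assumptions of this sketch.

```python
def parse_emoji_test(text):
    """Parse emoji-test.txt style text into a list of dicts."""
    group = subgroup = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# group:"):
            group = line[len("# group:"):].strip()
        elif line.startswith("# subgroup:"):
            subgroup = line[len("# subgroup:"):].strip()
        elif line and not line.startswith("#"):
            # Data line: "<code points> ; <status> # <emoji> E<version> <name>"
            codepoints, rest = line.split(";", 1)
            status, comment = rest.split("#", 1)
            if status.strip() != "fully-qualified":
                continue  # skip unqualified / minimally-qualified variants
            char = "".join(chr(int(cp, 16)) for cp in codepoints.split())
            _, version, name = comment.strip().split(" ", 2)
            rows.append({"char": char, "name": name, "version": version,
                         "group": group, "subgroup": subgroup})
    return rows


SAMPLE = """\
# group: Smileys & Emotion

# subgroup: face-smiling
1F600 ; fully-qualified # 😀 E1.0 grinning face
1F603 ; fully-qualified # 😃 E0.6 grinning face with big eyes

# group: Flags

# subgroup: flag
1F3F3 FE0F 200D 26A7 FE0F ; fully-qualified # 🏳️‍⚧️ E13.0 transgender flag
1F3F3 ; unqualified # 🏳 E0.7 white flag
"""

rows = parse_emoji_test(SAMPLE)
```

Because each character is rebuilt from the listed code points, ZWJ sequences such as the transgender flag come through as ordinary multi-code-point strings, with their group and subgroup attached.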

The whole thing is saved as an SQLite DB by the Python parser, which is then queried by the extension's JS. The current blocker is that using GNOME's libgda library and its corresponding SQLite binding causes gnome-shell to crash on an openSUSE Tumbleweed system running the latest of everything. It works on Fedora/Arch, so I doubt it's an issue with the extension code itself. @FelipeFTN is awesome, he did all of the SQL work to make the queries fast 🚀
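The store-then-query flow described above might look something like this sketch. It uses Python's built-in `sqlite3` module on both sides purely for illustration (the extension itself queries from JS through libgda), and the table layout, index, and function names are assumptions rather than the project's real schema.

```python
import sqlite3


def build_db(rows, path=":memory:"):
    """Create a small emoji table and fill it (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE emojis ("
        "char TEXT PRIMARY KEY, name TEXT NOT NULL, grp TEXT NOT NULL)"
    )
    # An index on the group column helps when listing one category at a time.
    conn.execute("CREATE INDEX idx_emojis_grp ON emojis (grp)")
    conn.executemany("INSERT INTO emojis VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, term):
    """Return the emojis whose name contains the search term."""
    cur = conn.execute(
        "SELECT char FROM emojis WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [row[0] for row in cur]


conn = build_db([
    ("🥹", "face holding back tears", "Smileys & Emotion"),
    ("🥲", "smiling face with tear", "Smileys & Emotion"),
    ("🚀", "rocket", "Travel & Places"),
])
faces = search(conn, "face")
```

The parameterized `LIKE ?` query keeps user-typed search terms out of the SQL string itself.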

NatVIII commented 7 months ago

😮 I had no idea that much work had gone in. As cool as this project is, and as much as I use it daily, I haven't really had a chance to delve into the code too much. Good work @FelipeFTN

FelipeFTN commented 7 months ago

Hahahha Thank you so much @NatVIII @pavinjosdev :heart: Actually @pavinjosdev did all the hard work hahaha!

The new feature is almost ready! Let's keep working! :people_hugging: