CRMTH / RITHM

MIT License

Handle Unicode emoji variants #8

Open colditzjb opened 7 years ago

colditzjb commented 7 years ago

Unicode variants of the \ufe0* type (variation selectors such as \ufe0f) are not being recoded in the parser (decode.py). We may be able to ignore these, as they are context-dependent and add little or no utility for classification purposes.

See this link: https://stackoverflow.com/questions/38100329/some-emojis-e-g-have-two-unicode-u-u2601-and-u-u2601-ufe0f-what-does
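
A minimal sketch of the "ignore" option, assuming it is enough to drop the variation selectors (U+FE00 through U+FE0F) from the text before emoji are matched. The helper name strip_variation_selectors is hypothetical and is not part of decode.py:

```python
import re

# Variation selectors U+FE00..U+FE0F. U+FE0F requests emoji presentation
# and U+FE0E requests text presentation of the preceding character.
VARIATION_SELECTORS = re.compile("[\ufe00-\ufe0f]")


def strip_variation_selectors(text):
    """Hypothetical helper: remove variation selectors so that
    u'\u2601' and u'\u2601\ufe0f' are treated as the same symbol."""
    return VARIATION_SELECTORS.sub("", text)


if __name__ == "__main__":
    cloud_plain = "\u2601"
    cloud_emoji = "\u2601\ufe0f"
    # After stripping, both spellings compare equal.
    print(strip_variation_selectors(cloud_emoji) == cloud_plain)
```

Stripping loses the text-versus-emoji presentation hint, which seems acceptable if that distinction really adds nothing for classification.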

colditzjb commented 7 years ago

Check out emojitracker's list of known emoji: https://github.com/mroth/emoji_data.rb/blob/master/vendor/emoji-data/emoji.json

colditzjb commented 6 years ago

After some group discussion, a few Unicode variants may be valuable for continued research (e.g., Fitzpatrick skin-tone variants, when available). This Unicode issue is an ongoing topic of discussion.
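
As a rough illustration of how the Fitzpatrick variants could be captured, here is a sketch that looks for the five skin-tone modifier code points (U+1F3FB through U+1F3FF) in a string. The function name and the label strings are assumptions for illustration only, not anything the parser currently does:

```python
# Emoji skin-tone modifiers defined by Unicode, keyed by code point.
# The label strings are illustrative and not an existing RITHM convention.
FITZPATRICK_MODIFIERS = {
    "\U0001F3FB": "type-1-2",
    "\U0001F3FC": "type-3",
    "\U0001F3FD": "type-4",
    "\U0001F3FE": "type-5",
    "\U0001F3FF": "type-6",
}


def find_skin_tones(text):
    """Hypothetical helper: return a label for each skin-tone
    modifier found in the text, in order of appearance."""
    return [FITZPATRICK_MODIFIERS[ch] for ch in text if ch in FITZPATRICK_MODIFIERS]


if __name__ == "__main__":
    thumbs_up_medium = "\U0001F44D\U0001F3FD"
    print(find_skin_tones(thumbs_up_medium))
```

This assumes Python 3 strings, where each modifier is a single character; on a narrow Python 2 build the modifiers would appear as surrogate pairs and this loop would miss them.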

colditzjb commented 6 years ago

@sanyabt - I think this should just be a simple update to the emojilist.csv file. Should we ask one of our RAs to do this? If so, is there a list of important emoji or symbols that we're not currently capturing? (Don't worry about foreign-language Unicode characters, though.)