cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License

Emoticons UTF-8 #22

Closed kokojumbo closed 6 years ago

kokojumbo commented 7 years ago

First of all, I really like your lib. It's powerful. I used UTF-8 Twitter emoji like ❤️😂😫😊 and all of them came back with neutral sentiment. Maybe it is just my issue, but I tried using UTF-8 encoding in my code and it didn't help. I think it would be a good idea to add them to vader_lexicon.txt and support UTF-8 emoji.

Nowadays they are more popular on social media than standard emoticons (':)', ':(', ':D', etc.). Please see http://unicode.org/emoji/charts/full-emoji-list.html for the full list. I think supporting them might increase VADER's accuracy significantly.

Do you plan to port this code to Java to make it more popular? I can help you with that.

cjhutto commented 7 years ago

I'm glad you're finding VADER to be useful.

I think adding the emoji list is a great idea - I'll work on getting validated sentiment scores and add them to the lexicon.

A previous version of VADER was ported to Java. It can be found at https://github.com/apanimesh061/VaderSentimentJava.

kootenpv commented 7 years ago

I would like to second (or third) it.

I found that these have no effect: 😡😤. I would probably categorise them the same as ":@" in the vader_lexicon.
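
The aliasing idea above could be sketched roughly as follows. This is only an illustration, not how VADER works internally: the EMOJI_ALIASES table and the alias_emoji helper are hypothetical names, and the sketch assumes ":@" already has an entry in vader_lexicon.txt.

```python
# Hypothetical alias table: each emoji is mapped to an emoticon token
# that is assumed to already have an entry in vader_lexicon.txt.
EMOJI_ALIASES = {
    "😡": ":@",
    "😤": ":@",
}


def alias_emoji(text: str) -> str:
    """Replace known emoji with their emoticon alias before scoring.

    The alias is padded with spaces so it becomes a separate token even
    when the emoji was glued to a word. Runs of whitespace are collapsed
    to a single space as a side effect.
    """
    for emoji_char, emoticon in EMOJI_ALIASES.items():
        text = text.replace(emoji_char, f" {emoticon} ")
    return " ".join(text.split())


print(alias_emoji("so annoying😡"))
```

The rewritten string could then be passed to the analyzer in place of the original text, so the existing lexicon entry does the scoring.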

Do you have any ETA for the updated lexicon :)? I think it would really help!

knil-sama commented 7 years ago

I am currently using the emoji library to handle Unicode emoji. If VADER is willing to rely on a third-party library, I can work on a pull request for this.
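
For anyone who wants to try this kind of preprocessing without an extra dependency, here is a minimal sketch that swaps each emoji for its official Unicode character name using only the standard library's unicodedata module. It is a stand-in for the emoji library mentioned above, not that library's API, and describe_emoji is a hypothetical helper name. It treats every character in the "So" (Symbol, other) category as an emoji, which is a simplifying assumption.

```python
import unicodedata

# U+FE0F asks for emoji-style rendering of the preceding character
# (it follows the heart in "❤️"); it carries no meaning of its own.
VARIATION_SELECTOR_16 = "\ufe0f"


def describe_emoji(text: str) -> str:
    """Replace symbol characters with their lowercased Unicode names.

    Assumption: any character in category "So" is treated as an emoji.
    Runs of whitespace are collapsed to a single space as a side effect.
    """
    pieces = []
    for ch in text:
        if ch == VARIATION_SELECTOR_16:
            continue  # drop the invisible selector
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                # Pad with spaces so the description forms separate words.
                pieces.append(f" {name.lower()} ")
                continue
        pieces.append(ch)
    return " ".join("".join(pieces).split())


print(describe_emoji("so funny 😂"))
```

The resulting plain words (for example "joy" or "tired") can then be matched by a word-based lexicon, whereas the raw emoji character could not.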

cjhutto commented 6 years ago

@kokojumbo @kootenpv @knil-sama A year and a day later, I got around to implementing support for emojis :) Run pip install --upgrade vaderSentiment to give it a try!