Very different compound scores for similar emojis

nathan-smit commented 4 years ago

Hey there,

I've just started using this sentiment analysis tool and it's great! I came across a weird case though, where I noticed that one of the lowest compound scores belonged to the tweet below:

"Thank you I’m a happy client💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗💓💗"

Upon investigating further, it seems that this heart 💗 has a positive compound score whereas this one 💓 has a very negative score, leading to a low overall compound score. Is there a reason for the inconsistent scoring between these fairly similar images? In my application I'm now converting the one heart to the other, which makes this the most positive tweet in my dataset.
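
For anyone wanting the same workaround, here is a minimal sketch of the conversion step, assuming you simply swap the beating heart for the growing heart before scoring. The names EMOJI_REPLACEMENTS and normalize_emoji are made up for this example and are not part of vaderSentiment.

```python
# Workaround sketch: map an emoji that scores badly onto a near-equivalent
# one before the text reaches the analyzer. The mapping is an assumption.
EMOJI_REPLACEMENTS = {
    "\U0001F493": "\U0001F497",  # beating heart -> growing heart
}


def normalize_emoji(text: str) -> str:
    """Replace each emoji listed in EMOJI_REPLACEMENTS with its substitute."""
    return text.translate(str.maketrans(EMOJI_REPLACEMENTS))


tweet = "Thank you I’m a happy client\U0001F497\U0001F493\U0001F497\U0001F493"
print(normalize_emoji(tweet))
```

The normalized string can then be passed to the analyzer's polarity_scores method as usual.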

cjhutto commented 4 years ago

Oh, wow -- great find!
So, VADER's ability to score the sentiment of each emoji is accomplished by converting the emoji to its official textual description, and then just processing that text as normal... this allows me to easily keep the emoji list up-to-date with the most modern set by simply scraping the official data source (here) whenever we need to update.

A quick informal inspection shows me that the first heart is a "growing heart" (strong positive sentiment) and the second one is a "beating heart"... and the context-free interpretation of "beating" is not at all positive (e.g., as in "this particular context-free interpretation is taking a beating in terms of sentiment accuracy").
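
A rough sketch of why the two hearts diverge, under some loud assumptions: the standard library's unicodedata.name stands in for VADER's own emoji-to-description mapping, and the valences in TOY_LEXICON are invented for illustration rather than taken from the real lexicon.

```python
import unicodedata

# Hypothetical per-word valences, for illustration only; these are NOT
# the values in VADER's lexicon.
TOY_LEXICON = {"growing": 1.0, "heart": 2.0, "beating": -2.5}


def describe(emoji: str) -> str:
    """Return the Unicode character name in lowercase."""
    return unicodedata.name(emoji).lower()


def toy_score(emoji: str) -> float:
    """Sum the context-free valence of each word in the description."""
    return sum(TOY_LEXICON.get(word, 0.0) for word in describe(emoji).split())


for emoji in ("\U0001F497", "\U0001F493"):
    print(describe(emoji), toy_score(emoji))
```

Because each word is scored without context, the negative reading of "beating" outweighs "heart" in this toy setup, which mirrors the behavior reported above.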

The quick fix is to add "beating heart" as a special case so that this emoji is correctly interpreted to be positive... see my update on line 79 of the vaderSentiment.py script.
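
To illustrate the idea of a special case (this is not the actual code in vaderSentiment.py), here is a small sketch in which a phrase table is consulted before the per-word lookup. TOY_LEXICON, TOY_SPECIAL_CASES, and all valences are hypothetical.

```python
# Hypothetical valences, for illustration only.
TOY_LEXICON = {"heart": 2.0, "beating": -2.5}
# Multi-word phrases whose meaning differs from the sum of their words.
TOY_SPECIAL_CASES = {"beating heart": 3.0}


def toy_score(text: str) -> float:
    """Score text, letting special-case phrases override per-word lookup."""
    remaining = text.lower()
    total = 0.0
    for phrase, valence in TOY_SPECIAL_CASES.items():
        total += remaining.count(phrase) * valence
        # Remove matched phrases so their words are not scored a second time.
        remaining = remaining.replace(phrase, " ")
    total += sum(TOY_LEXICON.get(word, 0.0) for word in remaining.split())
    return total


print(toy_score("beating heart"))
print(toy_score("taking a beating"))
```

With the phrase matched first, "beating heart" gets the positive override while a standalone "beating" still falls through to its negative per-word valence.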

udaykumar1506 commented 4 years ago

Hey,

Is this issue fixed? I am getting a negative score for "Beating Heart" and a neutral score for "Revolving Hearts".

Please help me here, thank you.

cjhutto commented 4 years ago

I've just pushed the updated version to PyPI. You should now be able to run pip install --upgrade vaderSentiment or python -m pip install vaderSentiment --no-cache-dir
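
If it helps to confirm which version ended up installed after upgrading, something like the following should work on Python 3.8 or later. The helper name installed_version is just for this example, and the thread does not say which release number carries the fix.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("vaderSentiment"))
```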

cjhutto / vaderSentiment

Very different compound scores for similar emojis #94