cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License

Hashtags and Excessive Punctuation Fail #55

Closed: LouisK130 closed this issue 4 years ago

LouisK130 commented 6 years ago

"scary" has a negative compound score, but "#scary" has a compound score of 0. This doesn't seem like the right behavior for a tool geared towards analyzing social media text.

"awesome!" has a positive compound score, "awesome!!" is more positive, "awesome!!!" is more positive still, and then suddenly "awesome!!!!" has a compound score of 0. Again, this doesn't seem appropriate.

SundareshPrasanna commented 6 years ago

Facing the same issue. I'm not sure whether we should pre-process our text so that it contains fewer than 4 exclamation marks and has the hashtag symbol removed.

AbtinZo commented 5 years ago

What @SundareshPrasanna mentioned is the only way to handle these cases. The code is not able to handle these examples correctly without pre-processing.
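For anyone stuck on an affected release, the workaround discussed above could look roughly like the sketch below. The function name `preprocess` and the `max_exclamations` parameter are made up for illustration and are not part of vaderSentiment; the cap of 3 is simply the largest count that still scored correctly in the report above.

```python
import re


def preprocess(text: str, max_exclamations: int = 3) -> str:
    """Hypothetical helper: clean text before passing it to the analyzer."""
    # Drop the '#' from hashtags but keep the word itself: "#scary" -> "scary".
    text = re.sub(r"#(\w+)", r"\1", text)
    # Collapse any run of '!' longer than the cap down to the cap:
    # with the default cap of 3, "!!!!" and "!!!!!!" both become "!!!".
    too_many = r"!{%d,}" % (max_exclamations + 1)
    return re.sub(too_many, "!" * max_exclamations, text)


print(preprocess("#scary"))        # scary
print(preprocess("awesome!!!!"))   # awesome!!!
print(preprocess("awesome!!"))     # awesome!!
```

The cleaned string can then be scored as usual. Note that stripping "#" this way also changes hashtags that are not sentiment words, which is harmless for scoring but loses the hashtag marker.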

cjhutto commented 4 years ago

This is fixed in the master version on this repo. Thanks for pointing it out. Based on empirical evidence reported in the paper, (perceived) sentiment intensity plateaus at ~4 exclamations... so the correct behavior has been re-incorporated into the VADER evaluation engine. We've also added support for #-prefixed words (hashtags).

What I now see is:

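# With the pip-installed package the class lives in vaderSentiment.vaderSentiment;
# the shorter "from vaderSentiment import ..." only works when run from inside the package directory.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer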
vader = SentimentIntensityAnalyzer()

sentences = ["This is so bad",
             "This is so bad!!!",
             "This is so bad!!!!",
             "This is so bad!!!!!",
             "This is so bad!!!!!!",
             "Awesome",
             "#awesome",
             "scary",
             "#scary"
            ]
for sentence in sentences:
    vs = vader.polarity_scores(sentence)
    print("{:-<25} {}".format(sentence, str(vs)))

outputs:

This is so bad----------- {'neg': 0.6, 'neu': 0.4, 'pos': 0.0, 'compound': -0.6696}
This is so bad!!!-------- {'neg': 0.641, 'neu': 0.359, 'pos': 0.0, 'compound': -0.7482}
This is so bad!!!!------- {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
This is so bad!!!!!------ {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
This is so bad!!!!!!----- {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
Awesome------------------ {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.6249}
#awesome----------------- {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.6249}
scary-------------------- {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.4939}
#scary------------------- {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.4939}
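
The plateau in the output above (identical scores for 4, 5, and 6 exclamation marks) is what you would expect from a capped punctuation boost. Below is a toy sketch of that idea, not the library's actual code: the function names are hypothetical, and the per-"!" step of 0.292 and the normalization constant of 15 are assumptions chosen for illustration. Only the cap of 4 comes from the comment above.

```python
import math

EXCLAIM_CAP = 4        # plateau point mentioned in the comment above
EXCLAIM_STEP = 0.292   # assumed per-"!" increment, for illustration only


def exclamation_boost(text: str) -> float:
    """Emphasis contributed by '!' characters, capped at EXCLAIM_CAP."""
    return min(text.count("!"), EXCLAIM_CAP) * EXCLAIM_STEP


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash a raw valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(base_valence: float, text: str) -> float:
    """Apply the capped boost in the direction the score already points."""
    boost = exclamation_boost(text)
    if base_valence > 0:
        raw = base_valence + boost
    elif base_valence < 0:
        raw = base_valence - boost
    else:
        raw = base_valence  # neutral text is not pushed either way
    return round(normalize(raw), 4)


# The base valence of -2.0 is an arbitrary example value, not a lexicon entry.
for marks in range(0, 7):
    text = "bad" + "!" * marks
    print("{:-<12} {}".format(text, toy_compound(-2.0, text)))
```

In this sketch the score becomes more negative with each "!" up to the fourth and then stays constant, instead of dropping to 0 as in the original report.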