LouisK130 closed this issue 4 years ago
I'm facing the same issue. I'm not sure whether we should pre-process our text so that it contains fewer than 4 exclamation marks and has the hashtag symbol removed.
What @SundareshPrasanna mentioned is the only way to handle these cases. The code is not able to handle these examples correctly without pre-processing.
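For anyone stuck on an affected version, here is a minimal sketch of that pre-processing workaround. It is not part of vaderSentiment: the function name `preprocess_for_vader` and the `max_exclamations` parameter are made up for illustration, and it assumes that capping the total number of "!" in the text at 3 and dropping the "#" in front of a word is enough to avoid the zero scores.

```python
import re


def preprocess_for_vader(text, max_exclamations=3):
    """Hypothetical helper: cap '!' count and strip '#' from hashtags."""
    kept = []
    seen = 0
    for ch in text:
        if ch == "!":
            seen += 1
            # Drop every exclamation mark beyond the allowed total.
            if seen > max_exclamations:
                continue
        kept.append(ch)
    capped = "".join(kept)
    # Remove '#' only when it directly precedes a word character,
    # so "#scary" becomes "scary" but a lone "#" is left alone.
    return re.sub(r"#(?=\w)", "", capped)


if __name__ == "__main__":
    for raw in ["This is so bad!!!!!!", "#scary", "awesome!!"]:
        print(repr(raw), "->", repr(preprocess_for_vader(raw)))
```

The cleaned string can then be passed to `polarity_scores` as usual. Note that this throws away the extra emphasis and the hashtag marker entirely, so it is only a stopgap.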
Got it fixed in the master version on this repo. Thanks for pointing it out. Based on empirical evidence reported in the paper, (perceived) sentiment intensity plateaus at ~4 exclamations, so the correct behavior has been re-incorporated into the VADER evaluation engine. We've also added handling for #-prefixed words.
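To make the plateau idea concrete, here is a rough sketch of what "capping at 4 exclamations" means. This is an illustration only, not the code in the repo: the function name, the `per_mark` increment, and the default values are assumptions chosen for the example.

```python
def exclamation_emphasis(text, per_mark=0.3, max_marks=4):
    """Illustrative only: emphasis grows per '!' and then plateaus.

    per_mark and max_marks are hypothetical values, not VADER's constants.
    """
    # Count every '!' in the text, but never credit more than max_marks.
    count = min(text.count("!"), max_marks)
    return count * per_mark


if __name__ == "__main__":
    for s in ["bad", "bad!!!", "bad!!!!", "bad!!!!!!"]:
        print("{:-<12} {:.2f}".format(s, exclamation_emphasis(s)))
```

With a cap like this, the fifth and sixth exclamation marks add nothing, which matches the identical scores for 4, 5, and 6 exclamations in the output posted in this thread.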
What I now see is:
from vaderSentiment import SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()
sentences = ["This is so bad",
             "This is so bad!!!",
             "This is so bad!!!!",
             "This is so bad!!!!!",
             "This is so bad!!!!!!",
             "Awesome",
             "#awesome",
             "scary",
             "#scary"
             ]
for sentence in sentences:
    vs = vader.polarity_scores(sentence)
    print("{:-<25} {}".format(sentence, str(vs)))
outputs:
This is so bad----------- {'neg': 0.6, 'neu': 0.4, 'pos': 0.0, 'compound': -0.6696}
This is so bad!!!-------- {'neg': 0.641, 'neu': 0.359, 'pos': 0.0, 'compound': -0.7482}
This is so bad!!!!------- {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
This is so bad!!!!!------ {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
This is so bad!!!!!!----- {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
Awesome------------------ {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.6249}
#awesome----------------- {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.6249}
scary-------------------- {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.4939}
#scary------------------- {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.4939}
`scary` has a negative compound score, but `#scary` has a compound score of 0. This doesn't seem like the right behavior for a tool geared towards analysis of texts from social media.

`awesome!` has a positive compound score, `awesome!!` is even more positive, `awesome!!!` is still more positive, and then suddenly `awesome!!!!` has a compound score of 0. Again, this doesn't seem appropriate.