cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License

incorrect sentiment due to "!" #60

Closed. SundareshPrasanna closed this issue 4 years ago.

SundareshPrasanna commented 6 years ago

I tried the following examples:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

analyser.polarity_scores("This is so bad")
{'compound': -0.6696, 'neg': 0.6, 'neu': 0.4, 'pos': 0.0}

This gives the correct (negative) sentiment.

But when I add four or more exclamation marks ("!!!!"), the sentence comes out as neutral:

analyser.polarity_scores("This is so bad!!!!!")
{'compound': 0.0, 'neg': 0.0, 'neu': 1.0, 'pos': 0.0}

Adding multiple exclamation marks causes the problem in this case. I tested up to six exclamation marks, and the breaking point seems to be four: the sentiment is correct with up to three exclamation marks (at least for this particular example).

Can someone help me with this?

curtisdf commented 5 years ago

I just ran into this, and it seems to come from not one but two bugs stacked on top of each other. First, the code's tokenization routine for trailing punctuation only recognizes runs of one, two, or three exclamation points and ignores anything with four or more. Second, it shouldn't score your second sentence as neutral, since "bad" is a sentiment-laden word in the lexicon (valence -2.5). So even if it were ignoring the punctuation marks, the scores should at least match those of your first sentence.
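The lookup-table behavior described above can be sketched as follows. This is a hypothetical, simplified reconstruction, not vaderSentiment's actual code: `PUNC_LIST` here is an illustrative subset, `LEXICON` is a one-word toy lexicon using the -2.5 valence quoted above, and `tokenize`/`raw_valence` are made-up helpers. Trailing punctuation is removed only when the exact run appears in the list, so "bad!!!!" survives as an unknown token.

```python
import string

# Illustrative subset only; NOT the library's real PUNC_LIST.
PUNC_LIST = [".", "!", "?", ",", ";", ":", "!!", "!!!", "??", "???"]

# Toy lexicon for illustration; -2.5 is the valence quoted in the thread.
LEXICON = {"bad": -2.5}


def tokenize(text):
    """Hypothetical lookup-table tokenizer: a token loses its trailing
    punctuation only if that exact run is listed in PUNC_LIST."""
    # Bare words: the text with every punctuation character removed.
    bare = text.translate(str.maketrans("", "", string.punctuation)).split()
    # Map each "word + listed punctuation run" back to the bare word.
    punc_after = {w + p: w for w in bare for p in PUNC_LIST}
    # Tokens whose trailing run is not listed are left untouched.
    return [punc_after.get(tok, tok) for tok in text.split()]


def raw_valence(text):
    """Sum toy-lexicon valences (no boosters, no normalization)."""
    return sum(LEXICON.get(tok.lower(), 0.0) for tok in tokenize(text))


for n in range(5):
    sentence = "This is so bad" + "!" * n
    print(f"{sentence!r:24} tokens={tokenize(sentence)} raw={raw_valence(sentence)}")
```

With up to three marks the last token is "bad" and the raw valence is -2.5; with four it stays "bad!!!!", misses the lexicon, and the raw valence drops to 0.0, which lines up with the neutral score in the report.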

Unfortunately, the maintainers don't appear to be responding to pull requests or issue reports. I see several outstanding items, some of which are quite simple to fix. My own work is in PHP, though (I'm porting the Python code), so I don't have a one-off fix to offer you. My apologies.

ddugovic commented 5 years ago

I guess I personally don't infer any meaning from "!!!!" (anyone using that much punctuation or more might not be thinking clearly), so I'd expect the same score in either test case.

chaec0803 commented 5 years ago

The problem is that the program doesn't separate "bad" from "!!!!". Because "!!!!" is not in PUNC_LIST, the token is read as "bad!!!!", which is not in the lexicon, so "bad" contributes nothing when the valence is evaluated.
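One way to avoid depending on an enumerated list of punctuation runs is sketched below under my own assumptions (the helper names and the two-character emoticon guard are hypothetical choices, not taken from the repo's fix): strip any run of leading or trailing punctuation with `str.strip(string.punctuation)`, and keep the original token when almost nothing is left, since it was probably an emoticon.

```python
import string

# Toy lexicon for illustration only; -2.5 is the valence quoted in the thread.
LEXICON = {"bad": -2.5}


def strip_punc_if_word(token):
    """Strip any run of leading/trailing punctuation from a token.

    Hypothetical heuristic: if two characters or fewer remain, the token
    was probably an emoticon such as ":)" or ":-(", so return it unchanged.
    """
    stripped = token.strip(string.punctuation)
    return token if len(stripped) <= 2 else stripped


def tokenize(text):
    """Whitespace-split the text and clean each token."""
    return [strip_punc_if_word(tok) for tok in text.split()]


def raw_valence(text):
    """Sum toy-lexicon valences of the cleaned tokens."""
    return sum(LEXICON.get(tok.lower(), 0.0) for tok in tokenize(text))


for n in range(7):
    sentence = "This is so bad" + "!" * n
    print(f"{sentence!r:24} raw={raw_valence(sentence)}")
```

Every variant now keeps the -2.5 from "bad". The guard is a trade-off: a two-letter word such as "so!!!!" also keeps its punctuation, so a real fix would need a better emoticon test.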

cjhutto commented 4 years ago

Got this corrected in the master branch of this repo. Thanks for pointing it out!

In my local version, this is what I'm seeing:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()
sentences = ["This is so bad",
             "This is so bad!",
             "This is so bad!!",
             "This is so bad!!!",
             "This is so bad!!!!",
             "This is so bad!!!!!",
             "This is so bad!!!!!!"
            ]
for sentence in sentences:
    vs = vader.polarity_scores(sentence)
    print("{:-<25} {}".format(sentence, str(vs)))

Output:

This is so bad----------- {'neg': 0.6, 'neu': 0.4, 'pos': 0.0, 'compound': -0.6696}
This is so bad!---------- {'neg': 0.615, 'neu': 0.385, 'pos': 0.0, 'compound': -0.6988}
This is so bad!!--------- {'neg': 0.628, 'neu': 0.372, 'pos': 0.0, 'compound': -0.7249}
This is so bad!!!-------- {'neg': 0.641, 'neu': 0.359, 'pos': 0.0, 'compound': -0.7482}
This is so bad!!!!------- {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
This is so bad!!!!!------ {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
This is so bad!!!!!!----- {'neg': 0.654, 'neu': 0.346, 'pos': 0.0, 'compound': -0.769}
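The scores above level off at four exclamation marks. They are consistent with a simple model, sketched below under stated assumptions: each "!" pushes the raw score 0.292 further from zero, counting at most four marks, and the raw score is squashed into (-1, 1) by x / sqrt(x^2 + 15). Those three constants are my reading of vaderSentiment and should be treated as assumptions, and the helper names are hypothetical. The base raw score is not taken from the lexicon; it is back-solved from the reported compound of -0.6696 for "This is so bad".

```python
import math

ALPHA = 15         # normalization constant (assumption)
EP_WEIGHT = 0.292  # emphasis added per "!" (assumption)
EP_CAP = 4         # marks beyond this add nothing (assumption)


def normalize(score, alpha=ALPHA):
    """Map an unbounded raw score into (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def denormalize(compound, alpha=ALPHA):
    """Invert normalize(): recover the raw score behind a compound value."""
    magnitude = math.sqrt(alpha * compound ** 2 / (1 - compound ** 2))
    return math.copysign(magnitude, compound)


def ep_amplifier(text):
    """Exclamation emphasis: EP_WEIGHT per '!', counting at most EP_CAP."""
    return min(text.count("!"), EP_CAP) * EP_WEIGHT


def compound_with_emphasis(raw, text):
    """Push a nonzero raw score away from zero, then normalize."""
    amp = ep_amplifier(text)
    if raw > 0:
        raw += amp
    elif raw < 0:
        raw -= amp
    return round(normalize(raw), 4)


# Raw score back-solved from the compound reported for "This is so bad".
base = denormalize(-0.6696)

for n in range(7):
    sentence = "This is so bad" + "!" * n
    print("{:-<25} {}".format(sentence, compound_with_emphasis(base, sentence)))
```

Run as is, it should print compound values that agree with the output above to four decimals, including the plateau at -0.769 from four marks onward.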