cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License
4.43k stars, 1k forks

issue in using "no" as negation #72

Closed: vijaydone closed this issue 5 years ago

vijaydone commented 5 years ago

I see phrases like "no problem" and "no complaints" getting tagged as negative in the compound score. Is there a specific reason for not adding "no" to the negation list?

t5k6 commented 5 years ago

I've just started using NLP packages, vaderSentiment being the first one, for a very simple Discord bot. I have yet to read the documentation thoroughly, but from my first impressions, I have to say that handling negations is quite hard.

analyser.polarity_scores("everybody loves her")
Out[33]: {'neg': 0.0, 'neu': 0.351, 'pos': 0.649, 'compound': 0.5719}
analyser.polarity_scores("nobody loves her")
Out[34]: {'neg': 0.0, 'neu': 0.351, 'pos': 0.649, 'compound': 0.5719}

The result above is expected, as "nobody" is not in the "vader_lexicon.txt" file. If I add it manually and give it the same score as "no", I get the results below:

analyser.polarity_scores("everybody loves her")
Out[36]: {'neg': 0.0, 'neu': 0.351, 'pos': 0.649, 'compound': 0.5719}
analyser.polarity_scores("nobody loves her")
Out[37]: {'neg': 0.319, 'neu': 0.145, 'pos': 0.536, 'compound': 0.3612}

This suggests that checking only the compound score can be misleading, and that you should also look at the neg and pos scores.
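One way to act on that observation is to read the compound score together with the pos and neg proportions. The helper below is a minimal sketch, not part of vaderSentiment: the name `describe`, the `mixed_floor` cutoff of 0.1, and the idea of a "mixed" flag are assumptions made for illustration, and the ±0.05 compound threshold is the commonly cited convention rather than anything this thread prescribes. It works on plain dicts shaped like the `polarity_scores` output above.

```python
def describe(scores, threshold=0.05, mixed_floor=0.1):
    """Label a polarity_scores-style dict and flag mixed polarity.

    threshold and mixed_floor are illustrative cutoffs, not library settings.
    """
    compound = scores["compound"]
    if compound >= threshold:
        label = "positive"
    elif compound <= -threshold:
        label = "negative"
    else:
        label = "neutral"
    # "Mixed" means both the pos and neg proportions are non-trivial.
    mixed = scores["pos"] >= mixed_floor and scores["neg"] >= mixed_floor
    return label, mixed


# Score dicts copied from the two "nobody loves her" outputs above.
without_entry = {'neg': 0.0, 'neu': 0.351, 'pos': 0.649, 'compound': 0.5719}
with_entry = {'neg': 0.319, 'neu': 0.145, 'pos': 0.536, 'compound': 0.3612}

print(describe(without_entry))  # ('positive', False)
print(describe(with_entry))     # ('positive', True)
```

With the manual lexicon entry, the compound score still reads as positive, but the mixed flag shows that a sizeable negative component is present.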

JaimeBadiola commented 5 years ago

@t5k6 Adding "nobody" to the vader_lexicon.txt file will not be enough, as "nobody" is a negation and has to be treated differently. Try adding it at line 38 of VaderSentiment.py. That should give you a negative compound score.


The problem with adding it there is that it will treat all similar cases the same way. I can't think of a case where "nobody" shouldn't be treated negatively, but maybe you can.

cjhutto commented 5 years ago

Provided a fix for checking whether "no" is being used as negation for an adjacent lexicon item vs "no" as its own stand-alone lexicon item.

Output for the samples now looks like this:

Everything has been smooth. No problems or complaints.  
                                    {'neg': 0.0, 'neu': 0.571, 'pos': 0.429, 'compound': 0.5448}
no problems ever                    {'neg': 0.0, 'neu': 0.47, 'pos': 0.53, 'compound': 0.3089}
No problem as of yet                {'neg': 0.0, 'neu': 0.639, 'pos': 0.361, 'compound': 0.3089}
Doing just fine no problems         {'neg': 0.0, 'neu': 0.425, 'pos': 0.575, 'compound': 0.4692}
All good. No complaints.            {'neg': 0.0, 'neu': 0.279, 'pos': 0.721, 'compound': 0.6319}
No problem everything good          {'neg': 0.0, 'neu': 0.279, 'pos': 0.721, 'compound': 0.6319}
Very satisfied. No bad experiences. {'neg': 0.0, 'neu': 0.211, 'pos': 0.789, 'compound': 0.7569}
No problem.                         {'neg': 0.0, 'neu': 0.307, 'pos': 0.693, 'compound': 0.3089}
No complaints                       {'neg': 0.0, 'neu': 0.307, 'pos': 0.693, 'compound': 0.3089}
No worries.                         {'neg': 0.0, 'neu': 0.3, 'pos': 0.7, 'compound': 0.3252}
No good                             {'neg': 0.706, 'neu': 0.294, 'pos': 0.0, 'compound': -0.3412}
No smiles.                          {'neg': 0.719, 'neu': 0.281, 'pos': 0.0, 'compound': -0.3724}
No laughter.                        {'neg': 0.724, 'neu': 0.276, 'pos': 0.0, 'compound': -0.3875}
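
The distinction described in the fix, "no" negating an adjacent lexicon item versus "no" counting as a lexicon item in its own right, can be illustrated with a toy scorer. This is a rough sketch of the idea only, not the code that was committed: `TOY_LEXICON`, its valences, `NEGATION_SCALAR`, and `toy_valence` are all hypothetical, and the sketch looks only at the word immediately after "no".

```python
# Hypothetical toy lexicon: these valences are invented for illustration.
TOY_LEXICON = {"no": -1.2, "problem": -1.7, "complaints": -1.0,
               "good": 1.9, "smiles": 1.5}
# Hypothetical scaling applied to a word negated by "no".
NEGATION_SCALAR = -0.5


def toy_valence(text):
    """Sum word valences, treating "no" as a negator when a lexicon word follows."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = 0.0
    for i, word in enumerate(words):
        if word not in TOY_LEXICON:
            continue
        if word == "no":
            followed_by_lexicon_word = (
                i + 1 < len(words)
                and words[i + 1] != "no"
                and words[i + 1] in TOY_LEXICON
            )
            if followed_by_lexicon_word:
                # "no" acts as a negator here, so it adds no valence itself.
                continue
            # Stand-alone "no": use its own lexicon valence.
            total += TOY_LEXICON["no"]
        elif i > 0 and words[i - 1] == "no":
            # The word is negated by the preceding "no": flip and damp it.
            total += TOY_LEXICON[word] * NEGATION_SCALAR
        else:
            total += TOY_LEXICON[word]
    return total


for sample in ["No problem.", "No complaints", "No good", "No smiles."]:
    print(sample, toy_valence(sample))
```

Under these made-up numbers, "No problem." and "No complaints" come out positive while "No good" and "No smiles." come out negative, which matches the direction of the compound scores listed above.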