Closed vijaydone closed 5 years ago
I've just started using NLP packages, vaderSentiment being the first one, for a very simple Discord bot. I've yet to read the documentation thoroughly, but from my first impressions, I have to say that handling negations is quite hard.
analyser.polarity_scores("everybody loves her")
Out[33]: {'neg': 0.0, 'neu': 0.351, 'pos': 0.649, 'compound': 0.5719}
analyser.polarity_scores("nobody loves her")
Out[34]: {'neg': 0.0, 'neu': 0.351, 'pos': 0.649, 'compound': 0.5719}
The above is expected, as "nobody" is not in the "vader_lexicon.txt" file. If I add it manually and give it the same score as "no", I get the results below:
analyser.polarity_scores("everybody loves her")
Out[36]: {'neg': 0.0, 'neu': 0.351, 'pos': 0.649, 'compound': 0.5719}
analyser.polarity_scores("nobody loves her")
Out[37]: {'neg': 0.319, 'neu': 0.145, 'pos': 0.536, 'compound': 0.3612}
This suggests that checking only the compound score would be misleading, and that you should also look at the neg and pos scores.
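To see why the compound score stays positive here, the arithmetic can be sketched by hand. This is a rough reconstruction, not the library's code: the valences 2.7 for "loves" and -1.2 for "no" are inferred from the outputs above rather than read from the lexicon file, the normalization constant 15 is assumed to be VADER's default, and the helper names `compound` and `proportions` are made up for illustration.

```python
import math

ALPHA = 15  # assumed normalization constant (believed to be VADER's default)


def compound(valence_sum, alpha=ALPHA):
    """Squash an unbounded valence sum into the range (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)


def proportions(valences):
    """Split per-token valences into neg/neu/pos shares.

    Each scored token also counts as one token, hence the +1 / -1.
    """
    pos = sum(v + 1 for v in valences if v > 0)
    neg = sum(v - 1 for v in valences if v < 0)
    neu = sum(1 for v in valences if v == 0)
    total = pos + abs(neg) + neu
    return {
        "neg": round(abs(neg) / total, 3),
        "neu": round(neu / total, 3),
        "pos": round(pos / total, 3),
    }


# "nobody loves her", with "nobody" given the inferred score of "no".
valences = [-1.2, 2.7, 0.0]  # nobody, loves, her (values inferred, not looked up)
scores = proportions(valences)
scores["compound"] = round(compound(sum(valences)), 4)
print(scores)
# {'neg': 0.319, 'neu': 0.145, 'pos': 0.536, 'compound': 0.3612}
```

Under these assumptions, a plain lexicon entry only subtracts 1.2 from the 2.7 contributed by "loves", so the sum stays positive. The word lowers the score but does not flip it.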
@t5k6 Adding "nobody" to the vader_lexicon.txt file will not be enough, as "nobody" is a negation and has to be treated differently. Try adding it to vaderSentiment.py at line 38. That should give you a negative compound score.
The problem with adding it there is that it will treat all similar cases equally. I can't think of a case where "nobody" shouldn't be treated as a negation, but maybe you can.
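The difference between a lexicon entry and a negation entry can be sketched with a toy scorer. This is a simplified illustration, not VADER's implementation: the `NEGATE` set and `LEXICON` dict below are tiny stand-ins, the 2.7 valence for "loves" is inferred from the outputs earlier in the thread, the -0.74 scalar and the constant 15 are assumed to match the library's defaults, and only the immediately preceding word is checked.

```python
import math

# Toy stand-ins; "nobody" is the hypothetical addition to the negation list.
NEGATE = {"not", "never", "nobody"}
LEXICON = {"loves": 2.7}  # valence inferred from the posted output
N_SCALAR = -0.74          # assumed negation multiplier
ALPHA = 15                # assumed normalization constant


def compound_score(text):
    """Sum token valences, flipping any scored word that follows a negation."""
    words = text.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word, 0.0)
        # A negation carries no valence of its own; it rescales the next word.
        if valence and i > 0 and words[i - 1] in NEGATE:
            valence *= N_SCALAR
        total += valence
    return round(total / math.sqrt(total ** 2 + ALPHA), 4)


print(compound_score("everybody loves her"))  # positive
print(compound_score("nobody loves her"))     # negative
```

Because the rule fires for every scored word that follows a negation, it also illustrates the concern raised here: all such cases are flipped in the same way, whether or not that is appropriate.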
Provided a fix that checks whether "no" is being used as a negation for an adjacent lexicon item or as its own stand-alone lexicon item.
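One way to picture that check is the sketch below. It is a guess at the logic, not the committed code: the function names are hypothetical, the valences for "no", "problem" and "good" are inferred from outputs posted in this thread, and -0.74 and 15 are assumed to be the library's negation scalar and normalization constant.

```python
import math

LEXICON = {"no": -1.2, "problem": -1.7, "good": 1.9}  # inferred valences
N_SCALAR = -0.74  # assumed negation multiplier
ALPHA = 15        # assumed normalization constant


def token_valences(text):
    """Score each token, treating "no" before a lexicon word as a negation."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    valences = []
    for i, word in enumerate(words):
        valence = LEXICON.get(word, 0.0)
        prev_word = words[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        if word == "no" and next_word in LEXICON:
            # "no" is negating its neighbour, so it scores nothing itself.
            valence = 0.0
        elif prev_word == "no" and word in LEXICON:
            # The neighbour's valence is flipped and dampened.
            valence *= N_SCALAR
        valences.append(valence)
    return valences


def compound_score(text):
    total = sum(token_valences(text))
    return round(total / math.sqrt(total ** 2 + ALPHA), 4)


print(compound_score("No problem."))  # 0.3089
print(compound_score("No good"))      # -0.3412
print(compound_score("No"))           # stand-alone "no" keeps its own negative valence
```

With these assumed values, "no" followed by a negative word comes out positive and "no" followed by a positive word comes out negative, while a stand-alone "no" still scores as its own lexicon item.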
Output for the samples now looks like this:
Everything has been smooth. No problems or complaints. {'neg': 0.0, 'neu': 0.571, 'pos': 0.429, 'compound': 0.5448}
no problems ever {'neg': 0.0, 'neu': 0.47, 'pos': 0.53, 'compound': 0.3089}
No problem as of yet {'neg': 0.0, 'neu': 0.639, 'pos': 0.361, 'compound': 0.3089}
Doing just fine no problems {'neg': 0.0, 'neu': 0.425, 'pos': 0.575, 'compound': 0.4692}
All good. No complaints. {'neg': 0.0, 'neu': 0.279, 'pos': 0.721, 'compound': 0.6319}
No problem everything good {'neg': 0.0, 'neu': 0.279, 'pos': 0.721, 'compound': 0.6319}
Very satisfied. No bad experiences. {'neg': 0.0, 'neu': 0.211, 'pos': 0.789, 'compound': 0.7569}
No problem. {'neg': 0.0, 'neu': 0.307, 'pos': 0.693, 'compound': 0.3089}
No complaints {'neg': 0.0, 'neu': 0.307, 'pos': 0.693, 'compound': 0.3089}
No worries. {'neg': 0.0, 'neu': 0.3, 'pos': 0.7, 'compound': 0.3252}
No good {'neg': 0.706, 'neu': 0.294, 'pos': 0.0, 'compound': -0.3412}
No smiles. {'neg': 0.719, 'neu': 0.281, 'pos': 0.0, 'compound': -0.3724}
No laughter. {'neg': 0.724, 'neu': 0.276, 'pos': 0.0, 'compound': -0.3875}
I see phrases like "no problem" and "no complaints" getting tagged as negative in the compound score. Is there any specific reason for not adding "no" to the negation list?