cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License

Dictionary contains phrases like "fed up" that will never hit because of how the sentence is tokenized #124

Open kirsten-stallings opened 3 years ago

kirsten-stallings commented 3 years ago

The dictionary contains multi-word phrases like "fed up", but because the code checks whether each word is in the dictionary on a word-by-word basis, these phrases never match:

>>> from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
>>> analyzer = SentimentIntensityAnalyzer()
>>> analyzer.polarity_scores("I am fed up")
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

ViennaMike commented 1 year ago

If I understand the code correctly, "fed up" (and any other multi-word phrase) should be removed from the lexicon.txt file and added to SENTIMENT_LADEN_IDIOMS instead. However, the code that would actually handle those idioms seems to be a placeholder for a future addition.

I found a workaround for handling bigrams (2-word phrases) on Stack Overflow: https://stackoverflow.com/questions/67798527/nltk-vader-sentimentintensityanalyzer-bigram
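
For anyone who wants to try the general idea without patching the library, here is a minimal sketch of one possible preprocessing approach: join each multi-word lexicon phrase into a single token before the text is split into words. This is not the code from the Stack Overflow answer and not VADER's own method. The function name `join_lexicon_phrases`, the `_` joiner, and the valence numbers in the sample lexicon are all assumptions made for illustration.

```python
import re


def join_lexicon_phrases(text, lexicon, joiner="_"):
    """Rewrite multi-word lexicon phrases in `text` as single tokens.

    Returns the rewritten text and a dict mapping each joined token that
    was actually used to the valence of the original phrase.
    (Hypothetical helper, not part of vaderSentiment.)
    """
    # Only entries containing a space can be missed by word-by-word lookup.
    # Longest first, so a longer phrase is joined before a shorter one.
    phrases = sorted((p for p in lexicon if " " in p), key=len, reverse=True)
    extra = {}
    for phrase in phrases:
        joined = phrase.replace(" ", joiner)
        # Match the phrase's words separated by any whitespace, and only
        # when the match is not embedded in a longer word.
        body = r"\s+".join(re.escape(word) for word in phrase.split())
        pattern = re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)
        text, count = pattern.subn(lambda match: joined, text)
        if count:
            extra[joined] = lexicon[phrase]
    return text, extra


# Illustrative lexicon; the scores here are made up, not VADER's values.
sample_lexicon = {"fed up": -1.8, "happy": 2.7}
rewritten, extra_entries = join_lexicon_phrases("I am Fed  up today", sample_lexicon)
print(rewritten)
print(extra_entries)
```

With VADER itself, the entries in `extra_entries` would presumably have to be merged into the analyzer's `lexicon` dict before calling `polarity_scores` on the rewritten text. That step is left out here because it depends on how the installed version tokenizes words containing the joiner character.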