cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License
4.43k stars 1k forks

Add words of different language to vaderSentiment #59

Rishav09 closed this issue 4 years ago

Rishav09 commented 6 years ago

I am trying to add several Hindi words to the vaderSentiment lexicon using this:

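   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

   analyzer = SentimentIntensityAnalyzer()
   # Map each new token to its valence score
   new_words = {
       'ग़लतापना': -2.0,
       'एकता_का_अभाव': 3.4,
   }
   analyzer.lexicon.update(new_words)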

But the analyzer does not score the new words correctly.

darthvader2 commented 5 years ago

Is anyone working on this issue?

cjhutto commented 4 years ago

I note @Hiestaa's excellent response in another issue, which gives guidance on adding new words to the lexicon. I want to emphasize that the lexicon file is TAB-separated, so adding new words and valence scores separated only by spaces won't work.

The README provides a description of the values in the lexicon.

The vader_lexicon.txt file uses the following TAB-SEPARATED format:

   Token       Valence   Standard Deviation   Human Ratings
   (:<         -0.2      2.03961              [-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]
   amorphous   -0.2      0.4                  [0, 0, 0, 0, 0, 0, -1, 0, 0, -1]

If you want to follow the same rigorous process as the author of the study, you should find 10 independent humans to evaluate each word you want to add to the lexicon, make sure the standard deviation doesn't exceed 2.5, and take the average rating for the valence. This will keep the file consistent.
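As a rough illustration of that process, here is a minimal sketch that turns ten ratings into a lexicon line. The helper name `lexicon_entry` is hypothetical, and using the population standard deviation is an assumption; it happens to reproduce the two sample rows shown above.

```python
from statistics import mean, pstdev

MAX_STD_DEV = 2.5  # threshold mentioned above


def lexicon_entry(token, ratings):
    """Build one tab-separated lexicon line (hypothetical helper)."""
    valence = mean(ratings)   # average rating becomes the valence
    spread = pstdev(ratings)  # population standard deviation (assumption)
    if spread > MAX_STD_DEV:
        raise ValueError(f"raters disagree too much on {token!r}: {spread:.3f}")
    return f"{token}\t{round(valence, 5)}\t{round(spread, 5)}\t{list(ratings)}"


# Ratings taken from the "(:<" sample row
ratings = [-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]
line = lexicon_entry("(:<", ratings)
print(line)
```

The printed line has the same four fields as the sample row, with real tab characters between them.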

Now, if you just want the algorithm to work on these new cases quickly, the standard deviation and human ratings are not necessary. Only the token and valence are used.
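To make the point about the two required fields concrete, here is a small sketch of a parser for the TAB-separated format. `parse_lexicon` is a hypothetical stand-in written for illustration, not the library's own loader; it keeps only the first two tab-separated fields of each line.

```python
def parse_lexicon(text):
    """Map token -> valence from tab-separated lexicon text (hypothetical helper)."""
    lexicon = {}
    for line in text.rstrip("\n").split("\n"):
        if not line.strip():
            continue  # skip blank lines
        # Only the first two tab-separated fields are kept
        token, valence = line.strip().split("\t")[0:2]
        lexicon[token] = float(valence)
    return lexicon


full = "amorphous\t-0.2\t0.4\t[0, 0, 0, 0, 0, 0, -1, 0, 0, -1]"
short = "ग़लतापना\t-2.0"  # token and valence only
lexicon = parse_lexicon(full + "\n" + short)

# A space-separated line contains no tab, so the two-field unpacking fails
try:
    parse_lexicon("amorphous -0.2")
    space_ok = True
except ValueError:
    space_ok = False
```

Under this sketch, both the four-field line and the two-field line load, while the space-separated line is rejected.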

Originally posted by @Hiestaa in https://github.com/cjhutto/vaderSentiment/issues/28#issuecomment-329506640