Add words of different language to vaderSentiment #59

Rishav09 commented 6 years ago

I am trying to add multiple Hindi words to the vaderSentiment lexicon using this:

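    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # Map each new token to its valence score
    new_words = {
        'ग़लतापना': -2.0,
        'एकता_का_अभाव': 3.4,
    }
    analyzer.lexicon.update(new_words)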

But the analyzer does not score the new words correctly.

darthvader2 commented 5 years ago

Is anyone working on this issue?

cjhutto commented 4 years ago

I note @Hiestaa's excellent response in another issue, which gives guidance on adding new words to the lexicon, and I emphasize that the lexicon file is TAB separated (so adding new words and valence scores separated only by spaces won't work).

The README provides a description of the values in the lexicon.

The vader_lexicon.txt holds the following TAB SEPARATED format:

| Token | Valence | Standard Deviation | Human Ratings |
| --- | --- | --- | --- |
| (:< | `-0.2` | `2.03961` | `[-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]` |
| amorphous | `-0.2` | `0.4` | `[0, 0, 0, 0, 0, 0, -1, 0, 0, -1]` |

If you want to follow the same rigorous process as the author of the study, you should find 10 independent humans to evaluate each word you want to add to the lexicon, make sure the standard deviation doesn't exceed 2.5, and take the average rating for the valence. This will keep the file consistent.
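As a rough sketch of that aggregation step (not the study's actual tooling), the snippet below takes the ten ratings listed for the `(:<` token, uses their mean as the valence, and checks the standard deviation against the 2.5 cutoff. The function name `summarize_ratings` and the `max_std` parameter are hypothetical. The population standard deviation is an assumption, chosen because it reproduces the values shown in the example rows.

```python
from statistics import mean, pstdev


def summarize_ratings(ratings, max_std=2.5):
    """Return (valence, std, keep) for a list of human ratings.

    valence is the average rating, std is the population standard
    deviation, and keep says whether std stays within max_std.
    """
    valence = mean(ratings)
    std = pstdev(ratings)
    return valence, std, std <= max_std


# Ratings copied from the "(:<" example row.
ratings = [-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]
valence, std, keep = summarize_ratings(ratings)
print(round(valence, 5), round(std, 5), keep)  # prints: -0.2 2.03961 True
```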

Now if you just want to make the algorithm work on these new cases quickly, the standard deviation and human ratings are indeed not necessary. Only the token and valences are used.
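To make the tab requirement concrete, here is a minimal sketch of writing and reading a two-column entry. `format_entry` and `parse_entry` are hypothetical helpers, not part of vaderSentiment; the parser only assumes what is stated above, namely that fields are split on TAB and just the token and valence are used.

```python
def format_entry(token, valence):
    """Build a lexicon line with only the two fields that are used."""
    return f"{token}\t{valence}"


def parse_entry(line):
    """Split a lexicon line on TAB and return (token, valence)."""
    fields = line.strip().split("\t")
    if len(fields) < 2:
        raise ValueError(f"not tab separated: {line!r}")
    return fields[0], float(fields[1])


line = format_entry("ग़लतापना", -2.0)
print(parse_entry(line))

# A space-separated line does not split into two fields, so it is rejected.
try:
    parse_entry("ग़लतापना -2.0")
except ValueError as err:
    print("rejected:", err)
```

Extra columns (standard deviation, human ratings) are simply ignored by this parser, which matches the point that they are optional for a quick addition.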

Originally posted by @Hiestaa in https://github.com/cjhutto/vaderSentiment/issues/28#issuecomment-329506640
