Rishav09 closed this issue 4 years ago
Is anyone working on this issue?
I note @Hiestaa's excellent response in another issue about guidance for adding new words to the lexicon, and emphasize that the lexicon file is TAB separated (so adding new words and valence scores separated only by spaces won't work).
The README provides a description of the values in the lexicon.
The vader_lexicon.txt file uses the following TAB-SEPARATED format:

Token | Valence | Standard Deviation | Human Ratings
---|---|---|---
(:< | -0.2 | 2.03961 | [-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]
amorphous | -0.2 | 0.4 | [0, 0, 0, 0, 0, 0, -1, 0, 0, -1]
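
To make the TAB requirement concrete, here is a minimal sketch of how one lexicon line could be split into its fields. The helper `parse_lexicon_line` is hypothetical (it is not part of vaderSentiment); it only illustrates why a line separated by spaces fails to yield a token and a valence.

```python
def parse_lexicon_line(line):
    """Split one lexicon line on TABs and return (token, valence).

    Hypothetical helper for illustration; not vaderSentiment's own loader.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        # A space-separated line ends up as a single field.
        raise ValueError(f"expected TAB-separated fields, got: {line!r}")
    return fields[0], float(fields[1])


tab_line = "amorphous\t-0.2\t0.4\t[0, 0, 0, 0, 0, 0, -1, 0, 0, -1]"
space_line = "amorphous -0.2 0.4"

print(parse_lexicon_line(tab_line))

try:
    parse_lexicon_line(space_line)
except ValueError as err:
    print("rejected:", err)
```

If newly added words seem to be ignored, checking the separator in the edited file this way is a cheap first step.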
If you want to follow the same rigorous process as the author of the study, you should find 10 independent humans to evaluate each word you want to add to the lexicon, make sure the standard deviation doesn't exceed 2.5, and take the average rating for the valence. This will keep the file consistent.
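
As a rough sketch of that process, the valence and standard deviation for a word can be computed from its ten ratings with Python's standard library. The function name `summarize_ratings` is made up for this example. Using the population standard deviation is an assumption on my part; it does reproduce the values in the two sample rows above.

```python
from statistics import mean, pstdev


def summarize_ratings(ratings, max_std=2.5):
    """Return (valence, std, keep) for a list of human ratings.

    valence is the mean rating; keep is False when the standard
    deviation exceeds max_std (2.5, per the comment above).
    """
    valence = round(mean(ratings), 5)
    std = round(pstdev(ratings), 5)  # population std: an assumption
    return valence, std, std <= max_std


# Ratings copied from the two sample rows of vader_lexicon.txt.
print(summarize_ratings([-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]))
print(summarize_ratings([0, 0, 0, 0, 0, 0, -1, 0, 0, -1]))
```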
Now if you just want to make the algorithm work on these new cases quickly, the standard deviation and human ratings are indeed not necessary. Only the token and valences are used.
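
For that quick route, something like the following could append token and valence pairs to a copy of the lexicon file. This is only a sketch: `format_entry` and `append_entries` are hypothetical helpers, the Hindi words and their scores are made-up placeholders rather than rated values, and writing the file as UTF-8 is my assumption.

```python
def format_entry(token, valence):
    """Build one lexicon line: token, a TAB, then the valence."""
    return f"{token}\t{valence}"


def append_entries(lexicon_path, entries):
    """Append (token, valence) pairs to the lexicon file.

    Assumes the file already ends with a newline; otherwise the first
    new entry would be glued onto the last existing line.
    """
    with open(lexicon_path, "a", encoding="utf-8") as fh:
        for token, valence in entries:
            fh.write(format_entry(token, valence) + "\n")


# Placeholder words and scores, for illustration only.
new_words = [("अच्छा", 2.0), ("बुरा", -2.0)]
for token, valence in new_words:
    print(repr(format_entry(token, valence)))
```

Printing with `repr` shows the `\t` explicitly, which makes it easy to confirm that the separator really is a TAB and not spaces inserted by an editor.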
Originally posted by @Hiestaa in https://github.com/cjhutto/vaderSentiment/issues/28#issuecomment-329506640
I am trying to add multiple Hindi words to the vaderSentiment lexicon using this approach, but the new words are not being scored correctly.