cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License

Download additional DATASETS AND TESTING RESOURCES mentioned in README #139

Status: Open. Deepankar-98 opened this issue 2 years ago.

Deepankar-98 commented 2 years ago

Where can I download the additional DATASETS AND TESTING RESOURCES (items 4-12) mentioned in the README file? https://github.com/cjhutto/vaderSentiment#resources-and-dataset-descriptions


I tried to download the resources using nltk.download('name'), but it didn't work: the file names mentioned in the README are not listed in the NLTK Corpora (https://www.nltk.org/nltk_data/).

I am trying to download:

  1. tweets_anonDataRatings.txt
  2. amazonReviewSnippets_anonDataRatings.txt, etc.

Can someone help me with this?

cjhutto commented 2 years ago

Check out the "additional_resources" directory in this repo. The complete set of resources is compressed into the .tar.gz file for your convenience.
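For anyone landing here later, a minimal sketch of unpacking that archive with Python's standard-library `tarfile` module. The archive path is a placeholder for whichever .tar.gz you downloaded from additional_resources, and `extract_resources` is a hypothetical helper, not part of vaderSentiment.

```python
import tarfile
from pathlib import Path


def extract_resources(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir; return the sorted base names of its files.

    archive_path is a placeholder for whichever .tar.gz you downloaded from the
    repo's additional_resources directory.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse entries that would land outside dest_dir (e.g. "../x" or "/etc/x").
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: also reject unsafe links and special files.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Only regular files, e.g. tweets_GroundTruth.txt
    return sorted(Path(m.name).name for m in members if m.isfile())
```

Usage: `extract_resources("path/to/downloaded.tar.gz", "vader_resources")` returns the names of the unpacked text files, which you can then open directly.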

Deepankar-98 commented 2 years ago

Thanks a lot for the info and the wonderful package.

Deepankar-98 commented 2 years ago

Hi @cjhutto,

I downloaded the additional datasets, but I can't figure out how to use them. I found that I can choose which lexicon file to load using this code:

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid_mod = SentimentIntensityAnalyzer(lexicon_file="vader_lexicon download path")

The content inside vader_lexicon.txt is of the form:

[Screenshot: sample lines from vader_lexicon.txt]
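Each line of vader_lexicon.txt is tab-delimited: token, mean valence, standard deviation, and the list of ten raw human ratings. A minimal sketch for reading it; `LexiconEntry`, `parse_lexicon_line` and `load_lexicon` are hypothetical names. As far as I can tell, vaderSentiment's own loader keeps only the first two columns (token and mean), which `load_lexicon` mirrors.

```python
import ast
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    token: str
    mean: float          # mean valence, on VADER's -4..+4 scale
    std: float           # standard deviation of the raw ratings
    ratings: List[int]   # the raw human ratings, e.g. [-1, -1, -3, ...]


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Split one tab-delimited vader_lexicon.txt line into its four fields."""
    token, mean, std, ratings = line.rstrip("\r\n").split("\t")
    # The ratings column is a Python-style list literal, so literal_eval parses it safely.
    return LexiconEntry(token, float(mean), float(std), ast.literal_eval(ratings))


def load_lexicon(path: str) -> dict:
    """Return {token: mean valence}, skipping blank lines."""
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entry = parse_lexicon_line(line)
                lexicon[entry.token] = entry.mean
    return lexicon
```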

Whereas tweets_anonDataRatings.txt is:

[Screenshot: sample lines from tweets_anonDataRatings.txt]
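A sketch for reading tweets_anonDataRatings.txt, assuming each line is tab-delimited as ID, mean rating, standard deviation, and the raw ratings (an assumption; check it against the README's format notes). It recomputes each mean from the raw ratings, which is one way to sanity-check the numbers before comparing files. `parse_ratings_line` and `mismatched_means` are hypothetical helpers.

```python
import re
import statistics

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_ratings_line(line):
    """Parse one line assumed to be: ID <tab> mean <tab> std-dev <tab> raw ratings.

    The raw ratings are pulled out with a regex, so both "[1, 2, -1]" and
    tab-separated values are accepted.
    """
    fields = line.rstrip("\r\n").split("\t")
    item_id, mean, std = fields[0], float(fields[1]), float(fields[2])
    raw = [float(x) for x in NUMBER.findall(" ".join(fields[3:]))]
    return item_id, mean, std, raw


def mismatched_means(path, tol=1e-3):
    """Yield (ID, stated mean, recomputed mean) where they differ by more than tol.

    Raise tol if the file rounds its means to fewer decimal places.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if not line.strip():
                continue
            item_id, mean, _std, raw = parse_ratings_line(line)
            if not raw:
                continue  # nothing to recompute from
            recomputed = statistics.mean(raw)
            if abs(recomputed - mean) > tol:
                yield item_id, mean, recomputed
```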

And tweets_GroundTruth.txt is:

[Screenshot: sample lines from tweets_GroundTruth.txt]
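A sketch for reading tweets_GroundTruth.txt, assuming tab-delimited lines of ID, mean rating, and text snippet, plus a helper that lists the IDs whose mean differs from another file's mean. Both the layout and the premise that the two files share IDs are assumptions; `load_ground_truth` and `compare_means` are hypothetical names. Pinpointing which items disagree, and by how much, may help narrow down the mean-valence question.

```python
def load_ground_truth(path):
    """Map ID -> (mean rating, text), assuming ID <tab> mean <tab> text per line."""
    truth = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue
            # maxsplit=2 keeps any tabs inside the text snippet intact
            item_id, mean, text = line.split("\t", 2)
            truth[item_id] = (float(mean), text)
    return truth


def compare_means(truth, other_means, tol=1e-3):
    """Return {ID: (ground-truth mean, other file's mean)} where they differ by more than tol.

    other_means is a plain {ID: mean} dict built from the other file;
    IDs missing from the ground truth are ignored.
    """
    return {
        item_id: (truth[item_id][0], other)
        for item_id, other in other_means.items()
        if item_id in truth and abs(truth[item_id][0] - other) > tol
    }
```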

These two appear to be just the dataset and the ratings from 20 people. I have two questions:

  1. The mean valence differs between the two files. Can you please clarify why?
  2. Is there a way I can use these files for sentiment analysis? If so, how?
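One possible way to use the ground-truth files (not necessarily what the maintainer intends) is as a benchmark rather than as a lexicon: score each text with an analyzer and measure how well the scores track the human mean ratings. The sketch below uses Pearson correlation, which is my choice of metric; `pearson` and `evaluate` are hypothetical helpers, and `score` is any function mapping text to a number.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def evaluate(score, items):
    """Correlate score(text) with the human mean rating for each (text, mean) pair."""
    predicted = [score(text) for text, _ in items]
    human = [mean for _, mean in items]
    return pearson(predicted, human)
```

With the vaderSentiment package installed, `score` could be `lambda t: analyzer.polarity_scores(t)["compound"]`, where `analyzer = SentimentIntensityAnalyzer()` is imported from `vaderSentiment.vaderSentiment`. Correlation is scale-invariant, so comparing VADER's compound score (-1 to +1) against human means on a different scale is not a problem.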

Your help is much appreciated.