clarencejychan opened this issue 4 years ago
@PanTheMan, this might be a good place to start the research. Please note your findings in the document and plan out the steps we need to take to improve our model.
VADER reading:
A comprehensive, high-quality lexicon is often essential for fast, accurate sentiment analysis at large scale (e.g. LIWC, VADER).
VADER is free to use!
VADER performs especially well in social media contexts.
Apparently, when classifying the sentiment of tweets, it outperforms individual human raters (F1 classification accuracy of 0.96 vs 0.84).
ADDITIONAL READING: "It is not our intention to review the entire body of literature concerning sentiment analysis. Indeed, such an endeavor would not be possible within the limited space available (such treatments are available in Liu (2012) and Pang & Lee (2008))."
Most sentiment analysis approaches rely heavily on a sentiment lexicon (a list of words labelled positive, neutral, or negative independent of context).
The paper reviews three widely used lexicons and picks apart their shortcomings.
It can apparently capture sentiment from emoticons too.
SenticNet is another option besides VADER.
The rest of the paper benchmarks VADER against other well-known lexicons and sentiment scoring tools, as well as against human raters.
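To make the lexicon idea concrete, here is a minimal Python sketch of context-free lexicon scoring. The `TOY_LEXICON` entries and their valences are made up for illustration, and `normalize`/`lexicon_score` are hypothetical names. The one piece borrowed from VADER is the normalization `score / sqrt(score^2 + alpha)` with `alpha = 15`, which squashes the raw valence sum into (-1, 1) the way VADER's compound score does. Real VADER also applies rules for negation, intensifiers, capitalization, and punctuation, which this sketch leaves out.

```python
import math

# Hypothetical toy lexicon for illustration only: token -> valence.
# VADER's real lexicon is human-rated on a -4..+4 scale and includes emoticons.
TOY_LEXICON: dict[str, float] = {
    "good": 1.9,
    "great": 3.1,
    "bad": -2.5,
    "terrible": -2.1,
    ":)": 2.0,
    ":(": -1.9,
}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into (-1, 1), as VADER does for its compound score."""
    return score / math.sqrt(score * score + alpha)


def lexicon_score(text: str, lexicon: dict[str, float] = TOY_LEXICON) -> float:
    """Sum context-free valences of known tokens, then normalize."""
    total = 0.0
    # Split on whitespace so emoticons like ":)" survive as whole tokens.
    for raw in text.lower().split():
        # Try the raw token first (emoticons), then strip trailing punctuation from words.
        token = raw if raw in lexicon else raw.strip(".,!?;")
        total += lexicon.get(token, 0.0)  # unknown tokens count as neutral
    return normalize(total)
```

Because every word keeps the same valence regardless of its neighbours, "not good" would still score positive here; that is exactly the gap VADER's extra heuristics are meant to close.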
Our best bet seems to be VADER for now: it's simple, and since it's rule-based there's no model to train, so we don't need dedicated compute.
One improvement I found while looking online: split each comment into sentences, score each sentence, and average the scores. In my view, VADER's scores get unreliable when it analyzes multiple sentences at once.
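The split-and-average idea could look roughly like the Python sketch below. Function names are hypothetical, and the sentence splitter is a naive assumption (a sentence ends at `.`, `!` or `?` followed by whitespace); a proper tokenizer such as NLTK's `sent_tokenize` handles abbreviations and similar edge cases better. The scorer is passed in as a function so the averaging logic doesn't depend on any particular library.

```python
import re
from statistics import mean
from typing import Callable

# Naive splitter (assumption): split after ., ! or ? when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(comment: str) -> list[str]:
    """Split a comment into non-empty, stripped sentences."""
    return [s.strip() for s in _SENTENCE_END.split(comment) if s.strip()]


def average_sentence_score(comment: str, score_fn: Callable[[str], float]) -> float:
    """Score each sentence separately and return the mean (0.0 for an empty comment)."""
    sentences = split_sentences(comment)
    if not sentences:
        return 0.0
    return mean(score_fn(s) for s in sentences)
```

With the `vaderSentiment` package installed, one option is to create `analyzer = SentimentIntensityAnalyzer()` (imported from `vaderSentiment.vaderSentiment`) once and pass `lambda s: analyzer.polarity_scores(s)["compound"]` as `score_fn`. A plain mean gives every sentence equal weight; weighting by sentence length is an alternative worth testing against our data.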
We'll most likely need to follow the VADER project; the paper is here: http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
It's worth reading (or at the very least skimming) to see what a sound methodology for sentiment analysis looks like.
AC: A rough document describing the steps we need to take to create a decent model for our purposes.