Investigate methodology in order to create a model for sentiment analysis

clarencejychan commented 4 years ago

We will most likely need to follow the VADER project described here: http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

It is a good idea to read through (or at least skim) the paper to understand what a good methodology for sentiment analysis looks like.

AC (acceptance criteria): a rough document describing the steps we need to take to create a decent model for our purposes.

clarencejychan commented 4 years ago

This might be a good place to start the research, @PanTheMan. Please note your findings in the document and plan out the steps we need to take to improve our model.

PanTheMan commented 4 years ago

VADER reading notes:

A comprehensive, high-quality lexicon is often essential for fast, accurate sentiment analysis at large scale (e.g. LIWC, VADER).
VADER is free to use.
VADER performs better in social media contexts.
In classification accuracy on social media text, it apparently outperforms individual human raters (0.96 vs 0.84). Additional reading: "It is not our intention to review the entire body of literature concerning sentiment analysis. Indeed, such an endeavor would not be possible within the limited space available (such treatments are available in Liu (2012) and Pang & Lee (2008))."
Most sentiment analysis approaches rely heavily on a sentiment lexicon: a list of words, each generally labelled positive, neutral, or negative independent of context.
The paper reviews the 3 widely used lexicons and is quite critical of them.
VADER can apparently also capture sentiment from emoticons.
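To make the lexicon idea concrete, here is a minimal sketch of lexicon-based scoring. The lexicon words and valence values are made up for illustration (they are not VADER's actual entries). The `normalize` step mirrors the formula the vaderSentiment package uses to squash a summed valence into a compound score between -1 and 1 (alpha = 15). Emoticons are simply lexicon entries here, which is roughly how VADER's lexicon includes them; real VADER also applies rules for negation, intensifiers, capitalization and "but", which this sketch leaves out.

```python
import math

# Hypothetical toy lexicon on VADER's -4..+4 valence scale.
# Words and values are invented for illustration, not copied from VADER.
TOY_LEXICON = {
    "good": 1.9,
    "great": 3.1,
    "bad": -2.5,
    "terrible": -3.4,
    ":)": 2.0,
    ":(": -1.9,
}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into (-1, 1), like VADER's compound score."""
    return score / math.sqrt(score * score + alpha)


def lexicon_score(text: str) -> float:
    """Sum the valence of every known token (words and emoticons), then normalize."""
    tokens = text.lower().split()
    # Strip trailing sentence punctuation from words; emoticons like ":)" survive
    # because ':' and ')' are not in the strip set.
    total = sum(TOY_LEXICON.get(tok.strip(".,!?"), 0.0) for tok in tokens)
    return normalize(total)
```

For example, `lexicon_score("The food was great :)")` is positive and scores higher than `lexicon_score("great")`, because the emoticon adds its own valence. Unknown words contribute nothing, so a sentence with no lexicon hits scores 0.0.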

Another option instead of VADER is SenticNet.

The paper accesses the SenticNet polarity score through the online SenticNet API and a publicly available Python package. Judging from the paper's results there is not much hope for it, but it may be worth investigating.

To improve VADER, we could use word-sense disambiguation (WSD): identifying which sense of a word is used in a sentence when the word has multiple meanings (i.e. its contextual meaning). For example, "catch" should carry negative sentiment in “At first glance the contract looks good, but there’s a catch”, but be neutral in “The fisherman plans to sell his catch at the market”. I don't know whether NLTK's VADER implementation already does this.
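A minimal sketch of one classic WSD technique, simplified Lesk, applied to the "catch" example: pick the sense whose gloss shares the most content words with the sentence. This is a swapped-in technique, not something the paper prescribes. The sense inventory, glosses, sentiment labels and stopword list below are all hypothetical; NLTK ships a WordNet-based variant as `nltk.wsd.lesk`, which could replace the toy inventory if this turns out to be worth pursuing.

```python
import re

# Hypothetical sense inventory: each sense of "catch" gets a short gloss and a
# sentiment label. A real system would use WordNet plus a sense-level lexicon.
SENSES = {
    "catch": [
        {"gloss": "a hidden drawback or problem in a deal contract or offer",
         "sentiment": "negative"},
        {"gloss": "the quantity of fish caught by a fisherman to sell at market",
         "sentiment": "neutral"},
    ],
}

# Hypothetical minimal stopword list, just enough for the example sentences.
STOPWORDS = {"a", "an", "the", "at", "in", "or", "of", "to", "by", "his",
             "but", "there", "s"}


def content_words(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Return the sense whose gloss overlaps most with the sentence, or None."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense in SENSES.get(word, []):
        overlap = len(context & content_words(sense["gloss"]))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

On the two example sentences, the contract sentence overlaps with the "drawback" gloss (via "contract") and the fisherman sentence overlaps with the "fish" gloss (via "fisherman", "sell", "market"), so the sketch returns the negative and neutral senses respectively. Gloss overlap is a weak signal on short texts, which is one reason this is only a starting point.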

The paper then benchmarks VADER against other well-known lexicons and sentiment scoring tools, as well as against human raters.

VADER performed well on social media text, beating human raters, but was weaker on product reviews, movie reviews, NY Times text, etc. (around 50% accuracy).
Also worth noting: Hu-Liu, another lexicon, performed similarly to VADER, although it apparently does not capture sentiment from emoticons and the like.
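To check how VADER does on our own data compared with human labels, one option is to threshold the compound score into three classes and measure agreement. A rough sketch, assuming the ±0.05 cut-offs suggested in the vaderSentiment README and plain accuracy as the metric (the paper's reported numbers may use a different metric, such as F1):

```python
# Thresholds follow the convention suggested in the vaderSentiment README:
# compound >= 0.05 is positive, <= -0.05 is negative, otherwise neutral.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Turn a compound score in [-1, 1] into a three-way sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def accuracy(compounds, gold_labels):
    """Fraction of items whose thresholded label matches the human (gold) label."""
    if len(compounds) != len(gold_labels):
        raise ValueError("compounds and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("need at least one labelled example")
    hits = sum(label_from_compound(c) == g for c, g in zip(compounds, gold_labels))
    return hits / len(gold_labels)
```

With made-up compound scores, `accuracy([0.8, -0.6, 0.01, 0.3], ["positive", "negative", "neutral", "negative"])` returns 0.75: the last item is scored positive but labelled negative. Hand-labelling a sample of our own comments this way would tell us whether we look more like the social media case or the review case.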

Our best bet seems to be VADER for now. It is simple and, since it is lexicon- and rule-based, it requires no dedicated compute to train an actual model.

One improvement I found online: tokenize each comment into sentences, score each sentence, and average the scores. VADER's output seems to get funky when it scores multiple sentences at the same time.
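A minimal sketch of the sentence-averaging idea. The scorer is passed in as a function so the averaging logic can be tested without NLTK; the regex sentence splitter is a naive stand-in (an assumption, not part of the suggestion above), and `nltk.sent_tokenize` would likely handle abbreviations better.

```python
import re
from statistics import mean
from typing import Callable, List

# Naive splitter: break after ., ! or ? when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(comment: str) -> List[str]:
    """Split a comment into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_END.split(comment.strip()) if s]


def average_sentence_score(comment: str, score_fn: Callable[[str], float]) -> float:
    """Score each sentence separately and return the mean; 0.0 for an empty comment."""
    sentences = split_sentences(comment)
    if not sentences:
        return 0.0
    return mean([score_fn(s) for s in sentences])
```

With NLTK's VADER (after `nltk.download("vader_lexicon")`), one could create `sia = SentimentIntensityAnalyzer()` from `nltk.sentiment.vader` and pass `lambda s: sia.polarity_scores(s)["compound"]` as `score_fn`. Note that an unweighted mean treats a short sentence the same as a long one, and a comment with one strongly positive and one strongly negative sentence ends up near neutral; whether that beats scoring the whole comment at once is worth testing on our data.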

clarencejychan / nephew-pipeline

Investigate methodology in order to create a model for sentiment analysis #5