PythEsc / Research_project2

Prediction of Facebook-user reactions to supermarkets using neural networks

Emotion Mining and Sentiment Analysis #18

Closed Naxter closed 7 years ago

Naxter commented 7 years ago

As we said in the first-phase presentation, we wanted to do sentiment analysis. In addition to that, we decided to also do emotion mining.

Emotion mining is done with the EmoLex (NRC) dictionary, and the sentiment analysis (first approach) is done with the Stanford CoreNLP library. We need to figure out whether this approach is sufficient for this dataset. Otherwise we need to look for another sentiment library, one that is trained on "slang", etc.

Naxter commented 7 years ago

Both are implemented. The current version only uses the data in the database, which is "just" the newly crawled data. We now need to import the dataset into the database (in implementation).

This will also make statistics creation and filtering much easier, without much code.

We can then also run experiments more easily.

Naxter commented 7 years ago

A parser that imports the CSV files into the database is also implemented.

So now we can calculate sentiment and emotion for all the posts and comments.

The approach is (done for the posts as well as the comments, both saved in the post table):

- Emotion mining: calculate the sum of the emotions for the words that can be found in the dictionary. This yields a vector with the distribution of the emotions.
- Sentiment analysis: use CoreNLP for the sentiment calculation. CoreNLP calculates the sentiment for each sentence; these are summed up to get an end result (Very negative, Negative, Neutral, Positive, Very positive). CoreNLP runs the annotators "tokenize, ssplit, pos, parse, sentiment" to calculate the sentiment.
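
The emotion-summing step can be sketched like this (the tiny lexicon and the words in it are illustrative placeholders, not the real EmoLex contents):

```python
# Emotion order: anger, anticipation, disgust, fear, joy, sadness, surprise, trust
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Toy lexicon: word -> 8-dim emotion vector (stand-in for the EmoLex table)
LEXICON = {
    "love":  [0, 0, 0, 0, 1, 0, 0, 1],
    "hate":  [1, 0, 1, 0, 0, 1, 0, 0],
    "great": [0, 1, 0, 0, 1, 0, 1, 0],
}

def emotion_vector(text):
    """Sum the emotion vectors of all lexicon words found in the text."""
    total = [0] * len(EMOTIONS)
    for token in text.lower().split():
        for i, value in enumerate(LEXICON.get(token, [])):
            total[i] += value
    return total

print(emotion_vector("i love this great store"))  # [0, 1, 0, 0, 2, 0, 1, 1]
```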

jerryspan commented 7 years ago

Perhaps a different lexicon that could be used (instead of EmoLex) is this one from Stanford, which is derived from Reddit data.

https://nlp.stanford.edu/projects/socialsent/

Might be more relevant to Facebook posts.

Naxter commented 7 years ago

Yeah, I will try it with this one. Sounds good.

Naxter commented 7 years ago

Using this approach: Exploiting a Bootstrapping Approach for Automatic Annotation of Emotions in Texts

Naxter commented 7 years ago

For adding negation handling to the emotion detection, one needs to find out how a negative "word" can influence the outcome of the emotion of a sentence.

Does it invert the complete emotion vector? Does it only affect a single negative or positive emotion? Or even all negative/positive emotions?

jerryspan commented 7 years ago

You can look up literature on this.

e.g. from this paper (although I wouldn't trust it much)

(non-exhaustive list of negations) no, not, rather, couldn’t, wasn’t, didn’t, wouldn’t, shouldn’t, weren’t, don’t, doesn’t, haven’t, hasn’t, won’t, wont, hadn’t, never, none, nobody, nothing, neither, nor, nowhere, isn’t, can’t, cannot, mustn’t, mightn’t, shan’t, without, needn’t

(diminisher) hardly, less, little, rarely, scarcely, seldom

I think that most sophisticated negation-handling algorithms require dependency parsing (to actually check the range of the negation and which words it affects), but since it's short text (posts/comments) in our case, it should also be ok to include some rules, e.g.

Negation + pos. word -> (somehow increase negative sentiment)
Negation + neg. word -> (somehow increase positive sentiment)
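
These two rules can be sketched as follows (the word lists and the +1/-1 scoring are illustrative placeholders, not a real sentiment model):

```python
NEGATIONS = {"no", "not", "never", "cannot", "don't", "isn't", "won't"}
POSITIVE = {"good", "happy", "great"}
NEGATIVE = {"bad", "sad", "awful"}

def simple_sentiment(tokens):
    """Rule sketch: a negation flips the polarity of the next sentiment word."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True            # flip the polarity of the next word
            continue
        if tok in POSITIVE:
            score += -1 if negate else 1
        elif tok in NEGATIVE:
            score += 1 if negate else -1
        negate = False               # the negation only reaches one word here
    return score

print(simple_sentiment("this is not good".split()))  # -1
print(simple_sentiment("this is not bad".split()))   # 1
```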

Naxter commented 7 years ago

Thanks!

Naxter commented 7 years ago

I am considering taking POS-tags into account for the "rules" of the negation handling, to also catch things like "not very happy". Otherwise words like "really", "very", ... are ignored and will "break" the negation handling.

Naxter commented 7 years ago

For the simple sentence similarity, I was wondering how to get the best results, and actually found an interesting (recent) approach: https://openreview.net/pdf?id=SyK00v5xx, with a simple implementation found here: https://github.com/peter3125/sentence2vec. This approach sounds really promising; I think I will try this one! Found here (last answer): https://stackoverflow.com/questions/22129943/how-to-calculate-the-sentence-similarity-using-word2vec-model-of-gensim-with-pyt

All this is used to extend the amount of annotated sentences (non-annotated sentences are annotated when the similarity is > 0.8).

If there are still non-annotated sentences:

I am going to use a OneVsRestClassifier with a linear SVM and multilabel classification to annotate the sentences that do not contain a single word from the emotion lexicon. I hope the recall/precision of the system is going to be at least 70%; otherwise this will not be a good model. Also, I am going to use TF-IDF values and not WEKA as suggested in the paper. (TF-IDF is still the best approach.)

jerryspan commented 7 years ago

Indeed, this paper (from ICLR) is a very interesting approach. It's also quite new (2017), so it hasn't been widely applied yet. My point is that if it becomes too complex, then just go with averaging word vectors.

PythEsc commented 7 years ago

Emotion database

Tobias and I extended the EmoLex by using WordNet synonyms. The synonyms have been integrated into the database using the same emotion vector as the original looked-up word. The database has grown from 14181 to 31485 entries.
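
The expansion step can be sketched like this (a tiny in-memory synonym table stands in for the WordNet lookup, and the words/vectors are made up):

```python
# Stand-in synonym table; the real code queries WordNet for each lexicon word.
SYNONYMS = {
    "happy": ["glad", "joyful"],
    "angry": ["furious"],
}

def expand(lexicon):
    """Copy each word's emotion vector to all of its synonyms."""
    expanded = dict(lexicon)
    for word, vector in lexicon.items():
        for syn in SYNONYMS.get(word, []):
            expanded.setdefault(syn, list(vector))  # never overwrite originals
    return expanded

lex = {"happy": [0, 0, 0, 0, 1, 0, 0, 0], "angry": [1, 0, 0, 0, 0, 0, 0, 0]}
print(len(expand(lex)))  # 5
```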

Emotion miner

We also extended the current emotion miner so that it uses simple negation handling. We are using a list of negation pre- and suffixes.

Prefixes: ["a", "de", "dis", "il", "im", "in", "ir", "mis", "non", "un"]
Suffixes: ["less"]

Negation handling: 1st rule

The first rule applies when a negation word is immediately followed by an emotion-word (a word that is present in our emotion database).

Negation handling: 2nd rule

The second rule tries to handle adverbs and past participle verbs (POS-tags: RB, VBN). If a negation word is followed by one or more tokens with these POS-tags and then an emotion-word, the emotion-word will still be negated.
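
Both rules can be sketched over pre-tagged tokens like this (the word sets are illustrative stand-ins; in the project the `(word, tag)` pairs come from a POS tagger):

```python
NEGATIONS = {"not", "no", "never"}
EMOTION_WORDS = {"happy", "disappointed", "trusted"}  # stand-in for the lexicon

def negated_emotion_words(tagged):
    """Return the emotion words whose emotion vector must be negated.

    tagged: list of (word, pos_tag) pairs.
    """
    negated = set()
    for i, (word, _) in enumerate(tagged):
        if word not in NEGATIONS:
            continue
        j = i + 1
        # Rule 2: step over adverbs (RB) / past participles (VBN),
        # e.g. "not very happy".
        while (j < len(tagged) and tagged[j][1] in ("RB", "VBN")
               and tagged[j][0] not in EMOTION_WORDS):
            j += 1
        # Rule 1 (j == i + 1) or rule 2: the next word is an emotion word.
        if j < len(tagged) and tagged[j][0] in EMOTION_WORDS:
            negated.add(tagged[j][0])
    return negated

print(negated_emotion_words([("not", "RB"), ("very", "RB"), ("happy", "JJ")]))
# {'happy'}
```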

Negation handling: Calculation

There are two ways how we can obtain the emotions of a negated word

  1. Lookup all combinations of negation pre- and suffixes together with the word in our emotion database
  2. If no combination in 1.) could be found, we are going to use a manually created mapping between the emotions and their negations. The mapping can be seen in the table below
|              | Anger | Anticipation | Disgust | Fear | Joy | Sadness | Surprise | Trust |
|--------------|-------|--------------|---------|------|-----|---------|----------|-------|
| Anger        | 0     | 0            | 0       | 0    | 1   | 0       | 0        | 0     |
| Anticipation | 0     | 0            | 0       | 1    | 0   | 0       | 1        | 0     |
| Disgust      | 0     | 0            | 0       | 0    | 1   | 0       | 0        | 1     |
| Fear         | 0     | 0            | 0       | 0    | 1   | 0       | 0        | 1     |
| Joy          | 1     | 0            | 1       | 1    | 0   | 1       | 0        | 0     |
| Sadness      | 0     | 0            | 0       | 0    | 1   | 0       | 0        | 0     |
| Surprise     | 0     | 1            | 0       | 0    | 0   | 0       | 0        | 1     |
| Trust        | 0     | 0            | 1       | 0    | 0   | 0       | 1        | 0     |

Example:
- Emotion of a word: [0,0,1,1,0,1,0,0]
- New emotion after negation of "Disgust": [0,0,0,0,0.5,0,0,0.5]
- New emotion after negation of "Fear": [0,0,0,0,1,0,0,1]
- New emotion after negation of "Sadness": [0,0,0,0,2,0,0,1]
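
The mapping-based fallback (step 2) can be sketched like this; the map encodes the table above, with each active emotion's weight split evenly over its negation targets:

```python
# Emotion order: anger, anticipation, disgust, fear, joy, sadness, surprise, trust
NEGATION_MAP = {
    0: [4],           # anger        -> joy
    1: [3, 6],        # anticipation -> fear, surprise
    2: [4, 7],        # disgust      -> joy, trust
    3: [4, 7],        # fear         -> joy, trust
    4: [0, 2, 3, 5],  # joy          -> anger, disgust, fear, sadness
    5: [4],           # sadness      -> joy
    6: [1, 7],        # surprise     -> anticipation, trust
    7: [2, 6],        # trust        -> disgust, surprise
}

def negate(vector):
    """Redistribute each active emotion to its negated counterparts."""
    result = [0.0] * 8
    for i, value in enumerate(vector):
        if value == 0:
            continue
        targets = NEGATION_MAP[i]
        for t in targets:
            result[t] += value / len(targets)
    return result

print(negate([0, 0, 1, 1, 0, 1, 0, 0]))
# [0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0]
```

Negating disgust, fear, and sadness in one pass reproduces the final vector of the example above.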

Sentence similarity measure

Moreover, we've added the Sentence2Vec code mentioned in the bootstrapping paper, together with an averaging-word-vector approach for comparison. Both approaches return similar similarity scores. The problem we've encountered is that two sentences with different emotions but the same structure are measured as very similar.

Example:
Sentence 1: "I really love your car."
Sentence 2: "I really hate your car."
Sentence2Vec similarity: 0.9278
Avg vector similarity: 0.9269

This high similarity is problematic since the emotions of the two sentences are completely different. One can see that the two models behave almost identically, and for now we cannot see any advantage of the Sentence2Vec approach over the simple average-vector approach.
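
The averaging baseline and why it fails here can be illustrated with toy 3-d vectors (real ones come from a trained word2vec model; these numbers only mimic the fact that "love" and "hate" sit close together in embedding space):

```python
import math

VECTORS = {
    "i":      [0.1, 0.2, 0.1],
    "really": [0.2, 0.1, 0.3],
    "love":   [0.9, 0.8, 0.1],
    "hate":   [0.8, 0.9, 0.2],  # near "love" despite the opposite emotion
    "your":   [0.1, 0.1, 0.2],
    "car":    [0.3, 0.4, 0.5],
}

def avg_vector(tokens):
    """Average the word vectors of a sentence into one sentence vector."""
    dims = len(next(iter(VECTORS.values())))
    total = [0.0] * dims
    for tok in tokens:
        for i, v in enumerate(VECTORS[tok]):
            total[i] += v
    return [v / len(tokens) for v in total]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

s1 = avg_vector("i really love your car".split())
s2 = avg_vector("i really hate your car".split())
print(cosine(s1, s2))  # close to 1.0 despite the opposite emotions
```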

SVM

Furthermore, we've added an SVM implementation. It is used to annotate all sentences that couldn't be tagged by the emotion miner. It uses the sklearn multilabel OneVsRestClassifier with a LinearSVC, taking TF-IDF values as input. The input consists of a single sentence as data and an array of 8 values representing the emotions as label. With a training split of 95/5, we currently get an average precision/recall of about 0.93, not using the similarity scores.
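
A minimal end-to-end sketch of that setup (the four sentences and their 8-dim emotion labels are fabricated training data, not the project's corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Emotion order: anger, anticipation, disgust, fear, joy, sadness, surprise, trust
sentences = [
    "i love this store",
    "this makes me so angry",
    "what a pleasant surprise",
    "i am really scared of this",
]
labels = [
    [0, 0, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
]

# TF-IDF features, one binary LinearSVC per emotion (one-vs-rest, multilabel)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sentences)
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, labels)

pred = clf.predict(vectorizer.transform(["i love this"]))
print(pred.shape)  # (1, 8): one indicator per emotion
```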

(figure: SVM precision/recall, svm_pre_rec)

Naxter commented 7 years ago

When the neural networks are ready, they can be combined with the results of the emotion mining, for example via a linear regression or something similar.

Naxter commented 7 years ago

I am going to tag the rest of the non-annotated sentences with the SVM now, save them in the database, and also use that to finally save the emotion distribution for a post (derived from its comments).

Naxter commented 7 years ago

As we have now mined all the emotions and sentiments, this task is done.