Closed: PythEsc closed this issue 7 years ago
For this task, we would need a mapping from emotions to reactions and also from reactions to sentiment.
I've just run some code that calculates the correlation between the results of our emotion mining and the actual users' reactions.
Average absolute difference
| | ANGER | ANTICIPATION | DISGUST | FEAR | JOY | SADNESS | SURPRISE | TRUST |
|---|---|---|---|---|---|---|---|---|
| LOVE | 0.881 | 0.609 | 0.962 | 0.890 | 0.600 | 0.749 | 0.696 | 0.616 |
| WOW | 0.180 | 0.384 | 0.185 | 0.199 | 0.359 | 0.206 | 0.236 | 0.380 |
| HAHA | 0.815 | 1.084 | 0.915 | 0.826 | 1.010 | 0.788 | 0.851 | 1.113 |
| SAD | 0.247 | 0.789 | 0.242 | 0.293 | 0.723 | 0.346 | 0.469 | 0.757 |
| ANGRY | 0.636 | 1.007 | 0.657 | 0.674 | 1.028 | 0.720 | 0.890 | 0.977 |
Unfortunately, one cannot see any good correlation between the emotion mining results and the actual reactions... Either there is still a bug in my code or the emotion mining and the reactions really do not have any correlation.
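Since a question below asks for "pure correlation numbers", a plain per-pair Pearson correlation could complement the AAD values. A minimal sketch, assuming we have one emotion ratio and one reaction ratio per post (the sample lists are hypothetical):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no meaningful correlation
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# hypothetical per-post ratios: share of ANGER words vs. share of ANGRY reactions
anger_word_ratio = [0.10, 0.40, 0.05, 0.30]
angry_reaction_ratio = [0.12, 0.35, 0.02, 0.25]
print(round(pearson(anger_word_ratio, angry_reaction_ratio), 3))
```

A value near +1 would mean the emotion miner tracks the reactions; values near 0 would support the "pretty much random" worry voiced later in the thread.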
I've just added the "help wanted" label because I'd like somebody to review the `emotion_analysis.py` file. I want to be sure those results are not just some dumb bug ;) Whoever does the review can write their comments here and remove the label again.
EDIT: We could try to find a correlation between the emotions and the reactions using the other emotion lexicon that Jerry mentioned. If this still does not help and we cannot find a bug in my evaluation code, we will have to evaluate (somehow) the precision of our emotion miner. In my opinion there should be a visible correlation between the reactions and emotions, but since it is not visible in the current statistics, there are three possible reasons for that:
EDIT2: Looking into the lexicon, I just saw that it is a sentiment lexicon and not an emotion lexicon. Therefore we cannot improve our emotion miner with that one :(
Well, I am a bit confused about the "average absolute difference" / AAD. What are the pure correlation numbers?
e.g. I guess it's good that "angry" has the lowest AAD with "anger". At the same time, shouldn't "sad" have the lowest AAD with "sadness"?
Would it help to look at some handpicked posts, to see what is going on (and why...)? Could it be because not many words are related to an emotion (or sentiment)? I am not sure if I mentioned it before, but there is a recent approach (see attached) that uses pre-trained word embeddings + an emotion lexicon to "annotate" a corpus.
Exploiting a Bootstrapping Approach for Automatic Annotation of Emotions in Texts.pdf
> Well, I am a bit confused about the "average absolute difference" / AAD. What are the pure correlation numbers? e.g. I guess it's good that "angry" has the lowest AAD with "anger". At the same time, "sad" should have the lowest AAD with "sadness"?
Yes, exactly: the lower the value, the better the match. Looking at the row, ANGRY "matches" ANGER since it has the lowest value in the row (which is still quite high). Looking at the column, however, WOW matches ANGER even better (with a value of only 0.180), which does not make much sense?
It's "averaged" because I divided the summed difference by the number of differences that contributed to that sum: whenever the difference between the two ratios (emotion_ratio and reaction_ratio) of a post was non-zero, I increased a counter by one and added the difference to the total. At the end I divided the total difference by the counter.
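The procedure just described could be sketched like this (the ratio lists are hypothetical, one entry per post):

```python
def average_absolute_difference(emotion_ratios, reaction_ratios):
    """AAD as described above: average of |emotion_ratio - reaction_ratio|
    over all posts where that difference is non-zero."""
    total, count = 0.0, 0
    for e, r in zip(emotion_ratios, reaction_ratios):
        diff = abs(e - r)
        if diff != 0:
            total += diff   # add the difference to the running sum
            count += 1      # count only the non-zero differences
    return total / count if count else 0.0

# hypothetical example: diffs are 0, 0.4 and 0.3, averaged over the 2 non-zero ones
print(average_absolute_difference([0.2, 0.5, 0.3], [0.2, 0.1, 0.6]))
```

Note that skipping zero differences biases the average upward: posts where the miner agrees perfectly with the reactions do not pull the score down.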
> Would it help to look at some handpicked posts, to see what is going on (and why...)? Could it be because not many words are related to an emotion (or sentiment)?
Yes, I guess we have to label some data manually, compare the results of our emotion miner with the handcrafted labels, and calculate some performance score to evaluate the miner. I really don't know where the error is located because there are so many possibilities: wrong results because we do not have negation handling, an unbalanced lexicon (e.g. maybe there are a lot more words for one emotion than for another), a lexicon that is simply wrong or does not work for our domain, a wrong evaluation on my side, ...
I have already thought about bootstrapping our emotion labels with a similar approach. Theirs is close to ours: they use CoreNLP and EmoLex. This sounds promising, but I wonder: would the results get better if we also took negation handling into consideration in this approach? I may have to think about that.
But indeed, it would be interesting to see if this bootstrapping approach would improve the results.
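The core idea of the embedding-based bootstrapping could be sketched roughly as follows; the tiny 2-d "embeddings", seed lexicon, and threshold are all made up for illustration (a real run would use pre-trained vectors and EmoLex seeds):

```python
import math

# toy word vectors standing in for pre-trained embeddings
EMBEDDINGS = {
    "delicious": (0.9, 0.1),
    "tasty": (0.85, 0.2),
    "awful": (-0.8, 0.3),
}
SEED_LEXICON = {"delicious": "joy"}  # hypothetical seed entries from the lexicon

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def expand(threshold=0.95):
    """Label unseen words with the emotion of their most similar seed word,
    keeping only matches above the similarity threshold."""
    expanded = dict(SEED_LEXICON)
    for word, vec in EMBEDDINGS.items():
        if word in expanded:
            continue
        best = max(SEED_LEXICON, key=lambda s: cosine(vec, EMBEDDINGS[s]))
        if cosine(vec, EMBEDDINGS[best]) >= threshold:
            expanded[word] = SEED_LEXICON[best]
    return expanded

print(expand())  # "tasty" gets labelled "joy"; "awful" stays unlabelled
```

This is only the lexicon-expansion half of such an approach; the attached paper should be consulted for the actual annotation procedure.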
> This sounds promising, but I wonder: would the results get better if we also took negation handling into consideration in this approach?
Well, at the moment a sentence like "Your salad is really not delicious" would get a positive sentiment/emotion, since "delicious" is associated with positive emotions/sentiment. I think negation handling would slightly improve the results, but I am not sure if that alone is enough. I guess we really have to label some data manually and see if our miner fits those labels.
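A very rough sketch of how window-based negation handling could sit on top of a lexicon lookup; the tiny lexicon, the negation cue list, and the window size are made up for illustration:

```python
NEGATIONS = {"not", "no", "never", "n't"}
# tiny hypothetical lexicon: word -> (emotion, weight)
LEXICON = {"delicious": ("joy", 1.0), "awful": ("disgust", 1.0)}
WINDOW = 3  # invert lexicon hits up to 3 tokens after a negation cue

def score(tokens):
    """Sum lexicon weights per emotion, inverting hits inside a negation window."""
    scores = {}
    negated_until = -1
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATIONS:
            negated_until = i + WINDOW
            continue
        hit = LEXICON.get(tok.lower())
        if hit:
            emotion, weight = hit
            if i <= negated_until:
                weight = -weight  # inside the window: invert the contribution
            scores[emotion] = scores.get(emotion, 0.0) + weight
    return scores

# "delicious" comes right after "not", so its joy weight is inverted
print(score("your salad is really not delicious".split()))  # {'joy': -1.0}
```

A fixed window is crude (it misses scope boundaries like "not bad, but boring"), but even this would stop the example sentence from scoring as purely positive.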
In general, we do not have any information about the validity of our sentiment/emotion analysis since we have no real labeled data. Maybe the results are pretty much random and hence no better than any baseline. I guess the sentiment analysis has a higher accuracy, since we are using CoreNLP for that task, which does a lot more than a simple dictionary lookup.
Isn't "not" yielding some negativity at least?
EmoLex does not cover this kind of word, and we did not include any negation handling for the emotion mining.
The Stanford CoreNLP sentiment analysis (https://stanfordnlp.github.io/CoreNLP/sentiment.html#description) states on its information page (https://nlp.stanford.edu/sentiment/) that the implementation does consider negation handling. But the result is still unsatisfactory. Maybe it is really necessary to come up with our own implementation, or to use a dedicated negation handling framework, to get better results?
Stanford NLP sentiment analysis is state-of-the-art and, due to the recursive/dependency parse structure they use, it is able to handle negations as well.
Given the nature of social media text, I think you should somehow include negation handling in your (baseline?) model as well.
There should be some literature on this as well: https://www.aclweb.org/anthology/W/W10/W10-3111.pdf http://www.aclweb.org/anthology/W15-2914
and perhaps some "lists": http://ptrckprry.com/course/ssd/data/negative-words.txt
BTW, StanfordNLP concludes that the phrase "Your salad is really not delicious" is "neutral".
> BTW, StanfordNLP concludes that the phrase "Your salad is really not delicious" is "neutral".
Hmm, I tested the same phrase and for me it was "NEGATIVE". Since I am not at home at the moment, I used the online CoreNLP API.
> Stanford NLP sentiment analysis is state-of-the-art and, due to the recursive/dependency parse structure they use, it is able to handle negations as well.
Yeah, that's what I thought. I guess the sentiment results are fine but our emotion mining is not. Either we'll have to find a library that supports emotion extraction in a pipeline, or we'll have to write our own pipeline including negation handling plus the additional steps we are already doing (e.g. lemmatization).
I used this one: http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
Ah ok, that's pretty much neutral with a slight tendency towards negative; maybe that's why the site I used recognizes it as negative.
We could use the MPQA Opinion Corpus to evaluate our emotion miner. I haven't read the whole description yet, but it looks promising.
I found out that the different results between the online demo and the version in the CoreNLP package might be caused by an outdated trained model in the package. One guy mentioned that he retrained the model for 24 hours (...) over about 750 epochs and came close to the online demo version.
So yeah, I guess that's what we should do as well. Or we email those guys and ask for their current model.
How are you planning to use the MPQA corpus? Are there baseline results on this?
Bruno implemented a linear model to combine both the NNs and the emotions.