adeshpande3 / LSTM-Sentiment-Analysis

Sentiment Analysis with LSTMs in Tensorflow
MIT License

Negation #30

Open GregSilverman opened 6 years ago

GregSilverman commented 6 years ago

Hi, first of all, thanks for putting this project up. It's extremely helpful for what I need to do.

Now to the question: I'm running through these notebooks, and in the "Pre-Trained LSTM" one you have two examples, one of a positive sentiment and the other of a negative sentiment. However, I noticed that if I were to say "That movie was not terrible," it would still be classified as "Negative sentiment," and similarly, if I were to say "That movie was not the best one I have ever seen," it would still be classified as "Positive sentiment."

What would be your recommendation for negation handling? I did find this: negation-handling-in-sentiment-analysis, but in case you've done anything with regard to this in your example, I would be interested in seeing it.

Thanks!

adeshpande3 commented 6 years ago

Yeah, that's a good question. One of the classic issues with using deep learning models is figuring out how to respond once you see particular failure cases. In this case, it seems like the model hasn't really picked up on the negation. I think there are two general kinds of approaches. One is to retrain the model on a dataset of increased size and/or variation in the hope that it will generalize better (or you could change the model architecture as well). The other is to go with a more hardcoded approach (like the one in that Stack Overflow post), where you add code to flip the sentiment when the word "not" is detected. That approach will work, but it's definitely more brittle and probably won't scale, so the first option is what most people go with.
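For what it's worth, the hardcoded option could be sketched as a post-processing step on the model's prediction. This is just an illustration, not code from the repo; the negation word list and the function names are made up for the example:

```python
# Hypothetical sketch: flip the model's predicted label whenever a
# negation word appears in the input text. Brittle by design, as
# discussed above (e.g. double negation or long-range scope will break it).
NEGATION_WORDS = {"not", "never", "no", "n't"}

def flip_on_negation(text, predicted_sentiment):
    """predicted_sentiment is 'Positive' or 'Negative'; flip it if
    the text contains a negation word."""
    tokens = text.lower().split()
    if any(tok in NEGATION_WORDS for tok in tokens):
        return "Negative" if predicted_sentiment == "Positive" else "Positive"
    return predicted_sentiment

print(flip_on_negation("That movie was not terrible", "Negative"))  # Positive
print(flip_on_negation("That movie was terrible", "Negative"))      # Negative
```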

GregSilverman commented 6 years ago

Regarding the change of architecture, I assume you mean implementing a part-of-speech tagger? From my preliminary searches, something like Parsey McParseface sounds like a promising avenue of exploration, but I'm not sure how this would fit into the workflow with word2vec. As a very specific example, this sounds like a very interesting and promising approach: Concatenating word2vec and POS features. I'll gladly take a stab at this with some guidance.
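My rough understanding of the concatenation idea is below. This is only a hypothetical sketch with NumPy: the 50-dimensional embedding size, the POS tag set, and the function names are placeholders I made up, not anything from this repo or that post:

```python
import numpy as np

# Hypothetical: append a one-hot POS feature to each word's embedding
# before feeding the sequence to the LSTM, so the model can distinguish,
# e.g., "not" as an adverb modifying an adjective.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "DET", "OTHER"]

def pos_one_hot(tag):
    """One-hot encode a POS tag; unknown tags map to OTHER."""
    vec = np.zeros(len(POS_TAGS), dtype=np.float32)
    vec[POS_TAGS.index(tag if tag in POS_TAGS else "OTHER")] = 1.0
    return vec

def concat_features(embedding, tag):
    """Concatenate a word embedding (e.g. shape (50,)) with its
    POS one-hot, giving shape (50 + len(POS_TAGS),)."""
    return np.concatenate([embedding, pos_one_hot(tag)])

word_vec = np.random.rand(50).astype(np.float32)  # stand-in for a word2vec vector
features = concat_features(word_vec, "ADV")
print(features.shape)  # (56,)
```

The LSTM's input dimension would then need to grow by `len(POS_TAGS)`, but the rest of the pipeline would presumably stay the same.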