Unfortunately, I just found out that we had an error in one of our preprocessing scripts that caused a bunch of the posts to get lost in our processing pipeline. We fixed the error and ran our scripts on the whole dataset again. However, there is now even more clutter in the entity list, and we have around 3000 "entities" instead of 500. We could try POS tagging the posts with a different POS tagger than the one from NLTK, as that one seems to have trouble with slang, missing punctuation, etc.
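For illustration, a minimal sketch of the kind of mis-tagging we are seeing with NLTK's default tagger (the example post is made up):

```python
# Sketch of NLTK's default POS tagger on a slang-heavy post (made-up example).
import nltk
from nltk import pos_tag, word_tokenize

nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # default tagger model

post = "omg this phone is sooo gooood got it yesterday no regrets"
print(pos_tag(word_tokenize(post)))
# Tokens like 'omg' and 'sooo' tend to get implausible tags, and the missing
# punctuation hurts tokenization, which then propagates into the entity extraction.
```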
We tried to analyze the posts' sentiment with NLTK's SentimentIntensityAnalyzer, but the results are not very satisfying. Either there is still an error in our implementation or it is not suited for our task. We have to investigate this further.
Perhaps look for hints on POS tagging in social media; that might boost the POS results, e.g. papers like this: https://www.researchgate.net/publication/265794799_Part-Of-Speech_Tagging_for_Social_Media_Texts
For NLTK, have you tried this one? http://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.sentiment_analyzer
They have also incorporated VADER (see here and here), which was specifically developed for social media text.
We used the SentimentIntensityAnalyzer from the nltk.sentiment.vader module. I also read that it was specifically developed for social media texts. That's why I believe there might be an error in our code itself.
I actually found the error. It had to do with the indexing of the dictionary that is returned by the polarity_scores function of the VADER module. Never mind, fixed that.
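For the record, a minimal sketch of how we now read the output: polarity_scores returns a dict, so it has to be accessed by its string keys rather than by position (the example sentence is made up):

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # lexicon used by VADER

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("The phone is great, but the battery is awful.")
print(scores)              # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
print(scores["compound"])  # overall polarity in [-1, 1]
```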
Shall we meet next Monday to discuss progress and how to wrap up the project?
I am also available this Thursday.
I think next Monday would be better, as we want to try out some things first. These will probably lead to new questions, which we can discuss then.
Fine by me. Let me know if questions come up during the week as well.
What would be a good time for you on Monday? I would like to propose 12:15.
12:15 is fine by me.
Could you give me an update on the activities you are working on and how they are progressing, given that the PSP is in two weeks?
Proof-of-concept lexicon approach: Apart from the sentiment shifters, what else is holding you back?
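(To make sure we mean the same thing by shifters, a toy sketch with a made-up mini-lexicon, where a negation word flips the polarity of the next sentiment word; not your actual implementation:)

```python
# Toy lexicon-based scorer with one kind of shifter: negation.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}

def lexicon_score(tokens):
    score, flip = 0.0, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            flip = -1.0                 # flip the next sentiment word
        elif tok in LEXICON:
            score += flip * LEXICON[tok]
            flip = 1.0                  # reset once the shifter has applied
    return score

print(lexicon_score("this phone is not good".split()))  # -1.0
```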
NN approach: POS (done), NER / noun-phrase extraction (done) --> I see in my notes that you have around 500 entities, and we could even manually keep the interesting ones. Dependency parser: any updates?
If you think it becomes difficult to identify entities like product names or specific shops, we can always take a more generalized approach and see how sentiment is targeted towards these "generalized" entities/aspects, i.e. the entities you already found.
Perhaps also already think about how this will be fed to the network?