Dataset analysis and visualization + reporting results

LDiana22 commented 5 years ago

The goal of this task is to detect patterns within each class of the dataset, as groundwork for the explainability of the models that will be developed later. Approaches:

Class imbalance: report the class distribution of both the training and the test set
Extraction of the most frequent words/keywords
Emoji analysis: build the vocabulary of all emoji in the training set, with their frequencies
Punctuation analysis: same as for emoji (vocabulary + frequencies)
Sentiment analysis with an existing tool, compared against the labels in the dataset (evaluate several sentiment analysis tools)
Find Twitter keyword lists for different sentiments (words that express positive/negative sentiment) and compare their presence/absence/frequency in each training instance with that instance's label
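A minimal sketch of the counting-based analyses listed above, in plain Python using only the standard library. Everything concrete here is an assumption for illustration: the toy `TRAIN`/`TEST` pairs, the tokenizer regex, the Unicode ranges used to spot emoji, and the two tiny keyword sets, which stand in for a real sentiment tool or Twitter sentiment lexicon. All function names are hypothetical and not part of the repository.

```python
import re
import string
from collections import Counter

# Hypothetical toy data standing in for the real (text, label) tweet dataset.
TRAIN = [
    ("I love this phone 😍!!", "positive"),
    ("Worst service ever... 😡", "negative"),
    ("love love love it 😍", "positive"),
    ("meh, it's ok?", "neutral"),
    ("not bad at all", "positive"),
]
TEST = [
    ("so sad today 😢", "negative"),
    ("great game!", "positive"),
]

# Hypothetical mini-lexicons; a published sentiment lexicon or tool would replace these.
POSITIVE_WORDS = {"love", "great", "good", "happy"}
NEGATIVE_WORDS = {"worst", "sad", "bad", "hate"}

# Rough emoji detection by Unicode block. This is an approximation: it splits
# multi-code-point emoji (skin tones, ZWJ sequences) into separate pieces.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WORD_RE = re.compile(r"[a-z']+")


def class_distribution(pairs):
    """Share of each label, to expose class imbalance."""
    counts = Counter(label for _, label in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def top_words(pairs, n=3):
    """Most frequent lowercase word tokens across all texts."""
    counts = Counter(w for text, _ in pairs for w in WORD_RE.findall(text.lower()))
    return counts.most_common(n)


def emoji_vocabulary(pairs):
    """Emoji vocabulary with frequencies."""
    return Counter(e for text, _ in pairs for e in EMOJI_RE.findall(text))


def punctuation_vocabulary(pairs):
    """ASCII punctuation characters with frequencies."""
    return Counter(ch for text, _ in pairs for ch in text if ch in string.punctuation)


def keyword_hits(text):
    """Count (positive, negative) lexicon words in one text."""
    words = WORD_RE.findall(text.lower())
    return (sum(w in POSITIVE_WORDS for w in words),
            sum(w in NEGATIVE_WORDS for w in words))


def keyword_counts_by_label(pairs):
    """Positive/negative keyword frequencies aggregated per dataset label."""
    stats = {}
    for text, label in pairs:
        pos, neg = keyword_hits(text)
        old_pos, old_neg = stats.get(label, (0, 0))
        stats[label] = (old_pos + pos, old_neg + neg)
    return stats


def lexicon_label(text):
    """Label a text by the sign of (positive hits - negative hits)."""
    pos, neg = keyword_hits(text)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def lexicon_agreement(pairs):
    """Fraction of instances where the lexicon label matches the dataset label."""
    hits = sum(lexicon_label(text) == label for text, label in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    print("train distribution:", class_distribution(TRAIN))
    print("test distribution:", class_distribution(TEST))
    print("top words:", top_words(TRAIN))
    print("emoji:", emoji_vocabulary(TRAIN))
    print("punctuation:", punctuation_vocabulary(TRAIN))
    print("keywords per label:", keyword_counts_by_label(TRAIN))
    print("lexicon agreement:", lexicon_agreement(TRAIN))
```

The `not bad at all` example shows why comparing a tool's output against the dataset labels is informative: a plain keyword count ignores negation, so it disagrees with the label on that instance. The same functions apply unchanged to the test split for the class imbalance check.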

LDiana22 commented 5 years ago

Corpus analysis inspiration: https://www.researchgate.net/publication/324703947_A_Survey_on_Twitter_as_a_Corpus_for_Sentimental_Analysis_and_Opinion_Mining

LDiana22 commented 5 years ago

Explainable sentiment analysis with applications in medicine; it might be worth requesting the full text: https://www.researchgate.net/publication/331425724_Explainable_Sentiment_Analysis_with_Applications_in_Medicine

Repository: LDiana22/NLP, issue #1