The goal of this task is to detect patterns within each class of the dataset, in order to support the explainability of the models that will be developed later. Approaches:
class imbalance (report the class distribution in both the training and test sets)
extraction of the most frequent words/keywords
emoji analysis (create the vocabulary of all the emoji in the training set + frequencies)
punctuation analysis (same as for emoji: create the vocabulary of all the punctuation marks in the training set + frequencies)
sentiment analysis using an existing tool and comparison against the label in the dataset
(research multiple sentiment analysis tools)
research Twitter keyword lists for different sentiments (words that express positive/negative sentiment; compare their presence/absence/frequency in each training instance with that instance's label)
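
A minimal sketch of how several of the approaches above could be prototyped with the Python standard library only. Everything here is an assumption made for illustration: the toy `TRAIN` data, the `positive`/`negative` label names, the tiny `POSITIVE`/`NEGATIVE` keyword sets, and the emoji regular expression, which only approximates the emoji code point ranges and does not handle multi-code-point sequences. The real dataset, label set, and keyword lists would replace them.

```python
import re
import string
from collections import Counter, defaultdict

# Hypothetical toy data: (text, label) pairs standing in for the real training set.
TRAIN = [
    ("I love this 😍!!", "positive"),
    ("great day 😍", "positive"),
    ("I hate mondays 😡...", "negative"),
]

# Rough approximation of emoji code points (assumption, not a complete definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical keyword lists; a researched Twitter lexicon would go here.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "awful", "sad"}


def class_distribution(samples):
    """Return {label: (count, share)} to expose class imbalance."""
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


def words(text):
    """Lowercased word tokens (very naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def emojis(text):
    """All characters matched by the approximate emoji pattern."""
    return EMOJI_RE.findall(text)


def punctuation(text):
    """All ASCII punctuation characters in the text."""
    return [ch for ch in text if ch in string.punctuation]


def per_class_counts(samples, extract):
    """Build one frequency table per class for the items returned by `extract`."""
    table = defaultdict(Counter)
    for text, label in samples:
        table[label].update(extract(text))
    return table


def lexicon_sentiment(text):
    """Label a text by counting positive vs. negative keywords."""
    tokens = words(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement(samples):
    """Share of instances where the keyword-based label matches the dataset label."""
    hits = sum(lexicon_sentiment(text) == label for text, label in samples)
    return hits / len(samples)


if __name__ == "__main__":
    print(class_distribution(TRAIN))
    for name, extract in [("words", words), ("emoji", emojis), ("punctuation", punctuation)]:
        for label, counter in per_class_counts(TRAIN, extract).items():
            print(name, label, counter.most_common(3))
    print("lexicon/label agreement:", agreement(TRAIN))
```

The same `per_class_counts` helper is reused for words, emoji, and punctuation so that the per-class frequency tables are directly comparable. An existing sentiment analysis tool could be swapped in for `lexicon_sentiment` without changing `agreement`, as long as its output is mapped onto the dataset's label names.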