shiyipeng70 opened 4 years ago

I am drafting a project on sentiment differences reflected in tweets and Sina Weibo microblog posts (the Chinese counterpart of Twitter) on the topic of COVID-19. I have scraped 500 tweets and parsed them into a corpus. The problem is that I may not be able to find human annotators who are willing to annotate each of these texts. Is there any alternative to using human scores? How small can the group of annotators be and still validate a content analysis in my case? Thank you.
Hmm, for this kind of task it is a good idea to find some annotation already present within the dataset, such as labels or hashtags provided in the texts themselves. Another option is to try a clustering or topic-modelling algorithm to find patterns and then assign the cluster labels to the texts (see the sketch below). You will learn more about this in Week-5.
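Here is a minimal sketch of that idea in Python with scikit-learn's LDA, assuming the scraped tweets are already loaded as a list of strings; the variable names, the toy documents, and the choice of 5 topics are all placeholders, and Chinese-language posts would need word segmentation (e.g., with jieba) before vectorizing:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy placeholder documents; substitute the real 500-tweet corpus here.
tweets = [
    "masks and distancing in the city today",
    "lockdown extended again, staying home this week",
    "vaccine trial results look promising so far",
    "hospitals are over capacity in our region",
] * 25  # repeated only so the toy corpus is big enough to fit

# Bag-of-words counts; LDA works on raw term counts, not tf-idf.
vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=2)
counts = vectorizer.fit_transform(tweets)

# Fit a small number of topics; k=5 is an arbitrary starting point.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topics = lda.fit_transform(counts)

# Use each tweet's highest-probability topic as its cluster label.
labels = doc_topics.argmax(axis=1)

# Inspect the top words per topic to decide what each label means.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-8:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```

The labels are only useful if the top-word lists look interpretable, so eyeball them before treating the clusters as categories.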
I would recommend keeping the dataset small if you are doing annotations with humans; otherwise you will need a lot of annotators, on MTurk or elsewhere, doing it for you!
As for how small a dataset can still validate a content analysis: for the purpose of the HW, a smaller dataset will do. For your final paper, you will want to make sure you have a reliable way to do the annotating.
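If you do recruit a small group of annotators, a common way to validate the labels is to have at least two of them annotate the same sample and report an inter-annotator agreement statistic such as Cohen's kappa (Fleiss' kappa or Krippendorff's alpha generalize to more than two annotators). A minimal sketch with made-up labels:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical sentiment labels from two annotators on the same 10-tweet sample.
annotator_a = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos"]
annotator_b = ["pos", "neg", "neu", "neu", "pos", "neg", "pos", "pos", "neg", "pos"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
# Prints about 0.69 for these toy labels; values above ~0.6 are often
# read as substantial agreement (Landis & Koch, 1977).
print(f"Cohen's kappa: {kappa:.2f}")
```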