Open JBGruber opened 3 months ago
The task is relatively straightforward.
Here is the manually annotated data: https://drive.google.com/file/d/1uWkoGrdIaSIwagJpgB1WqSMGniQ94mph/view?usp=drive_link
The annotations were done based on the title and text of the abstracts. You could also experiment with including more variables.
You should fork the repository on GitHub. I would suggest working in Quarto, as we did for the rest of the project, but an R or Python script is also fine.
Determining whether research was relevant for us (i.e., whether the authors were using or developing a tool for opinion mining) was rather difficult in the manual annotation, so failure is a possibility, I think. But it's worth trying to see if a machine can do this.
We now have a dataset of 1,002 coded abstracts, 554 of which are relevant (based on #9). This was a lot of work and I can't thank everyone involved enough. However, there are still roughly 4,000 unlabelled ones.
Coding more abstracts manually is not really worth it, I think. But if we can build a model that does it for us, we could add more to the full-paper annotation on demand (assuming that can also be done somewhat automatically).
In short: we should use the coded abstracts to build a classifier.
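To make the task concrete, here is a minimal sketch of such a classifier in Python using scikit-learn. It is only a baseline suggestion, not a prescribed approach: the column names (`title`, `abstract`, `relevant`) are assumptions about the export in the linked file, and the toy data below is purely illustrative stand-in text.

```python
# Hedged sketch of a relevance classifier for the coded abstracts.
# Assumptions (not from the issue): the real data would come from the
# linked Google Drive file with columns "title", "abstract", "relevant";
# scikit-learn is available.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_classifier():
    # TF-IDF over the combined title + abstract text, then a linear
    # model with balanced class weights (554 of 1,002 are relevant,
    # so the imbalance is mild but worth handling).
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])


# Toy stand-in data, purely illustrative -- the real input would be the
# ~1,002 manually coded abstracts (label 1 = relevant to opinion mining).
texts = [
    "A new tool for opinion mining in social media",
    "Sentiment analysis pipeline for product reviews",
    "Crop yield estimation from satellite imagery",
    "Protein folding with deep learning",
] * 5
labels = [1, 1, 0, 0] * 5

pipe = build_classifier()
pipe.fit(texts, labels)
preds = pipe.predict(texts)
```

With the real data, the same pipeline could then score the ~4,000 unlabelled abstracts, and `predict_proba` would let us pull in only high-confidence predictions for the full-paper annotation.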