This GitHub repository contains the dataset used in our research article titled "An Exploratory Study of COVID-19 Misinformation on Twitter".
In our article, we present a synthesis of established work in social media analytics and new streams in the detection and mitigation of misinformation applied to one of the most challenging topics for societies (and, possibly also, scientific research): the COVID-19 crisis.
We used two datasets in our study. The first consists of tweets that were mentioned by fact-checking websites and classified as false or partially false. The second consists of COVID-19 tweets collected from the publicly available corpus TweetsCOV19 (January-April 2020) and from in-house crawling (May-July 2020). The data collection process is described in detail in Section 3.1 of the paper.
We share both datasets. Dataset II contains tweets sampled from each day. Dataset I contains the tweets annotated for misinformation; it has around 1500 tweets in 4 different categories. The format of the data is as follows:
Please cite the OSNEM paper:
@article{shahi2021exploratory,
title={An exploratory study of covid-19 misinformation on twitter},
author={Shahi, Gautam Kishore and Dirkson, Anne and Majchrzak, Tim A},
journal={Online Social Networks and Media},
volume={22},
pages={100104},
year={2021},
publisher={Elsevier}
}
For help or issues with using the data, please submit a GitHub issue.
For personal communication related to our work, please contact Gautam Kishore Shahi (gautamshahi16@gmail.com), Anne Dirkson (a.r.dirkson@liacs.leidenuniv.nl), and Tim A. Majchrzak (timam@uia.no).
For more updates on related publications on the topic of FakeCovid, please visit https://gautamshahi.github.io/FakeCovid/