blockchain-interoperability / blockchain-social-media


Investigate Mastodon data #3

Open inwonakng opened 1 year ago

inwonakng commented 1 year ago

Find a way to get historical Mastodon data, ideally from a timeframe similar to that of our Twitter data -- November 2022.

If not, check online to see whether a dataset of Mastodon posts is available. Post an update with any insights.

catherine-ywang commented 1 year ago

The papers on Google Scholar that analyze Mastodon data mainly study the relationship between instances and user behavior through metadata, without the content of posts. Several papers build their data on a paper published on 1/15/2019, whose data was public but has since been deaccessioned due to violations of the data usage agreement. The data from another paper (collected from 11/2020 to 12/2020) is available on request from the authors (La Cava, L., Greco, S. & Tagarelli, A. Understanding the growth of the Fediverse through the lens of Mastodon. Appl Netw Sci 6, 64 (2021). https://doi.org/10.1007/s41109-021-00392-5).

I found only one paper that collected Mastodon posts (Al-khateeb, Samer. (2022). Dapping into the Fediverse: Analyzing What’s Trending on Mastodon Social. In: Thomson, R., Dancy, C., Pyke, A. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2022. Lecture Notes in Computer Science, vol 13558. Springer, Cham. https://doi.org/10.1007/978-3-031-17114-7_10). From 3/18/22 to 4/21/22 it collected the top 20 daily trending hashtags and posts (about 682 trending hashtags (412 unique) and 13,590 public posts (11,601 unique) in total). Only the collection script was uploaded (https://github.com/SamerAl-khateeb/MastodonDataCollection), not the datasets.

I found that some people have collected Mastodon data and posted it on Kaggle, but I am not sure whether it can be used for research…
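
If no existing dataset turns out to be usable, one option for the November 2022 window is to page backwards through an instance's public timeline. The sketch below is a rough illustration and not a tested collector. It assumes the instance exposes `GET /api/v1/timelines/public` without authentication (some instances require a token), and that status IDs on that instance are built from a millisecond timestamp shifted left by 16 bits, which holds for Mastodon's own IDs but may not hold for other Fediverse software. The helper names `id_bound`, `fetch_page`, and `collect` are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def id_bound(dt):
    """Approximate status ID for a datetime.

    Assumption: IDs are (milliseconds since epoch) << 16, as on Mastodon itself.
    """
    return int(dt.timestamp() * 1000) << 16


def fetch_page(instance, max_id, limit=40):
    """Fetch one page of the local public timeline with IDs below max_id."""
    query = urllib.parse.urlencode(
        {"local": "true", "limit": limit, "max_id": max_id}
    )
    url = f"https://{instance}/api/v1/timelines/public?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect(instance, start, end, fetch=fetch_page, max_pages=5):
    """Walk backwards from `end` and stop at the first post older than `start`.

    `max_pages` caps the number of requests so a trial run stays small.
    """
    floor = id_bound(start)
    cursor = id_bound(end)
    posts = []
    for _ in range(max_pages):
        page = fetch(instance, cursor)
        if not page:
            break
        for status in page:
            if int(status["id"]) < floor:
                return posts
            posts.append(status)
        # The timeline is newest-first, so the last ID is the next cursor.
        cursor = int(page[-1]["id"])
    return posts
```

A trial run could call `collect("mastodon.social", start, end)` with `start` and `end` set to timezone-aware datetimes for 2022-11-01 and 2022-12-01; the instance name is only an example. A full month would need a much larger `max_pages`, pauses between requests to stay within the instance's rate limits, and a check of the instance's terms of use before the data is used for research.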