Leibniz-HBI / Social-Media-Observatory

This repository is the central communication and project management interface for the Social Media Observatory hosted by the Leibniz Institute for Media Research | Hans-Bredow-Institute.
https://leibniz-hbi.github.io/SMO/
Creative Commons Attribution 4.0 International

Explore SSO data #38

Closed FlxVctr closed 4 years ago

FlxVctr commented 4 years ago

https://socialscience.one/blog/unprecedented-facebook-urls-dataset-now-available-research-through-social-science-one

manilevian commented 4 years ago

Hi,

I read through some of their material. The proposal sounds very useful. They have found a way to work around certain GDPR rules that would not allow the data to be collected, even in anonymized form. To do this, they only fetch the URLs shared by users, which are then (as far as I understood) anonymized, so they never actually use any data that could help identify a person. With the FTC and GDPR rules in effect, they went on collecting data with the help of open-source applications, which led to the release of two papers, which can be found here and here.

How much data was gathered:

The dataset contains approximately an exabyte (a quintillion bytes, or a billion gigabytes) of raw data from the platform, a total of more than 10 trillion numbers that summarize information about 38 million URLs shared more than 100 times publicly on Facebook (between 1/1/2017 and 7/31/2019). It also includes characteristics of the URLs (such as whether they were fact-checked or flagged by users as hate speech) and the aggregated data concerning the types of people who viewed, shared, liked, reacted to, shared without viewing, and otherwise interacted with these links.

The best part is that we as researchers can get access to their data. Here are the papers, which give an overview of the data they have so far and what is needed to get access.

I decided to create a new Google Sheet so we can gather all data collections (open and closed) for later elaboration.