This repository contains the code used to clean the raw Qualtrics and PlayFab data.
The cleaned data are openly available under a CC0 license.
The dataset includes a codebook and the cleaned data as a DuckDB database dump (.csv files with code to read them back into a database). The example below reads those data back into a DuckDB database, assuming the exported .csv files are in a directory named "data":
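library(DBI)
library(duckdb)

# Create (or open) a local DuckDB database file
db <- dbConnect(
  duckdb(),
  dbdir = "example-database.duckdb",
  read_only = FALSE
)
# Load the schema and .csv files from the "data" directory
dbExecute(db, "IMPORT DATABASE 'data';")
# List the imported tables
dbListTables(db)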
Users can also load the .csv files directly.
The raw data are assumed to be in data-raw/playfab-export and in the user's local OneDrive directory indicated in .env. Those data are not publicly available; this is for internal use only.
The preprocessing was run on Ubuntu 20.04.5 LTS and requires the following (the version used is shown in parentheses):
Use docker-compose to start Postgres and pgAdmin (optional):
docker-compose up -d
Import the JSON files into Postgres:
make import
After this, remove or rename data-raw/playfab-export so that the next import does not re-import the retrospective data export. A file saved in data-raw/ records which files from OneDrive/S3 have already been imported, so they are not duplicated in future imports and it is safe to call make import again.
The database will then be accessible at localhost:5432.
One file will throw an error. This can be ignored, since it only includes test data from before the study launched.
data-raw/playfab-export/pws-playfab-export-2022-08-17_10-00-00.json.gz
ERROR: invalid input syntax for type json
DETAIL: Token "SegmentAndDefinitions" is invalid.
CONTEXT: JSON data, line 1: ...":{"SegmentDefinition":"[[{"SegmentAndDefinitions"
The data processing R script reads the raw Qualtrics data from the user's OneDrive location. Make sure this location is defined properly in .env. Also make sure there is only one file in the qualtrics directory. That file should be exported from Qualtrics with Export -> Export as CSV, with "Export all fields" selected. It can be a plain .csv or compressed.
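
The single-file requirement can be checked before running the processing script. The following is a minimal sketch, not part of this repository: the function names count_files and check_single_export are hypothetical, and the directory path must be supplied by the user, since the variable name used in .env is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
# count the regular files directly inside a directory.
# Hidden files are counted too; subdirectories are not.
count_files() {
  find "$1" -mindepth 1 -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Hypothetical check: succeed only if the directory exists
# and contains exactly one file.
check_single_export() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Directory not found: $dir" >&2
    return 1
  fi
  local n
  n="$(count_files "$dir")"
  if [ "$n" -ne 1 ]; then
    echo "Expected exactly 1 file in $dir, found $n" >&2
    return 1
  fi
  echo "OK: one export file found in $dir"
}
```

After sourcing these functions, calling check_single_export with the path to the qualtrics directory returns a non-zero status if the directory is missing, empty, or holds more than one file. Stray hidden files, such as sync metadata, would also trigger the failure.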