digital-wellbeing / pws-data

Code used to process the raw PowerWash Simulator study dataset
Creative Commons Zero v1.0 Universal

Adjust timestamps and join data #9

Closed: mvuorre closed this 1 year ago

mvuorre commented 2 years ago

This takes care of #4 and #5.

To avoid duplicating data, we must first remove bad responses from the Qualtrics data. I count any response that occurs within 60 seconds of the previous response as bad. Many of these are responses that were resent over and over again, which indicates a bug. Others appear to be cached responses that were sent to the API in a batch and were therefore recorded within seconds of each other. The latter won't exist in the PlayFab data anyway.
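As a rough illustration of the 60-second rule (not the code in this pull request), here is a minimal Python sketch. The function name `drop_rapid_responses`, the `(participant_id, timestamp)` tuple layout, and the choice to compare responses per participant are all assumptions made for the example; the actual Qualtrics columns may differ.

```python
from datetime import datetime, timedelta


def drop_rapid_responses(responses, min_gap=timedelta(seconds=60)):
    """Keep only responses that arrive at least `min_gap` after the
    previous response from the same participant.

    `responses` is a list of (participant_id, timestamp) tuples.
    Hypothetical layout: the real data has more columns.
    """
    kept = []
    last_seen = {}  # participant_id -> timestamp of the previous response
    for pid, ts in sorted(responses):
        prev = last_seen.get(pid)
        # Assumption: a gap of strictly less than min_gap counts as "bad".
        if prev is None or ts - prev >= min_gap:
            kept.append((pid, ts))
        # Always compare against the immediately preceding response,
        # whether or not that response was kept.
        last_seen[pid] = ts
    return kept


t0 = datetime(2022, 9, 1, 12, 0, 0)
example = [
    ("a", t0),
    ("a", t0 + timedelta(seconds=5)),    # repeat, dropped
    ("a", t0 + timedelta(seconds=10)),   # repeat, dropped
    ("a", t0 + timedelta(seconds=120)),  # 110 s after the previous one, kept
    ("b", t0 + timedelta(seconds=1)),    # different participant, kept
]
cleaned = drop_rapid_responses(example)
```

Because each response is compared with the one immediately before it, a long burst of repeats collapses to its first response only.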

mvuorre commented 2 years ago

There are still some 30k PlayFab data rows without a match. I am sure many of those have matches, but the timestamps differ by a constant hour. Those could be from computers that haven't been adjusted for summer time, etc. See #10.
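One way to probe the constant-hour hypothesis is to retry unmatched rows with whole-hour shifts. The sketch below is only an illustration in Python, not what this pull request or #10 implements; `find_match`, the 5-second tolerance, and the candidate offsets of -1 and +1 hours are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def find_match(target, candidates,
               tolerance=timedelta(seconds=5),
               offsets_hours=(0, -1, 1)):
    """Return (candidate, offset_hours) for the first candidate that lies
    within `tolerance` of `target` after shifting `target` by a whole-hour
    offset, or None if nothing matches.

    Offsets are tried in order, so an unshifted match wins over a shifted one.
    """
    for hours in offsets_hours:
        shifted = target + timedelta(hours=hours)
        for candidate in candidates:
            if abs(candidate - shifted) <= tolerance:
                return candidate, hours
    return None


# Hypothetical timestamps: the PlayFab row is one hour (and 2 s) ahead.
qualtrics_times = [datetime(2022, 9, 1, 12, 0, 0)]
playfab_time = datetime(2022, 9, 1, 13, 0, 2)

match = find_match(playfab_time, qualtrics_times)
no_match = find_match(datetime(2022, 9, 1, 15, 30, 0), qualtrics_times)
```

Recording which offset produced the match makes it possible to check afterwards whether the shifted matches cluster by participant or device, as the summer-time explanation would predict.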

mvuorre commented 1 year ago

I am trying to do this next step in R, but I run out of memory.

mvuorre commented 1 year ago

This has evolved into a more comprehensive "do everything" pull request that adjusts timestamps and joins the data, but also cleans it into the final formats.