mvuorre closed 1 year ago
There are still some 30k PlayFab data rows without a match. I am sure many of those have matches, but the timestamps differ by a constant hour. Those could be from computers that haven't been configured for summer time, etc. See #10
Trying to do this next step in R but I run out of memory.
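A rough sketch of the hour-offset check mentioned above, in plain Python so it can run row by row. This is not the code in this pull request. The `(player_id, timestamp)` row layout, the exact-timestamp comparison, the function name `find_hour_offset`, and the candidate offsets of -1, 0 and +1 hours are all assumptions made for illustration; real data would probably need a tolerance window rather than exact equality.

```python
from datetime import datetime, timedelta


def find_hour_offset(playfab_row, qualtrics_keys, offsets_hours=(0, -1, 1)):
    """Return the whole-hour shift that makes a PlayFab row match, or None.

    playfab_row: hypothetical (player_id, timestamp) tuple.
    qualtrics_keys: set of (player_id, timestamp) tuples from Qualtrics.
    """
    player_id, ts = playfab_row
    for offset in offsets_hours:
        # Shift the PlayFab timestamp and look for an exact match.
        if (player_id, ts + timedelta(hours=offset)) in qualtrics_keys:
            return offset
    return None


# Toy data, made up for this example.
qualtrics_keys = {
    ("p1", datetime(2021, 6, 1, 12, 0, 0)),
    ("p2", datetime(2021, 6, 1, 15, 30, 0)),
}
playfab_rows = [
    ("p1", datetime(2021, 6, 1, 12, 0, 0)),   # matches as-is
    ("p2", datetime(2021, 6, 1, 14, 30, 0)),  # matches after adding one hour
    ("p3", datetime(2021, 6, 1, 9, 0, 0)),    # no match at any offset
]
offsets = [find_hour_offset(row, qualtrics_keys) for row in playfab_rows]
```

Looking up keys in a set keeps memory use proportional to the Qualtrics side only, which may help where a full join runs out of memory.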
This has evolved to a more comprehensive "Do everything" pull request that adjusts timestamps and joins the data, but ALSO cleans it to the final formats.
This takes care of #4 and #5.
In order to prevent duplicating data, we must first remove bad responses from Qualtrics data. I count responses that occur within 60 seconds of the previous response as bad. Those are often massively repeated responses that keep getting sent over and over again, indicating some bug. Some of them also appear to be cached responses that are sent to the API in a batch, and are thus recorded within second(s) of each other. Those latter ones won't exist in PlayFab data anyway.
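The 60-second rule above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this pull request: responses are taken to be hypothetical `(respondent_id, timestamp)` tuples, the gap is measured per respondent, a gap of exactly 60 seconds counts as bad, and each response is compared with the immediately preceding one whether or not that one was kept.

```python
from datetime import datetime, timedelta


def drop_rapid_repeats(responses, min_gap=timedelta(seconds=60)):
    """Keep only responses arriving more than min_gap after the previous one.

    responses: list of hypothetical (respondent_id, timestamp) tuples.
    Returns the kept tuples, sorted by respondent and then by time.
    """
    kept = []
    last_seen = {}  # respondent_id -> timestamp of their previous response
    for respondent_id, ts in sorted(responses):
        previous = last_seen.get(respondent_id)
        if previous is None or ts - previous > min_gap:
            kept.append((respondent_id, ts))
        # Always advance, so a long burst of repeats is dropped entirely.
        last_seen[respondent_id] = ts
    return kept


# Toy data, made up for this example.
responses = [
    ("p1", datetime(2021, 6, 1, 12, 0, 0)),
    ("p1", datetime(2021, 6, 1, 12, 0, 2)),   # repeat, 2 s later
    ("p1", datetime(2021, 6, 1, 12, 0, 5)),   # repeat, 3 s later
    ("p1", datetime(2021, 6, 1, 12, 30, 0)),  # genuine later response
    ("p2", datetime(2021, 6, 1, 12, 0, 1)),   # different respondent
]
cleaned = drop_rapid_repeats(responses)
```

Comparing against the immediately preceding response, rather than the last kept one, means a burst that keeps firing every few seconds for longer than a minute is still dropped after its first entry.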