digital-wellbeing / pws-data

Code used to process the raw PowerWash Simulator study dataset
Creative Commons Zero v1.0 Universal

Fix game_saved bug #11

Closed by mvuorre 1 year ago

mvuorre commented 2 years ago

In an early version there was a bug that caused too many game_saved events (many per second).

I tried to fix it in 842bc63e3cf218ad9f17f61f1dea005807fc9b18, but the data is too large to process in R (at least with the methods I used):

> d <- bind_rows(collect(d), d2) %>% 
+   arrange(EntityId, Timestamp)
Error: vector memory exhausted (limit reached?)

We need another way to remove game_saved rows where the lag from the previous one is too short, ideally in the database (@rpsychologist?).

rpsychologist commented 2 years ago

I think it would be possible to refactor the code so that some of the intermediate steps are stored in either temporary or permanent tables in the db. Right now dbplyr just keeps adding subqueries. See compute() and rows_append() or rows_insert().
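
As a rough illustration of the idea (not the code that was eventually merged), the lag filter can be expressed as a window function that runs inside the database, with compute() storing the result in a temporary table so that later steps do not stack further subqueries. The sketch below uses an in-memory SQLite database so it is self-contained. The table name `events`, the `EventName` column, the numeric `Timestamp` in seconds, and the one-second `min_gap` threshold are all assumptions made for the example; only `EntityId`, `Timestamp`, and `game_saved` come from this thread.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# In-memory SQLite stands in for the real study database (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy data: the EventName column and numeric timestamps are hypothetical.
events <- data.frame(
  EntityId  = c("a", "a", "a", "a", "b", "b"),
  EventName = c("game_saved", "game_saved", "game_saved",
                "job_done", "game_saved", "game_saved"),
  Timestamp = c(0, 0.2, 60, 60.1, 5, 5.3)
)
dbWriteTable(con, "events", events)

# Hypothetical threshold: minimum seconds between two game_saved events.
min_gap <- 1

deduped <- tbl(con, "events") %>%
  # Compute the lag within each player and event type, so a game_saved
  # row is compared with the previous game_saved row of the same player.
  group_by(EntityId, EventName) %>%
  window_order(Timestamp) %>%
  mutate(gap = Timestamp - lag(Timestamp)) %>%
  ungroup() %>%
  # Drop only game_saved rows that follow another one too closely;
  # the first row per group has a missing gap and is kept.
  filter(!(EventName == "game_saved" & !is.na(gap) & gap < !!min_gap)) %>%
  select(-gap) %>%
  # Store the result in a temporary table inside the database.
  compute()

# Only this small result is pulled into R memory.
result <- deduped %>%
  arrange(EntityId, Timestamp) %>%
  collect()

dbDisconnect(con)
```

Because each row is compared with the raw previous row, a burst of rapid saves collapses to its first event. Time arithmetic is backend-specific, so a real timestamp column would likely need the database's own date functions instead of plain subtraction.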

mvuorre commented 2 years ago

Great idea, I'll look into it.

mvuorre commented 1 year ago

Fixed in #9.