Hi @JulioV, I created a branch named data_cleaning/multiple_platforms based on our discussion. Could you review the code when you are free? Thanks!
The following things are different from our conversation:
Besides the timestamp and os columns, I also added the device_id column to the platforms output. The readable_datetime.R script needs this column so that local_datetime is computed with the correct timezone rather than UTC.
In the data cleaning script, I do not assign the majority platform to each time segment. Instead, I treat every time segment with multiple platforms as iOS. The reason is that every iOS feature can also be extracted from Android devices, but some Android features are not available on iOS devices. Selected event features are imputed with 0 in two steps: (1) for features that can be extracted from both Android and iOS devices, impute all rows directly; (2) for features that can only be extracted from Android devices, select the Android rows and impute only those.
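To make the two imputation steps easier to review, here is a minimal sketch of the logic in plain Python. It is not the code in the branch: the function names, the `platforms` key, and the feature names in the usage note are all hypothetical, and it assumes each row is a dict with a non-empty list of platforms and that a missing value is `None` or NaN.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def resolve_platform(platforms):
    """Treat a time segment seen on more than one platform as iOS.

    Assumes `platforms` is a non-empty list of platform names.
    """
    unique = {p.lower() for p in platforms}
    return "ios" if len(unique) > 1 else next(iter(unique))


def impute_event_features(rows, shared_features, android_only_features):
    """Impute missing event features with 0 without mutating the input rows.

    Step 1: features available on both platforms are imputed in every row.
    Step 2: Android-only features are imputed only in Android rows.
    """
    imputed = []
    for row in rows:
        new_row = dict(row)
        platform = resolve_platform(row["platforms"])
        new_row["platform"] = platform

        # Step 1: shared features, all rows.
        for feature in shared_features:
            if is_missing(new_row.get(feature)):
                new_row[feature] = 0

        # Step 2: Android-only features, Android rows only.
        if platform == "android":
            for feature in android_only_features:
                if is_missing(new_row.get(feature)):
                    new_row[feature] = 0

        imputed.append(new_row)
    return imputed
```

For example, with a hypothetical shared feature `calls_count` and a hypothetical Android-only feature `notif_count`, a row whose platforms are both Android and iOS is labeled iOS, gets `calls_count` set to 0, and keeps `notif_count` missing.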