tarensanders closed this issue 1 year ago
Is this happening because they have missing values, e.g., in weight, for some days but not for others? One solution might be to replace NAs with the within-participant mean in these cases. Then hopefully the imputation procedure won't have to touch these variables?
No, the participant data gets matched to the accelerometer data when I create the dataset, so you either have it for all rows or for none.
Hmmm. In that case, it might be better to do post-processing and just average all the ages generated for each participant. Or maybe having the variance in these vars is appropriate as it speaks to uncertainty? What do you think?
Another idea: two steps. Do imputation on a dataset with summaries per participant. Get those values and put back into the pre-imputation dataset. Then do imputation as normal.
These ideas make sense. I think mode [categorical] or median [continuous] on the imputed data for the time invariant variables makes sense.
## categorical variables = mode
## continuous variables = median
data %>% select(-all_of(imputed_vars)) %>% left_join(summary)
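To make the mode/median idea concrete, here is a minimal sketch on a toy dataset. The column names (`participant_id`, `age`, `sex`, `sleep`) and the `stat_mode()` helper are made up for illustration and are not taken from the project code.

```r
library(dplyr)

# Hypothetical helper: most frequent non-missing value (first value wins ties).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Toy stand-in for one imputed dataset: one row per participant-day.
imputed <- tibble(
  participant_id = c(1, 1, 1, 2, 2),
  age   = c(10.2, 11.0, 9.8, 14.1, 13.9),
  sex   = c("F", "F", "M", "M", "M"),
  sleep = c(8.1, 7.5, 9.0, 6.8, 7.2)
)

# Collapse the time-invariant variables to one value per participant.
summary_tbl <- imputed %>%
  group_by(participant_id) %>%
  summarise(
    age = median(age),     # continuous = median
    sex = stat_mode(sex),  # categorical = mode
    .groups = "drop"
  )

# Drop the day-level copies, then join the participant-level values back on.
fixed <- imputed %>%
  select(-age, -sex) %>%
  left_join(summary_tbl, by = "participant_id")
```

This would need to be repeated for each imputed dataset before pooling, and it shares the drawback raised later in the thread: the between-imputation variance in the collapsed variables is reduced.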
From: James Conigrave Date: Thursday, 23 March 2023 at 2:07 pm To: Motivation-and-Behaviour/sleepIPD_analysis Subject: Re: [Motivation-and-Behaviour/sleepIPD_analysis] Imputation on the time-invariant variables (Issue #66)
Do imputation on a dataset with summaries per participant. Get those values and put back into the pre-imputation dataset. Then do imputation as normal.
The problem with that is that you only do the pooled analysis on the multiple datasets generated from the last imputation (on the per-day variables), but you don't get the variance in the 'fixed' variables. But, maybe it doesn't matter, since these aren't really the variables we care about?
The other option looks like it would be to specify this as a hierarchical dataset (observations nested within participants) and impute such that the level 2 variables are the same for each level 1 observation. The mice package seems to support this.
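As a rough sketch of what the hierarchical option might look like, mice ships level-2 methods such as `2lonly.pmm`, where the cluster variable is flagged with `-2` in the predictor matrix. The data below are simulated and the column names are placeholders, so treat this as an assumption-laden illustration rather than the project's setup.

```r
library(mice)

set.seed(66)

# Simulated data: 30 participants x 5 days. Age is a level-2 variable that is
# either present on every row of a participant or missing on all of them.
n_id <- 30
n_days <- 5
age <- round(rnorm(n_id, mean = 12, sd = 2), 1)
age[c(3, 7, 12, 18, 25)] <- NA

dat <- data.frame(
  participant_id = rep(seq_len(n_id), each = n_days),
  age = rep(age, each = n_days),
  sleep = rnorm(n_id * n_days, mean = 8, sd = 1)
)

meth <- make.method(dat)
pred <- make.predictorMatrix(dat)

# Impute age at the participant level; -2 marks the cluster identifier.
meth["age"] <- "2lonly.pmm"
pred["age", "participant_id"] <- -2

imp <- mice(dat, method = meth, predictorMatrix = pred,
            m = 5, printFlag = FALSE)

# In each completed dataset, age should be constant within participant.
completed <- complete(imp, 1)
```

With this setup the imputed age still differs across the `m` datasets, so pooling should retain the uncertainty in the level-2 variables, which the two-step approach loses.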
This is the way. I will start a branch and have a go at this.
Just a quick thought of something we should double check: each row in the dataframe is a day, but a bunch of the variables that are being imputed (age, gender, BMI, etc) don't vary by day. The imputation doesn't know that though.
Here's a quick check:
Returns:
Note how wildly weight varies for 10169.
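A check along these lines can be sketched as follows. This is a hypothetical reconstruction on toy data with made-up IDs and values, not the snippet or output from the original post.

```r
library(dplyr)

# Toy stand-in for one imputed dataset: one row per participant-day.
imputed <- tibble(
  participant_id = c("A", "A", "A", "B", "B"),
  weight = c(31.0, 58.4, 44.2, 50.0, 50.0)
)

# Flag participants whose supposedly time-invariant weight varies across days.
varying <- imputed %>%
  group_by(participant_id) %>%
  summarise(
    n_values = n_distinct(weight),
    weight_range = max(weight) - min(weight),
    .groups = "drop"
  ) %>%
  filter(n_values > 1)

print(varying)
```

Any participant returned here has been given different values of a fixed variable on different days.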