Closed SunshineCheesesauce closed 3 years ago
Also, I should add that model selection after imputation differs from model selection in a complete-case analysis, so I wanted to derive the new variable after imputation rather than impute it as just another variable (JAV).
I believe the below reprex would suit your purpose.
library(mice) # Multiple Imputation
library(dplyr) # Data manipulation
library(tidyr) # Tidy data
library(magrittr) # Pipes
library(purrr) # Functional programming - map()
set.seed(123) # Fix RNG seed
# impute
imp <- mice(nhanes, printFlag = FALSE)
# change completed data and pool analyses
complete(imp, "all") %>%
map(~ mutate(., bmipred = lm(bmi ~ hyp + chl + age)$fitted.values) %>%
mutate(., diff = bmi - bmipred)) %>%
map(lm, formula = diff ~ bmi + bmipred) %>%
pool()
#> Class: mipo m = 5
#> term m estimate ubar b t dfcom
#> 1 (Intercept) 5 2.273737e-15 6.957934e-29 9.047288e-29 1.781468e-28 22
#> 2 bmi 5 1.000000e+00 8.277838e-32 1.016891e-31 2.048053e-31 22
#> 3 bmipred 5 -1.000000e+00 1.797064e-31 1.725633e-31 3.867824e-31 22
#> df riv lambda fmi
#> 1 4.558938 1.560340 0.6094269 0.7127676
#> 2 4.739556 1.474140 0.5958192 0.7002646
#> 3 5.618065 1.152301 0.5353811 0.6432055
Created on 2020-12-16 by the reprex package (v0.3.0)
If your derived variable should guide imputations, then passive imputation would be needed.
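As a minimal sketch of what passive imputation looks like in mice, the example below uses a simpler derived variable than the one discussed in this thread: a hypothetical bmi_chl column defined as bmi * chl. The column name, and the choice to exclude it as a predictor of its own components, are assumptions made for illustration. A regression residual depends on a model fitted to the whole dataset, so it may not map onto this row-wise mechanism as neatly.

```r
library(mice)

# Add the derived column to the incomplete data; it is NA wherever bmi or chl is NA.
dat <- nhanes
dat$bmi_chl <- dat$bmi * dat$chl

# Dry run (maxit = 0) to obtain the default methods and predictor matrix.
ini <- mice(dat, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# Passive imputation: recompute bmi_chl from the current imputations of bmi and chl.
meth["bmi_chl"] <- "~ I(bmi * chl)"

# Avoid feedback: do not use bmi_chl to impute its own components.
pred[c("bmi", "chl"), "bmi_chl"] <- 0

imp_passive <- mice(dat, method = meth, predictorMatrix = pred,
                    printFlag = FALSE, seed = 123)

# Check that the derived column agrees with its components in a completed dataset.
d1 <- complete(imp_passive, 1)
all.equal(d1$bmi_chl, d1$bmi * d1$chl)
```

Because the derived column is recomputed inside every iteration, it can also serve as a predictor for the other incomplete variables, which is the point of doing this during rather than after imputation.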
Closing, as this seems sufficiently addressed in the above reprex.
Hi, thank you for your help, but I am afraid it still isn't working for me. I want to keep using this derived variable in further analyses, but I cannot find a way to store it within the mids object so that it remains available for them.
I believe that the derived variable is independent from the imputation process. Simply inserting your desired analysis at the below location in this pseudo-code pipe would therefore be sufficient:
complete(imp, "all") %>%
map(~ mutate(., bmipred = lm(bmi ~ hyp + chl + age)$fitted.values) %>%
mutate(., diff = bmi - bmipred)) %>%
map(HERE YOUR ANALYSIS) %>%
pool()
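To make the placeholder concrete, here is a hedged sketch of the same workflow written with base R helpers. The helper name add_diff and the analysis model chl ~ diff + age are assumptions chosen only for illustration; any model that uses the derived variable could take their place.

```r
library(mice)

set.seed(123)
imp <- mice(nhanes, printFlag = FALSE)

# Hypothetical helper: add fitted values and the derived variable to one completed dataset.
add_diff <- function(d) {
  fit <- lm(bmi ~ hyp + chl + age, data = d)
  d$bmipred <- unname(fitted(fit))
  d$diff <- d$bmi - d$bmipred
  d
}

# Derive the variable separately within each of the m completed datasets.
completed <- lapply(complete(imp, "all"), add_diff)

# Illustrative analysis: fit the same model to every completed dataset.
fits <- lapply(completed, function(d) lm(chl ~ diff + age, data = d))

# pool() accepts a list of fitted models and combines them with Rubin's rules.
pooled <- pool(fits)
summary(pooled)
```

Keeping the derivation in a named function makes it easy to reuse the same completed datasets for several follow-up analyses without rerunning mice().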
Thank you for this. Is there a way to convert a mild object back to a mids? I am using the derived variable in further stepwise model selection (as per the workflow) and am not sure how I would do this within the above code.
For example:
imp <- mice(nhanes, printFlag = FALSE)
scope <- list(upper = ~ diff + hyp + chl, lower = ~1)
expr <- expression(f1 <- lm(age ~ 1), f2 <- step(f1, scope = scope))
complete(imp, "all") %>%
  map(~ mutate(., bmipred = lm(bmi ~ hyp + chl + age)$fitted.values) %>%
    mutate(., diff = bmi - bmipred)) %>%
  map(~ mutate(., fit = with(imp, expr)) %>%
    mutate(., formulas = lapply(fit$analyses, formula)) %>%
    mutate(., terms = lapply(formulas, terms)) %>%
    mutate(., votes = unlist(lapply(terms, labels))))
This gives the error:
Error: Problem with `mutate()` input `fit`.
x Input `fit` must be a vector, not a `mira/matrix` object.
i Input `fit` is `with(imp, expr)`
Have a look at as.mids(). It may suit your purpose.
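For reference, a minimal sketch of the usual as.mids() round trip: stack the data in long format with the original rows included, add a column, and convert back. The derived column log_bmi is a hypothetical row-wise transformation chosen for illustration. A variable that comes from a model fitted separately within each completed dataset needs extra care, and this sketch does not cover that case.

```r
library(mice)

imp <- mice(nhanes, printFlag = FALSE, seed = 123)

# Stack the original data (.imp == 0) and the m completed datasets.
long <- complete(imp, action = "long", include = TRUE)

# Row-wise derived variable (hypothetical); it is NA in the original rows where bmi is NA.
long$log_bmi <- log(long$bmi)

# Convert back; cells that are NA in the .imp == 0 rows are treated as imputed.
imp_long <- as.mids(long)

# The new column is now available to with() and pool().
fit <- with(imp_long, lm(chl ~ log_bmi + age))
pooled_long <- pool(fit)
summary(pooled_long)
```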
Thank you, and sorry to keep questioning - I can't seem to find a workaround for this. In this section:
NEW <- complete(imp, "all") %>%
  map(~ mutate(., bmipred = lm(bmi ~ hyp + chl + age)$fitted.values) %>%
    mutate(., diff = bmi - bmipred))
NEW is a list. I cannot use as.mids() to convert it back to a mids object because the original data are not included when using "all". If I add include = TRUE, an error occurs because of the missing values in 'bmipred'. I would like to be able to change the completed data as above, but then reincorporate them into a mids object to do further analyses.
Thanks again
I have found a way using miceadds::datlist2mids().
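A hedged sketch of how that route might look with the nhanes example from this thread. Only the function name comes from the comment above; the helper add_diff, the follow-up model, and the assumption that datlist2mids() can be called with a plain list of completed data frames as its first argument are illustrative and not confirmed by the poster.

```r
library(mice)
library(miceadds)

imp <- mice(nhanes, printFlag = FALSE, seed = 123)

# Hypothetical helper: derive the variable within one completed dataset.
add_diff <- function(d) {
  fit <- lm(bmi ~ hyp + chl + age, data = d)
  d$bmipred <- unname(fitted(fit))
  d$diff <- d$bmi - d$bmipred
  d
}

# List of completed datasets, each with the derived columns added.
dat_list <- lapply(complete(imp, "all"), add_diff)

# Convert the list back to a mids object (assumed usage, see the note above).
imp_new <- miceadds::datlist2mids(dat_list)

# Further analyses can then use with() on the new mids object.
fit <- with(imp_new, lm(diff ~ age + hyp))
```

The appeal of this design is that the slow mice() run happens once, and the derived variable is attached afterwards without re-imputing.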
Hello, I have spent a while reading through all the issues and trying to get this to work but can't seem to find an answer. First, thank you for all your support and this package.
Background: I am assessing the validity of a new method of measuring resilience which takes the residuals of linear regression models and uses them in further models as outcome and predictor variables.
Problem (using a reproducible example based on nhanes):
I impute the data set:
imputed <- mice(nhanes, method = meth, predictorMatrix = predM, m=20, maxit = 20)
I then want to find the best fit model for predicting bmi using stepwise selection (I have 15 predictors in my actual dataset):
scope <- list(upper = ~ age + hyp + chl, lower = ~1)
expr <- expression(f1 <- lm(bmi ~ 1), f2 <- step(f1, scope = scope))
fit <- with(imputed, expr)
formulas <- lapply(fit$analyses, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
table(votes)
I find my final model:
model <- with(imputed, lm(bmi ~ age + hyp + chl))
All fine up to this point. I now try and save the residuals and the predicted bmi based on the model as new variables:
imputed$data$RS1 <- NULL
imputed$data$PS1 <- NULL
for (i in 1:20) {
  imputed$data$RS1 <- residuals(model$analyses[[i]])
  imputed$data$PS1 <- predict(model$analyses[[i]])
}
I then want to save my new variable, which is the difference between the predicted and the actual bmi:
imputed$data$new_variable <- imputed$data$PS1 - imputed$data$bmi
The results at this point should hypothetically be the inverse of the residuals but I get very strange results.
I then want to do further analysis (using additional variables that were also in the original imputation). e.g.
fit1 <- with(imputed, lm(new_variable ~ x1 + x2 + x3))
but I get the error: Error in imp[[j]] : subscript out of bounds
I also can't use the complete() function on this once I have added these new variables.
Can you please advise on how I can work around this, and also on whether I am saving the residuals correctly? My dataset is very large and the imputation currently takes over 24 hours, so it is difficult for me to keep rerunning mice() to find a workaround. If I passively imputed RS1, PS1 and new_variable, which are currently all missing, would this work?
Many thanks!