NErler / JointAI

Joint Analysis and Imputation of generalized linear models and linear mixed models with missing values
https://nerler.github.io/JointAI

Get imputation values for survival model #3

Closed: bbb801 closed this issue 2 years ago

bbb801 commented 2 years ago

Dear Sir or Madam, I am running the survival models from the guide and would like to extract the imputed data, but I got an error. Any suggestions? Thanks.

library(JointAI)

mod6a <- survreg_imp(Surv(futime, status != "alive") ~ age + sex + copper + trig,
                     models = c(copper = "lognorm", trig = "lognorm"),
                     data = subset(PBC, day == 0), n.iter = 250,
                     monitor_params = c(analysis_main = TRUE, imp = TRUE))
|**| 100%
Warning message:
In readChar(modelfile, file.info(modelfile)$size) :
  can only read in bytes in a non-UTF-8 MBCS locale

impDF <- get_MIdat(mod6a, m = 10, seed = 2019)
Error: I cannot find imputed values for “copper”. Did you monitor them?

mod6b <- coxph_imp(Surv(futime, status != "alive") ~ age + sex + copper + trig,
                   # models = c(copper = "lognorm", trig = "lognorm"),
                   data = subset(PBC, day == 0), n.iter = 250,
                   monitor_params = c(analysis_main = TRUE, imp = TRUE))
|**| 100%
Warning message:
In readChar(modelfile, file.info(modelfile)$size) :
  can only read in bytes in a non-UTF-8 MBCS locale

impDF <- get_MIdat(mod6b, m = 10, seed = 2019)
Error: I cannot find imputed values for “copper”. Did you monitor them?

NErler commented 2 years ago

Thank you for your post. This is indeed a bug: the wrong elements of the data matrix are set to be monitored. I will look into fixing this. In the meantime, you can work around the issue by specifying the correct elements to be monitored via the "other" element of monitor_params.

library("JointAI")

# model specification without actually running it (n.adapt = 0)
mod0 <- survreg_imp(Surv(futime, status != "alive") ~ age + sex +
                       copper + trig, models = c(copper = "lognorm", trig = "lognorm"),
                     data = subset(PBC, day == 0), n.adapt = 0)

# index of the data matrix column holding "copper", and of the rows in which "copper" is missing
col <- which(colnames(mod0$data_list$M_lvlone) == "copper")
rows <- which(is.na(mod0$data_list$M_lvlone[, "copper"]))

# node to be monitored
imp_copper <- paste0("M_lvlone[", rows, ",", col, "]")

# run the model
mod6a <- update(mod0, n.adapt = 100, n.iter = 250,
                monitor_params = list(imps = TRUE, other = imp_copper))

# extract the imputed values
impDF <- get_MIdat(mod6a, m = 10, seed = 2019)
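
The node names above are built for "copper" only. As a minimal sketch, the same construction can be wrapped in a small helper that handles several variables at once. The helper `imp_nodes` and the toy matrix `M` are hypothetical and only for illustration (they are not part of JointAI); with a real model you would pass `mod0$data_list$M_lvlone` instead of `M`. Whether "trig" also needs to be listed explicitly is not stated in this thread, so treat that as an assumption.

```r
# Toy stand-in for mod0$data_list$M_lvlone (assumption, for illustration only)
M <- cbind(age    = c(50, 61, 47),
           copper = c(NA, 80, NA),
           trig   = c(120, NA, 95))

# Hypothetical helper: build "M_lvlone[row,col]" node names for every
# missing cell of the requested variables
imp_nodes <- function(M, vars, name = "M_lvlone") {
  unlist(lapply(vars, function(v) {
    col  <- which(colnames(M) == v)   # column index of the variable
    rows <- which(is.na(M[, v]))      # rows in which the variable is missing
    if (length(rows) == 0) return(character(0))
    paste0(name, "[", rows, ",", col, "]")
  }))
}

nodes <- imp_nodes(M, c("copper", "trig"))
print(nodes)
```

The resulting character vector could then be passed on in the same way as `imp_copper`, i.e. `monitor_params = list(imps = TRUE, other = nodes)`.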

bbb801 commented 2 years ago

Thank you, Dr NErler! I am not sure I understand this solution, and I would like to apply it to my own data, which has over 300,000 rows and 80 columns. Is it possible to use survreg_imp/coxph_imp for multi-class outcomes (as in a competing risks model) or for multi-label classification? Is it also possible to train a survreg_imp/coxph_imp model and then use it to impute the missing values in another dataset (such as a test set)? Thank you!

NErler commented 2 years ago

It is not yet possible to fit competing risks models in JointAI.

There is also no out-of-the-box solution for imputing values in another dataset using the parameter estimates from the first dataset. Missing values in covariates are imputed from their full-conditional posterior distributions, which JAGS derives from the joint distribution that JointAI specifies. Imputing values in a new dataset based on the parameters from the original data would require you to do this sampling yourself in R, for instance with a Metropolis-Hastings sampler, because the distributions to sample from will usually not have a closed form.
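
To give a rough idea of what such a sampler could look like, here is a minimal random-walk Metropolis-Hastings sketch in base R for a single new subject with a missing covariate. Everything in it is an assumption for illustration: the simplified model (a Weibull AFT model with one lognormal covariate), the parameter names in `par`, and their values, which are made up and not taken from a fitted JointAI model. It also conditions on one fixed set of parameter values; a proper imputation would repeat this for many posterior draws of the parameters.

```r
# Log of the unnormalised full conditional of z = log(x) for one new subject,
# given fixed parameter values (hypothetical names, not JointAI output)
log_target <- function(z, time, event, par) {
  scale <- exp(par$b0 + par$b1 * exp(z))  # Weibull AFT: covariate acts on the scale
  loglik <- if (event) {
    dweibull(time, shape = par$shape, scale = scale, log = TRUE)
  } else {
    # censored observation: log survival probability
    pweibull(time, shape = par$shape, scale = scale,
             lower.tail = FALSE, log.p = TRUE)
  }
  # x lognormal(mu, sigma) is equivalent to z = log(x) ~ Normal(mu, sigma)
  loglik + dnorm(z, mean = par$mu, sd = par$sigma, log = TRUE)
}

# Random-walk Metropolis-Hastings on the log scale of the covariate
mh_impute <- function(time, event, par, n_iter = 2000, step = 0.5) {
  z  <- par$mu                       # start at the mean of the covariate model
  lp <- log_target(z, time, event, par)
  draws <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    z_new  <- z + rnorm(1, sd = step)            # symmetric proposal
    lp_new <- log_target(z_new, time, event, par)
    if (is.finite(lp_new) && log(runif(1)) < lp_new - lp) {
      z  <- z_new                                # accept the proposal
      lp <- lp_new
    }
    draws[i] <- exp(z)                           # back-transform to x
  }
  draws
}

set.seed(2019)
# Made-up parameter values, for illustration only
par <- list(b0 = 7, b1 = 0.01, shape = 1.5, mu = 4, sigma = 0.5)
draws   <- mh_impute(time = 1500, event = TRUE, par = par)
imputed <- draws[seq(1001, 2000, by = 100)]  # 10 thinned draws after burn-in
```

Working on the log scale keeps the proposals positive and makes the lognormal covariate model a plain normal density, so no Jacobian term is needed. The step size, burn-in and thinning used here are arbitrary and would need tuning for real data.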