NErler / JointAI

Joint Analysis and Imputation of generalized linear models and linear mixed models with missing values
https://nerler.github.io/JointAI
28 stars 4 forks source link

Impute specific values? #7

Closed harryparr closed 2 years ago

harryparr commented 2 years ago

Dear Dr. Erler,

Many thanks for this terrific package.

I wonder if it's possible to impute specific values? For example, a coxph_imp class that has some missing independent covariate ordered categories clmm with 4 ordered levels but I'd only want the second & third levels to be imputed, not to impute the reference or the fourth level. Can this be done in the get_MIdat or is it within the _imp function?

Many thanks, Harry

NErler commented 2 years ago

Dear Harry,

thank you for your comment and nice feedback.

By default, JointAI works under the assumption of MAR, i.e., that the missing values come from the same distribution as the observed values, conditional on the other data. The case where the missing observations can only take a subset of the categories in the observed data would likely be MNAR, which requires some additional specifications.

In the case you describe, you could try a pattern mixture model approach by editing the JAGS model JointAI created. This is technically possible with JointAI, but you will introduce additional assumptions by fixing the parameter values.

The easiest way would probably be to include a missingness indicator for the incomplete ordinal variable as an auxiliary variable (argument auxvars) and to specify that this indicator has a non-proportional effect on the ordinal covariate (argument nonprop).

Here is a simplified example (you need to work with the GitHub version of JointAI - I just discovered and fixed a bug in the nonprop argument):

library("JointAI")

# create some extra missing values for this example
wideDF$O2[sample(1:nrow(wideDF), size = 30)] <- NA

# create the missing data indicator
wideDF$O2NA <- is.na(wideDF$O2)

# set-up run without iterations to get the JAGS model
mod0 <- lm_imp(y ~ C1 + B1 + O2, data = wideDF,
               auxvars = ~O2NA, nonprop = list(O2 = ~ O2NA),
               modelname = "mymodel.R", modeldir = getwd(),
               keep_model = TRUE,
               n.adapt = 0)

The JAGS model has then been saved as "mymodel.R" which you can edit. Rows 29 - 31 in this example are

    eta_O2_1[i] <- M_lvlone[i, 9] * alpha[3]
    eta_O2_2[i] <- M_lvlone[i, 9] * alpha[4]
    eta_O2_3[i] <- M_lvlone[i, 9] * alpha[5]

This can be changed to

    eta_O2_1[i] <- M_lvlone[i, 9] * 10
    eta_O2_2[i] <- M_lvlone[i, 9] * 0
    eta_O2_3[i] <- M_lvlone[i, 9] * (-10)

The values 10 and -10 are arbitrary and just chosen to be extreme enough to make categories 1 and 4 sufficiently unlikely. When you then save the file as "mymodel_edited.R" you can then run the model as

mod <- lm_imp(y ~ C1 + B1 + O2, data = wideDF,
              auxvars = ~O2NA, nonprop = list(O2 = ~ O2NA),
              monitor_params = c(imps = TRUE, other_models = TRUE),
              modelname = "mymodel_edited.R", modeldir = getwd(),
              overwrite = FALSE,
              n.iter = 500)

All imputed values are now in categories 2 and 3:

imp <- get_MIdat(mod, m = 30, include = FALSE)
table(category = imp$O2, missing = imp$O2NA)
        missing
category FALSE TRUE
       1   480    0
       2   360  394
       3   630  566
       4   570    0

I hope this helps. Keep in mind that this is just a technical solution. I haven't thought through the theoretical implications of this.

Best, Nicole