alexanderrobitzsch / miceadds

Some Additional Multiple Imputation Functions, Especially for 'mice'.
https://alexanderrobitzsch.github.io/miceadds/

`mice.impute.ml.lmer` on large three-level dataset: `"binary"` logistic model returns error, 'hangs' when adding random slopes or interactions #25

Open pehkawn opened 1 year ago

pehkawn commented 1 year ago

I am currently trying to impute a three-level dataset with 87 columns and 71,756 rows. The variables comprise 4 identifier columns, 15 continuous outcome variables without missing entries, and 68 predictors and covariates with missing entries.

I've been following Simon Grund's example for modeling three-level data with mice using the `mice.impute.ml.lmer` function. Naturally, I had to make some adaptations to the example model to fit my data:

  1. I set `model` to `"binary"` to run a logistic mixed-effects model for the dichotomous variables (`"pmm"` for the ordinal variables, `"continuous"` for the continuous ones).
  2. I added random slopes and interaction effects.
  3. I used `mice.impute.2lonly.pmm` instead of `mice.impute.2lonly.norm` for the top-level imputation.
  4. I added a post-processing step to a level-2 variable to enforce upper and lower boundaries. (A minimal sketch of the full setup follows this list.)
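
For reference, here is a minimal sketch of the setup I am describing. All column names (`y_bin`, `w`, `x1`, and the cluster indicators `id2`/`id3`) are placeholders for the real variables, and the argument usage follows my reading of the `mice` and `miceadds` documentation:

```r
library(mice)
library(miceadds)

# Imputation methods: three-level model for a dichotomous level-1
# variable, PMM for a variable measured at the top level
meth <- make.method(dat)
meth["y_bin"] <- "ml.lmer"
meth["w"]     <- "2lonly.pmm"

# Predictor matrix: cluster indicators are not ordinary predictors;
# 2lonly.pmm needs its cluster variable flagged with -2
pred <- make.predictorMatrix(dat)
pred[, c("id2", "id3")] <- 0
pred["w", "id3"] <- -2

# Variable-specific arguments for ml.lmer, passed via mice()'s blots
blots <- list(
  y_bin = list(
    levels_id     = c("id2", "id3"),     # cluster variables, lowest level first
    model         = "binary",            # logistic mixed-effects model
    random_slopes = list(id2 = c("x1"))  # random slope of x1 over id2 clusters
  )
)

# Post-processing: squeeze imputed values of the level-2 variable
# into fixed boundaries (0 and 10 are placeholders)
post <- make.post(dat)
post["w"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(0, 10))"

imp <- mice(dat, method = meth, predictorMatrix = pred,
            blots = blots, post = post, m = 5, maxit = 5, seed = 123)
```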

However, when running mice with some variables modeled as `"binary"` (without random slopes or interactions), I get the following warning:

```
Warning message in commonArgs(par, fn, control, environment()):
"maxfun < 10 * length(par)^2 is not recommended."
```
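
As far as I can tell, this warning comes from lme4 rather than from mice itself: with many fixed-effect parameters, the optimizer's default evaluation budget (`maxfun`) falls below the recommended `10 * length(par)^2`. In a direct lme4 call the budget can be raised as below; whether `mice.impute.ml.lmer` forwards such control settings to the underlying `(g)lmer` call is not clear to me from the documentation:

```r
library(lme4)

# Illustrative direct call: raise the optimizer's evaluation budget
# ('y', 'x1', 'id2', and 'dat' are placeholders)
ctrl <- glmerControl(optCtrl = list(maxfun = 1e5))
fit  <- glmer(y ~ x1 + (1 | id2), data = dat,
              family = binomial, control = ctrl)
```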

Execution of mice hangs at this point.

I ran a test with mice (1 iteration), this time with all dichotomous variables set to `"pmm"`, and the function completed the run. However, after adding variables to `random_slopes`, it seemingly gets stuck (running indefinitely) on the imputation of the first three variables. My assumption is that this is due to the relatively large dataset, which makes the process computationally very demanding.
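
For these tests I keep the turnaround short with a minimal run (arguments as in the sketch above):

```r
# Smoke test: one imputation, one iteration
imp_test <- mice(dat, method = meth, predictorMatrix = pred,
                 blots = blots, post = post,
                 m = 1, maxit = 1, seed = 1)
```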

I am wondering what exactly causes this warning and whether there are ways to avoid it. I would also like to know whether the computational efficiency of such a large imputation model can be improved.

I am not very familiar with mice, but I have some thoughts on how the data should be imputed: I plan to use the imputed data for a structural equation model I have built, in which all variables are grouped as indicators of latent constructs. It therefore seems natural to impute the indicator variables that belong to the same construct together.
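
If I read the mice documentation correctly, its `blocks` argument can group variables so they are imputed as one unit, with a joint method such as `"panImpute"` (from the mitml integration) assigned per block. A sketch with hypothetical indicator names `a1`-`a3`; note that `panImpute` fits a two-level joint model, so whether it accommodates a third level is something I would still need to check:

```r
library(mice)

# Group the indicators of one latent construct into a single block
# (the remaining variables would get their own blocks in a full setup)
blk <- make.blocks(list(constructA = c("a1", "a2", "a3")))

# Assign a joint multilevel method to that block; the cluster
# indicator still has to be coded -2 in the predictor matrix
meth <- make.method(dat, blocks = blk)
meth["constructA"] <- "panImpute"

imp <- mice(dat, blocks = blk, method = meth, m = 5, maxit = 5)
```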

alexanderrobitzsch commented 1 year ago

I think that computational issues are particularly likely to occur in models with random slopes. I can only advise using simpler imputation models.
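
For example, one could drop the random slopes and keep random intercepts only; the `blme_use` and `blme_args` arguments of `mice.impute.ml.lmer` can additionally stabilize the variance component estimates (the `blots` names below refer to the sketch above and are illustrative):

```r
# Simplify: random intercepts only
blots$y_bin$random_slopes <- NULL

# Optional: avoid boundary estimates of variance components via blme
blots$y_bin$blme_use  <- TRUE
blots$y_bin$blme_args <- list(cov.prior = "invwishart")
```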