amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

MICE taking hours to run #136

Closed lbros02 closed 6 years ago

lbros02 commented 6 years ago

Hello,

I previously used the MICE function for multiple imputation, and it ran in about 10-20 minutes for m=5. Since the update, I have been running the same function with the same parameters on roughly the same number of variables, and it is taking hours or even days to run. I was wondering whether this is something that could be fixed. Much appreciated.

Thank you.

gerkovink commented 6 years ago

Can you tell me a bit about the methods you are using and the size of your data? Also, which version of mice are you running?
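A minimal sketch of how that information could be collected in R. The data frame `dat` is a hypothetical stand-in (here the built-in `airquality` data) and should be replaced by the data actually being imputed; only base R functions are used, and the mice version is reported only if the package is installed.

```r
# Hypothetical stand-in for the data being imputed; replace with your own data frame.
dat <- airquality

# Size of the data: number of rows and columns.
dims <- dim(dat)

# Type of each column (numeric, integer, factor, ...).
col_types <- vapply(dat, function(x) class(x)[1], character(1))

# Number of missing values per column, and the overall proportion missing.
n_missing <- colSums(is.na(dat))
prop_missing <- mean(is.na(dat))

# Installed version of mice, or NA if the package is not installed.
mice_version <- if (requireNamespace("mice", quietly = TRUE)) {
  as.character(utils::packageVersion("mice"))
} else {
  NA_character_
}

print(dims)
print(col_types)
print(n_missing)
print(prop_missing)
print(mice_version)
```

Posting this output along with the `method` argument passed to mice() should cover the questions above.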

All the best,

Gerko

gerkovink commented 6 years ago

Dear lbros02,

I'd be more than happy to assist, but to do so I would need a bit of information. We have not experienced any increase in runtime over the imputations for the data sets we work with. The issue is therefore more likely related to your data or imputation model than to the code in mice().

Please note that the number of variables is not a good measure for comparing the complexity of two missing data problems. The number of rows, the amount of missingness and the composition of the columns (i.e. factor, numeric, etc.) are far stronger determinants of the run time.

For example, given the same dimensions, a missing data problem with many factors will run much slower than a missing data problem with only numeric information. Likewise, a few missing values will be imputed more quickly than a set of largely incomplete columns.
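As a rough illustration of the second point, the sketch below times mice() on two simulated data sets with identical dimensions but different amounts of missingness. The data, the helper `punch_holes()` and the chosen proportions are all hypothetical; actual timings depend on the machine, the mice version and the imputation methods, so no particular result is claimed here.

```r
library(mice)

set.seed(1)
n <- 500

# Hypothetical fully observed numeric data.
full <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# Hypothetical helper: set a given proportion of each column to NA.
punch_holes <- function(df, prop) {
  for (j in seq_along(df)) {
    idx <- sample(nrow(df), size = round(prop * nrow(df)))
    df[idx, j] <- NA
  }
  df
}

few  <- punch_holes(full, 0.05)  # about 5% missing per column
many <- punch_holes(full, 0.50)  # about 50% missing per column

# Same dimensions, same settings, different amounts of missingness.
t_few  <- system.time(imp_few  <- mice(few,  m = 5, printFlag = FALSE, seed = 1))
t_many <- system.time(imp_many <- mice(many, m = 5, printFlag = FALSE, seed = 1))

print(t_few["elapsed"])
print(t_many["elapsed"])
```

The same pattern, with some columns converted to factors, could be used to compare numeric and categorical data of equal size.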

All the best,

Gerko

stefvanbuuren commented 6 years ago

Perhaps you could take a look at imp$loggedEvents to see whether mice is spending time removing multicollinear variables. That's often the reason for slowness.
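A minimal sketch of that check, assuming `imp` is the object returned by mice(). The example data are hypothetical: the `nhanes` data shipped with mice, with an exact copy of one column added in the hope of provoking a collinearity event. Whether an event is actually logged may depend on the mice version, so the code handles both cases.

```r
library(mice)

# Hypothetical example data: nhanes plus an exact duplicate of bmi.
dat <- nhanes
dat$bmi_copy <- dat$bmi

# mice may warn about the number of logged events; that is expected here.
imp <- mice(dat, m = 5, printFlag = FALSE, seed = 123)

# loggedEvents is NULL when nothing was logged, otherwise a data frame.
ev <- imp$loggedEvents
if (is.null(ev)) {
  message("No logged events.")
} else {
  print(nrow(ev))  # number of logged events
  print(head(ev))  # first few events
}
```

A long list of logged events in a real analysis suggests that many variables are being dropped as predictors, so it may be worth removing or recoding those variables before calling mice().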