amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

mice in tidymodels package recipes? #478

Closed brshallo closed 2 years ago

brshallo commented 2 years ago

Curious if there are any examples of using {mice} in a tidymodels workflow? Or any plans to create a recipes::step_*() that uses mice in an imputation step?

stefvanbuuren commented 2 years ago

Thanks for your question.

I looked into this possibility quite a while ago, and concluded back then that mice() would not be an easy fit for the tidymodels framework. If I understand correctly, recipes::step_*() functions are data transformations that present the data in a way that is convenient for the analysis. In principle, mice() also does this (by filling in the missing values), but in order to get statistically correct estimates it produces multiply-imputed data sets. As far as I know, the recipes package has no support for modelling multiply-imputed data sets or for pooling the resulting estimates. I expect the functions in tidymodels will break down or produce unexpected results.

Of course, none of this would be a problem if we did single imputation. However, I do not want to advertise this possibility, since single imputation does not properly account for the uncertainty caused by the missing data. The result is invalid p-values, confidence intervals that are too short, overly optimistic cross-validation results, and related problems.
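
To make the pooling point concrete, here is a minimal sketch of Rubin's rules, the standard way of combining estimates from multiply-imputed data sets. It is written in plain Python purely for illustration and is not the mice API; in R, mice's own pool() function performs this step along with further quantities such as degrees of freedom. The function name `pool_rubin` and the numbers are hypothetical, assumed only for this example.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: the m point estimates (one per imputed data set)
    variances: the m squared standard errors of those estimates
    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, u_bar, b, t


# Hypothetical results for one regression coefficient from m = 5 imputations.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]

q_bar, u_bar, b, t = pool_rubin(estimates, variances)
print(f"pooled estimate: {q_bar:.3f}")
print(f"within variance: {u_bar:.3f}")
print(f"between variance: {b:.3f}")
print(f"total variance: {t:.3f}")
```

Single imputation yields only one data set, so the between-imputation term cannot be estimated and the reported variance is in effect the within-imputation part alone. That is the source of the too-short confidence intervals mentioned above.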