brshallo closed 2 years ago
Thanks for your question.
I looked into this possibility quite a while ago and concluded back then that `mice()` would not be an easy fit for the tidymodels framework. If I understand correctly, the `recipes::step_*()` functions are data transformations that present the data in a form convenient for the analysis. In principle, `mice()` also does this (by filling in the missing values), but in order to get statistically correct estimates it produces multiply-imputed data sets. As far as I know, the recipes package has no support for modelling multiply-imputed data sets or for pooling the resulting estimates. I expect that the functions in tidymodels would break down or produce unexpected results.
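For reference, here is a minimal sketch of the impute, analyse, pool sequence that a recipes step would somehow have to accommodate. It assumes the `nhanes` example data bundled with mice; the model formula, the number of imputations and the seed are arbitrary choices made only for illustration.

```r
# Minimal sketch of the multiple-imputation workflow in mice.
# Assumptions: nhanes example data, an illustrative lm() formula, m = 5.
library(mice)

# nhanes ships with mice and has missing values in bmi, hyp and chl.
# This creates m = 5 completed versions of the data (a "mids" object).
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Fit the same analysis model on each of the 5 completed data sets.
fit <- with(imp, lm(chl ~ age + bmi))

# Combine the 5 sets of estimates into one set of pooled estimates.
pooled <- pool(fit)
summary(pooled)
```

The point of the sketch is that the analysis model is fitted m times and the results are combined afterwards, whereas a recipe step is expected to return a single transformed data set.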
Of course, none of this would be a problem if we did single imputation. However, I do not want to advertise this possibility, since single imputation does not properly account for the uncertainty caused by the missing data, resulting in invalid p-values, confidence intervals that are too short, overly optimistic cross-validation, and related problems.
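The reason can be made explicit with Rubin's rules, the standard pooling rules for multiply-imputed data. The notation here is an illustrative choice: $\hat{Q}_l$ is the estimate from the $l$-th of $m$ completed data sets and $U_l$ is its estimated variance.

$$
\bar{Q} = \frac{1}{m}\sum_{l=1}^{m} \hat{Q}_l, \qquad
\bar{U} = \frac{1}{m}\sum_{l=1}^{m} U_l, \qquad
B = \frac{1}{m-1}\sum_{l=1}^{m} \left(\hat{Q}_l - \bar{Q}\right)^2, \qquad
T = \bar{U} + \left(1 + \frac{1}{m}\right) B
$$

With a single imputation there is only one $\hat{Q}$ and one $U$, so the between-imputation variance $B$ cannot be estimated and is effectively treated as zero. The reported variance is then $U$ rather than $T$, which understates the uncertainty.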
Curious if there are any examples of using {mice} in a tidymodels workflow? Or any plans to create a `recipes::step_*()` that uses mice in an imputation step?