amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Could PMM be used for binary or ordered categorical variables? #659

Closed: shenlan17 closed this issue 3 weeks ago

shenlan17 commented 1 month ago

Hi,

I’m looking into the Predictive Mean Matching (PMM) algorithm (https://bookdown.org/mwheymans/bookmi/multiple-imputation.html#predictive-mean-matching-or-regression-imputation) and noticed that its steps include:

1. Estimating a linear regression model
2. Determining the Bayesian version of the regression coefficients
3. Predicting missing values
4. Finding the closest donor

It seems that PMM relies on linear regression, which is typically used for continuous variables. However, the PMM manual suggests that it can be applied to various types of variables. I am confused about whether PMM is suitable for binary or ordered categorical variables.
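To make steps 1 and 2 concrete, here is a minimal sketch in Python with NumPy of drawing "Bayesian" regression coefficients under a flat prior. It is not mice's code (mice is an R package). `draw_coefficients` is a hypothetical helper, the data are simulated, and the sketch omits refinements such as a ridge penalty. Nothing in this step requires the outcome to be continuous: the ordered categories 1, 2, 3 are simply treated as numbers to build a predictive score.

```python
import numpy as np

def draw_coefficients(X_obs, y_obs, rng):
    """Hypothetical helper: least-squares fit plus one posterior draw of the
    coefficients under a flat prior. Assumes X_obs has full column rank."""
    n, p = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs          # step 1: least-squares estimate
    resid = y_obs - X_obs @ beta_hat
    # step 2a: draw the residual scale, sigma*^2 = RSS / g with g ~ chi-square(n - p)
    sigma_star = np.sqrt(resid @ resid / rng.chisquare(n - p))
    # step 2b: draw beta* ~ N(beta_hat, sigma*^2 (X'X)^-1) via a Cholesky factor
    chol = np.linalg.cholesky(xtx_inv)
    beta_star = beta_hat + sigma_star * (chol @ rng.standard_normal(p))
    return beta_hat, beta_star

# Simulated ordered categorical outcome coded 1, 2, 3 (hypothetical data)
rng = np.random.default_rng(2024)
n = 100
x = rng.normal(size=n)
y = np.digitize(x + rng.normal(size=n), [-0.5, 0.5]) + 1
X = np.column_stack([np.ones(n), x])              # intercept + predictor

beta_hat, beta_star = draw_coefficients(X, y.astype(float), rng)
print(beta_hat, beta_star)
```

The drawn `beta_star` differs from `beta_hat` on every run, which is what injects between-imputation variability into the subsequent matching.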

Thank you!

stefvanbuuren commented 1 month ago

We do not predict the missing values (your step 3). Instead, we select the five nearest neighbors in the linear predictive metric, draw one of these neighbors, and take its observed value as the imputed value. As long as the predictive metric roughly follows the shape of the data, the method works quite well and can even beat dedicated methods (see for example https://stefvanbuuren.name/publication/vink-2014/).
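A minimal sketch of the matching step described above, in Python with NumPy on simulated data. `pmm_impute` is a hypothetical helper, not mice's implementation; for brevity it uses the same least-squares coefficients for observed and missing rows, whereas mice uses drawn coefficients for the missing rows. Because every imputed value is copied from an observed donor, a 0/1 variable can only ever receive 0 or 1.

```python
import numpy as np

def pmm_impute(y_obs, yhat_obs, yhat_mis, donors=5, rng=None):
    """Hypothetical helper: for each missing case, find the `donors` observed
    cases with the closest predicted values, pick one at random, and copy
    that donor's observed y."""
    rng = np.random.default_rng() if rng is None else rng
    y_obs = np.asarray(y_obs)
    imputed = np.empty(len(yhat_mis), dtype=y_obs.dtype)
    for i, target in enumerate(yhat_mis):
        dist = np.abs(yhat_obs - target)            # distance in the predictive metric
        nearest = np.argsort(dist)[:donors]         # indices of the closest donors
        imputed[i] = y_obs[rng.choice(nearest)]     # observed value, never a prediction
    return imputed

# Simulated binary outcome with ~30% of values missing at random (hypothetical data)
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = (x + rng.normal(size=n) > 0).astype(int)
miss = rng.random(n) < 0.3
X = np.column_stack([np.ones(n), x])

# Linear regression on the observed cases (a linear probability model for 0/1 y)
beta, *_ = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)
yhat_obs = X[~miss] @ beta
yhat_mis = X[miss] @ beta

imp = pmm_impute(y[~miss], yhat_obs, yhat_mis, donors=5, rng=rng)
print(sorted(set(imp.tolist())))  # only values that occur among the observed y
```

The linear predictions themselves may fall outside [0, 1], but they are only used to rank donors, so the imputations stay valid category codes.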