FarrellDay / miceRanger

miceRanger: Fast Imputation with Random Forests in R

valueSelector = value does not inject noise, making imputation inappropriate #21

Open 137alpha opened 1 year ago

137alpha commented 1 year ago

The valueSelector = "value" option uses the model prediction from ranger to impute the points:

https://github.com/FarrellDay/miceRanger/blob/4b87a65189ff6ef6f3d88705d989feefe180d103/R/imputeFromPred.R#L19-L21

This is easy to do but inappropriate, because it means that the imputed values will be noiseless rather than reflecting the observational error of the model.

For regression, the correct thing to do would be to add random noise to the predictions with mean zero and a standard deviation equal to the standard deviation of the OOB residuals.
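A minimal sketch of what this could look like, for illustration only. The helper name add_oob_noise is hypothetical and not part of miceRanger, and the noise is assumed to be Gaussian with constant variance. The ranger part only runs if the package is installed; it relies on the fact that, for a regression forest, the predictions element of the fitted object holds the out-of-bag predictions for the training rows.

```r
# Hypothetical helper (not part of miceRanger): add zero-mean Gaussian noise
# to point predictions, with the standard deviation estimated from the
# out-of-bag (OOB) residuals of the fitted model.
add_oob_noise <- function(pred, oob_resid) {
  sigma <- sd(oob_resid, na.rm = TRUE)  # na.rm guards rows that were never OOB
  pred + rnorm(length(pred), mean = 0, sd = sigma)
}

# Illustration with ranger on iris, treating 30 Sepal.Length values as missing.
if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(1)
  dat  <- iris
  miss <- sample(nrow(dat), 30)  # rows assumed to be missing for this example
  obs  <- dat[-miss, ]

  fit <- ranger::ranger(Sepal.Length ~ ., data = obs)

  # For regression forests, fit$predictions are the OOB predictions.
  oob_resid <- obs$Sepal.Length - fit$predictions

  # Noiseless predictions, comparable to valueSelector = "value".
  pred <- predict(fit, data = dat[miss, ])$predictions

  # Predictions with residual-scale noise added.
  imputed <- add_oob_noise(pred, oob_resid)
}
```

Using a single residual standard deviation ignores heteroscedasticity; sampling residuals directly from oob_resid would be an alternative that avoids the normality assumption.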

AnotherSamWilson commented 1 year ago

The imputed values will still contain noise, since random forests inherently have random aspects to them in the training process. I do like the idea of adding noise based on the OOB residuals though - it would certainly be more appropriate for the typical use cases for MICE. I would like to keep both options, since imputing with the value is useful in some cases.