137alpha opened this issue 1 year ago
The imputed values will still contain some noise, since random forests are inherently random during training (bootstrap sampling and random feature selection at each split). I do like the idea of adding noise based on the OOB residuals, though; it would certainly be more appropriate for typical MICE use cases. I would like to keep both options, since imputing with the raw prediction is useful in some cases.
The `valueSelector = "value"` option uses the model prediction from ranger to impute missing values:
https://github.com/FarrellDay/miceRanger/blob/4b87a65189ff6ef6f3d88705d989feefe180d103/R/imputeFromPred.R#L19-L21
This is easy to do, but inappropriate: it means that the imputed values are noiseless rather than reflecting the model's observational error.
For regression, the correct approach would be to add random noise to the predictions, drawn with mean zero and a standard deviation equal to the standard deviation of the OOB (out-of-bag) residuals.
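A minimal, language-agnostic sketch of the proposal, written in Python rather than miceRanger's R code. For each missing entry $i$ it imputes $\tilde{y}_i = \hat{y}_i + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, s^2_{\text{OOB}})$, where $s_{\text{OOB}}$ is the sample standard deviation of the OOB residuals. The function name `impute_from_predictions` and the `"noise"` selector are hypothetical, not part of miceRanger's API. The OOB residuals are assumed to be the observed values minus the forest's out-of-bag predictions on the non-missing rows.

```python
import random
import statistics


def impute_from_predictions(predictions, oob_residuals, value_selector="value", seed=None):
    """Sketch of two imputation modes for a numeric variable (hypothetical API).

    "value": use the random forest predictions as-is (noiseless imputation).
    "noise": add N(0, s^2) noise to each prediction, where s is the sample SD
             of the OOB residuals, so imputations reflect the model's error.
    """
    if value_selector == "value":
        return list(predictions)
    if value_selector == "noise":
        if len(oob_residuals) < 2:
            raise ValueError("need at least two OOB residuals to estimate the SD")
        sd = statistics.stdev(oob_residuals)  # sample SD of the OOB residuals
        rng = random.Random(seed)             # seedable RNG for reproducible imputations
        return [p + rng.gauss(0.0, sd) for p in predictions]
    raise ValueError(f"unknown value_selector: {value_selector!r}")


# Usage: two missing entries with predictions 10.0 and 12.5.
print(impute_from_predictions([10.0, 12.5], [0.4, -0.3, 0.1, -0.2], "value"))
print(impute_from_predictions([10.0, 12.5], [0.4, -0.3, 0.1, -0.2], "noise", seed=42))
```

Keeping the noiseless `"value"` mode alongside the noisy one matches the wish to retain both options; only the noisy mode propagates the model's uncertainty across the multiple imputed datasets.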