FarrellDay / miceRanger

miceRanger: Fast Imputation with Random Forests in R

Error when few mean match candidates exist #1

Closed bryorsnef closed 4 years ago

bryorsnef commented 4 years ago

Hi! This looks like a great package, it's incredibly fast compared to mice(method = "rf")!

I'm getting an error when meanMatchCandidates is greater than the number of available (non-missing) cases for at least one column.

library(miceRanger)

x <- matrix(rnorm(1000), nrow = 100, ncol = 10)

x[1:99, 1] <- NA
x[1:10, 2] <- NA

miceRanger(data.frame(x), m = 1, maxiter = 1) ## error
miceRanger(data.frame(x), m = 1, maxiter = 1, meanMatchCandidates = 1) ### no error
miceRanger(data.frame(x), m = 1, maxiter = 1, meanMatchCandidates = 10) ### error
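
To see which columns trigger the problem, it may help to count the observed values per column and compare them with the requested number of candidates. The following is a minimal base-R sketch, not part of miceRanger; the names `candidates`, `observed`, and `too_few` are made up for illustration, and the value 10 simply mirrors the failing call above.

```r
# Rebuild the example data from the report (seed added for reproducibility).
set.seed(1)
x <- matrix(rnorm(1000), nrow = 100, ncol = 10)
x[1:99, 1] <- NA
x[1:10, 2] <- NA
df <- data.frame(x)  # columns are named X1 ... X10

# Hypothetical requested number of mean match candidates.
candidates <- 10L

# Number of non-missing values available in each column.
observed <- colSums(!is.na(df))

# Columns that cannot supply the requested number of candidates.
too_few <- names(observed)[observed < candidates]
print(observed[too_few])
```

With this data only the first column has a single observed value, which is consistent with meanMatchCandidates = 1 being the only setting that runs without error.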

samFarrellDay commented 4 years ago

Ahhh yes, this would cause an error. Right now there is no way to specify meanMatchCandidates by column. This is a good idea; I will implement it in the next iteration (next few days). For now, the best solution depends on your data. If it is reasonably distributed (near-symmetric and unimodal), you can use valueSelector = "value" for that column. If it's not... you can lower meanMatchCandidates, but this will lower the parameter for all columns using valueSelector == "meanMatching".

samFarrellDay commented 4 years ago

This has been implemented. It'll probably be a week or so before it's on CRAN, so you'll have to download from this repository for now.
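
For anyone landing here later, one way to build per-column candidate counts is to cap the desired number at each column's count of observed values. This is only a sketch in base R: `cap_candidates` is a hypothetical helper, not a miceRanger function, and it assumes that the per-column support mentioned above takes the form of a named integer vector passed to meanMatchCandidates. Please check `?miceRanger` in your installed version before relying on that.

```r
# Hypothetical helper: cap the desired number of mean match candidates
# at the number of observed values, for columns that contain missing data.
cap_candidates <- function(data, candidates = 5L) {
  observed <- colSums(!is.na(data))
  incomplete <- observed < nrow(data)  # only these columns get imputed
  obs <- observed[incomplete]
  setNames(as.integer(pmin(obs, candidates)), names(obs))
}

# Same example data as in the original report.
set.seed(1)
x <- matrix(rnorm(1000), nrow = 100, ncol = 10)
x[1:99, 1] <- NA
x[1:10, 2] <- NA
df <- data.frame(x)

per_col <- cap_candidates(df, candidates = 5L)
print(per_col)

# Assumed usage (not run here; requires miceRanger with per-column support):
# miceRanger(df, m = 1, maxiter = 1, meanMatchCandidates = per_col)
```

Capping at the observed count keeps the requested behaviour for well-populated columns while only relaxing it where the data cannot support it.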