amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Incorrect number of dimensions error when using random forest imputation (meth='rf') #447

Closed: sylvaticus closed this issue 2 years ago

sylvaticus commented 2 years ago

When I try random forest imputation, I get the error "Error in nodes_mis[, i] : incorrect number of dimensions". This doesn't happen if I use other imputation methods or if I slightly change the input data matrix. Also, which package does mice use for the actual random forest prediction? The documentation says randomForest, but randomForest is not installed (although I remember it asked me to install a package the first time I tried meth='rf').

> library(mice)
> data <- matrix(c(1.0, 10.5, 1.5, 13.2, 1.8, 8.0, 1.7, 15.0, 23.0, 40.0, 2.0, 21.0, 3.3, 38.0, 4.5, -2.3, NA, -2.4),nrow=9,ncol=2, byrow=TRUE)
> impObject <- mice(as.data.frame(data),m=1,meth='rf',printFlag=TRUE, seed=500)

 iter imp variable
  1   1  V1
Error in nodes_mis[, i] : incorrect number of dimensions

> impObject <- mice(as.data.frame(data),m=1,meth='pmm',printFlag=TRUE, seed=500) # or any method other than "rf"

 iter imp variable
  1   1  V1
  2   1  V1
  3   1  V1
  4   1  V1
  5   1  V1

> data <- matrix(c(1.0, 10.5, 1.5, 13.2, 1.8, 8.0, 1.7, 15.0, NA, 40.0, 2.0, 21.0, 3.3, 38.0, 4.5, -2.3, NA, -2.4),nrow=9,ncol=2, byrow=TRUE)
> impObject <- mice(as.data.frame(data),m=1,meth='rf',printFlag=TRUE, seed=500)

 iter imp variable
  1   1  V1
  2   1  V1
  3   1  V1
  4   1  V1
  5   1  V1

> packageVersion('mice')
[1] ‘3.14.0’
> packageVersion('randomForest')
Error in packageVersion("randomForest") : 
  there is no package called ‘randomForest’

hanneoberman commented 2 years ago

Hi @sylvaticus, what happens when you re-run your code after installing randomForest and/or ranger? Depending on your version of mice, the random forest engine comes from one of these two packages.

sylvaticus commented 2 years ago

I installed randomForest (packageVersion('randomForest') returns ‘4.6.14’), but the problem persists after restarting R. I then removed ranger, and when I try again I get:

 iter imp variable
  1   1  V1
Package ranger needed. Install from CRAN? (Yes/no/cancel) 

It then installs https://cloud.r-project.org/src/contrib/ranger_0.13.1.tar.gz and compiles it (I'm on Linux), but the problem is still there:

** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (ranger)

The downloaded source packages are in
    ‘/tmp/Rtmp4A4q3H/downloaded_packages’
Error in nodes_mis[, i] : incorrect number of dimensions
> 
sylvaticus commented 2 years ago

Is there a testPackage("ranger") sort of function in R? I tried the example on the ranger package homepage and it works fine.

hanneoberman commented 2 years ago

I wasn't able to reproduce the error with mice version 3.13, but after updating to 3.14 I got it too! @stefvanbuuren do you know what might be the problem? Could it have to do with mice:::install.on.demand()?

stefvanbuuren commented 2 years ago

Version 3.14.0 changes the default package for method rf from "randomForest" to "ranger" (#431). It seems that there is an integration issue with "ranger" that we had not discovered earlier.

My reprex yields:

library(mice, warn.conflicts = FALSE)
data <- matrix(c(1.0, 10.5, 1.5, 13.2, 1.8, 8.0, 1.7, 15.0, 23.0, 40.0,
                 2.0, 21.0, 3.3, 38.0, 4.5, -2.3, NA, -2.4),
               nrow = 9, ncol = 2, byrow = TRUE)
df <- data.frame(data)

# In 3.14, ranger is the default
mice.impute.rf(y = df$X1, ry = !is.na(df$X1), x = df[, "X2", drop = FALSE],
               rfPackage = "ranger")
#> Error in nodes_mis[, i]: incorrect number of dimensions

# The "old" randomForest still works
mice.impute.rf(y = df$X1, ry = !is.na(df$X1), x = df[, "X2", drop = FALSE],
               rfPackage = "randomForest")
#> [1] 1.5

Created on 2021-11-29 by the reprex package (v2.0.1)

As a temporary workaround, pass the rfPackage argument to mice(), as in mice(..., rfPackage = "randomForest").
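Applied to the original example, the workaround might look like the sketch below. It assumes mice 3.14.0 with randomForest installed, and relies on extra arguments given to mice() being passed on to the imputation method, which is how rfPackage reaches mice.impute.rf(). The object names imp and completed are arbitrary.

```r
library(mice, warn.conflicts = FALSE)

# Same data as in the report: one missing value in the first column
data <- matrix(c(1.0, 10.5, 1.5, 13.2, 1.8, 8.0, 1.7, 15.0, 23.0, 40.0,
                 2.0, 21.0, 3.3, 38.0, 4.5, -2.3, NA, -2.4),
               nrow = 9, ncol = 2, byrow = TRUE)
df <- data.frame(data)

# rfPackage is not a mice() argument itself; it is forwarded via ...
# to mice.impute.rf(), selecting the randomForest engine instead of ranger
imp <- mice(df, m = 1, method = "rf", rfPackage = "randomForest",
            seed = 500, printFlag = FALSE)

# First (and only) completed data set
completed <- complete(imp)
print(completed)
```

Once a mice version containing the fix is installed, the rfPackage argument can simply be dropped again.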

@prockenschaub Could you have a look at what might cause the problem, and perhaps add a test file?

prockenschaub commented 2 years ago

The problem arises when there is only a single missing value. In my original code, I didn't account for R's automatic conversion of a single selected matrix row to a vector. I have submitted pull request #448, which fixes this behaviour.
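A minimal base-R illustration of the behaviour described above (not the actual mice code): selecting a single row of a matrix drops it to a plain vector, and column indexing on that vector then fails with the same message as in the report. The matrix and the names nodes, mis and nodes_mis are hypothetical, chosen only to mirror the variable in the error message.

```r
# Hypothetical stand-in for a (rows x trees) matrix of terminal-node ids;
# the names only mirror the variable in the error message.
nodes <- matrix(c(3L, 5L, 2L,
                  4L, 1L, 6L,
                  7L, 8L, 9L), nrow = 3, byrow = TRUE)

# One missing value: only row 2 is selected
mis <- c(FALSE, TRUE, FALSE)

# Default subsetting drops the singleton row dimension, giving a plain vector
nodes_mis <- nodes[mis, ]
print(is.matrix(nodes_mis))      # FALSE

# Column indexing on that vector reproduces the reported error message
res <- tryCatch(nodes_mis[, 1], error = function(e) conditionMessage(e))
print(res)                       # "incorrect number of dimensions"

# drop = FALSE keeps the 1 x 3 matrix shape, so column indexing works
nodes_mis_ok <- nodes[mis, , drop = FALSE]
print(dim(nodes_mis_ok))         # 1 3
print(nodes_mis_ok[, 1])         # 4

# With two missing values the result stays a matrix even without drop = FALSE
mis2 <- c(FALSE, TRUE, TRUE)
print(is.matrix(nodes[mis2, ]))  # TRUE
```

The last case would be consistent with the earlier observation in this thread that adding a second NA to the data made the error disappear.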

stefvanbuuren commented 2 years ago

@prockenschaub Thanks a lot. Yes, I know this glitch too well... :-)

stefvanbuuren commented 2 years ago

mice 3.14.2 solves the problem.

@sylvaticus Thanks for reporting. @hanneoberman @prockenschaub Thanks for solving.