amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Error in colMeans(as.matrix(imp[[j]]), na.rm = TRUE) : 'x' must be numeric #601

Open stefvanbuuren opened 1 year ago

stefvanbuuren commented 1 year ago

Describe the bug

MICE crashes on an incomplete character variable.

To Reproduce

library(mice)
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
nh3 <- nhanes2
nh3$chl <- as.character(nh3$chl)
mice(nh3)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp
#>   1   2  bmi  hyp
#>   1   3  bmi  hyp
#>   1   4  bmi  hyp
#>   1   5  bmi  hyp
#> Error in colMeans(as.matrix(imp[[j]]), na.rm = TRUE): 'x' must be numeric

Created on 2023-11-20 with reprex v2.0.2

Expected behavior

mice() should not touch or impute character variables.

hanneoberman commented 1 year ago

Cannot reproduce with mice 3.16.8

nh3 <- mice::nhanes2
nh3$chl <- as.character(nh3$chl)
imp <- mice::mice(nh3)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp
#>   1   2  bmi  hyp
#>   1   3  bmi  hyp
#>   1   4  bmi  hyp
#>   1   5  bmi  hyp
#>   2   1  bmi  hyp
#>   2   2  bmi  hyp
#>   2   3  bmi  hyp
#>   2   4  bmi  hyp
#>   2   5  bmi  hyp
#>   3   1  bmi  hyp
#>   3   2  bmi  hyp
#>   3   3  bmi  hyp
#>   3   4  bmi  hyp
#>   3   5  bmi  hyp
#>   4   1  bmi  hyp
#>   4   2  bmi  hyp
#>   4   3  bmi  hyp
#>   4   4  bmi  hyp
#>   4   5  bmi  hyp
#>   5   1  bmi  hyp
#>   5   2  bmi  hyp
#>   5   3  bmi  hyp
#>   5   4  bmi  hyp
#>   5   5  bmi  hyp
#> Warning: Number of logged events: 1
imp$loggedEvents
#>   it im dep     meth out
#> 1  0  0     constant chl

Created on 2023-11-20 with reprex v2.0.2
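Since the two runs above disagree, it may help to post a short environment report next to each reprex. The following is only a sketch of what such a report could contain; `env_report` is a name made up here for illustration, not something mice provides.

```r
# Rebuild the data from the reprex so the report is self-contained
nh3 <- mice::nhanes2
nh3$chl <- as.character(nh3$chl)

# Collect the pieces of the environment most likely to differ between systems.
# `env_report` is a hypothetical helper object, not part of mice.
env_report <- list(
  r_version    = R.version.string,
  mice_version = as.character(utils::packageVersion("mice")),
  # first class of each column, to confirm which ones are character
  col_classes  = vapply(nh3, function(x) class(x)[1], character(1))
)

print(env_report)
```

Comparing `mice_version` and `col_classes` between a crashing and a non-crashing system would at least rule out a version or data-type mismatch.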

stefvanbuuren commented 1 year ago

Ah, thanks. I forgot to mention that my test was run on the support_blocks branch.

I will add a test to that branch to ban this baby from appearing in master.

stefvanbuuren commented 7 months ago

Test added to mice4 branch

stefvanbuuren commented 7 months ago

I got a report that the error may also appear in the CRAN version, mice 3.16.0. Here's an example and work-around.

library(mice)
library(dplyr)
packageVersion('mice') # 3.16.0

nh3 <- mice::nhanes2
# add column with a character variable
rin <- c("123456789", "123456788", "123456778", "123456678", "123455678", 
         "123456799", "123445689", "123445679", "123345689", "122345678",
         "223456789", "223456788", "223456778", "223456678", "223455678", 
         "223456799", "223445689", "223445679", "223345689", "222345678",
         "323456799", "323445689", "323445679", "323345689", "322345678")
nh3_data <- nh3 %>% cbind(rin)

# impute train data
imp <- mice(nh3_data, m = 3, seed = 22112)
# use mice.mids and the mids object imp on test data (I used the same data set, but suppose it is new test data)
imp_test <- mice.mids(imp, newdata = nh3_data, maxit = 1)

# If you're unlucky (BUT WHY??) you'll get: Error in colMeans(as.matrix(imp[[j]]), na.rm = TRUE) : 'x' must be numeric

# ad-hoc solution 
nh3_data <- nh3_data %>% mutate(rin = as.numeric(rin),
                                chl = as.numeric(chl))
# the error seems to be caused by character variables, even complete ones that are not imputed
imp_test <- mice.mids(imp, newdata = nh3_data, maxit = 1) 

When I run this on my system, everything is fine. However, some users report a crash with Error in colMeans(as.matrix(imp[[j]]), na.rm = TRUE) : 'x' must be numeric. It is not yet clear why behaviour is inconsistent across systems.
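Coercing an identifier such as rin to numeric works, but it would fail for IDs that are not purely digits. An alternative, sketched below under the assumption that the character columns are not needed as predictors, is to split them off before imputation and reattach them afterwards. The ID values and the names `is_chr`, `ids`, `to_impute` and `completed` are made up for illustration; this has not been checked against the systems that show the crash.

```r
library(mice)

nh3 <- mice::nhanes2
# Hypothetical character identifier column (values are illustrative only)
nh3$rin <- sprintf("id%03d", seq_len(nrow(nh3)))

# Split off character columns so that mice() never sees them
is_chr    <- vapply(nh3, is.character, logical(1))
ids       <- nh3[, is_chr, drop = FALSE]
to_impute <- nh3[, !is_chr, drop = FALSE]

# Impute only the non-character columns
imp <- mice(to_impute, m = 3, seed = 22112, printFlag = FALSE)

# Reattach the untouched character columns to the first completed data set
completed <- data.frame(mice::complete(imp, 1), ids,
                        stringsAsFactors = FALSE)
```

The same split would have to be applied to any new data passed on later, so that its columns match those of the mids object.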