amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Environment variable `state` breaks mice() #527

Open mculbert opened 1 year ago

mculbert commented 1 year ago

A somewhat obscure error occurs if:

  1. The data frame passed to mice() contains a variable called state that consists of either: (a) a character vector (of potentially different values) or (b) only a single (repeated) value (of any type), AND
  2. There is a variable called state available in the environment (either the global environment or an attached data frame),

then mice() throws the error:

Error in s$it : $ operator is invalid for atomic vectors

Examples:

library(mice) # version 3.15.0
mynhanes <- mice::nhanes
state <- "zen"

mynhanes$state <- rnorm(25)
imp <- mice(mynhanes)  # No error

mynhanes$state <- sample(c("WA", "OR", "CA"), 25, replace = TRUE)
imp <- mice(mynhanes)  # Error

mynhanes$state <- 3.1415
imp <- mice(mynhanes)  # Error

rm(state)
imp <- mice(mynhanes)  # No error (warning about logged events)

attach(mynhanes)
imp <- mice(mynhanes)  # Error

The error comes from https://github.com/amices/mice/blob/3e3e3ca0fa53f1b90fb7142bedf36375d5282e90/R/internal.R#L107. The call to ma_exists("state", ...) on line 100 or 103 apparently picks up the wrong variable through an iterated search of parent environments here: https://github.com/amices/mice/blob/3e3e3ca0fa53f1b90fb7142bedf36375d5282e90/R/internal.R#L140
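To make the mechanism concrete, the base-R sketch below reproduces the same error message without mice. It assumes, as described above, that the logging state is retrieved by name with a lookup that may fall through to enclosing environments. The functions lookup_state() and log_iteration() are hypothetical stand-ins, not mice's ma_exists() or any other mice internal.

```r
# Minimal base-R illustration of the failure mode (not mice's internal code).
# Assumption: the logger retrieves its state by *name*, and the lookup is
# allowed to fall through to enclosing environments.

state <- "zen"  # a user's global variable that happens to be called 'state'

lookup_state <- function(inherits) {
  # get0() returns NULL (instead of erroring) when no binding is found
  get0("state", envir = environment(), inherits = inherits)
}

log_iteration <- function() {
  # No local 'state' exists here, so the search reaches the global 'state'
  s <- get("state", envir = environment(), inherits = TRUE)
  s$it  # 's' is a character vector, not the expected list
}

print(lookup_state(inherits = TRUE))   # the user's "zen"
print(lookup_state(inherits = FALSE))  # NULL: no local binding
res <- tryCatch(log_iteration(), error = function(e) conditionMessage(e))
print(res)
```

With inherits = FALSE the search stops at the function's own frame, which is why an unrelated global (or attached) `state` can only interfere when the lookup is allowed to walk outward.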

The intended state variable (wherever it comes from) should perhaps be encapsulated more explicitly in a mice-specific data structure, rather than found through an open search of the environment. But as I'm not familiar with mice()'s internals, I'm not sure what the best fix would be. Maybe it's as simple as renaming state to something less generic, like mice_internal_state_, so that a conflict with user variable names is less likely.
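A minimal sketch of the encapsulation idea, assuming a design in which the logging state is created once and passed explicitly instead of being looked up by name. The names new_log_state() and log_event(), and every field except it (which mirrors s$it in the error message), are hypothetical; this is not how mice currently works.

```r
# Hypothetical sketch (not mice's code): the logging state lives in a private
# environment that is created once and passed explicitly to the logger, so no
# name-based search of the caller's environments is involved.

new_log_state <- function() {
  # Empty parent, so lookups inside it never fall through to the global env
  st <- new.env(parent = emptyenv())
  st$it <- 0L    # current iteration ('it' mirrors s$it in the error message)
  st$dep <- ""   # variable being imputed (illustrative field)
  st$events <- data.frame(it = integer(0), dep = character(0),
                          out = character(0), stringsAsFactors = FALSE)
  st
}

log_event <- function(st, out) {
  # Append one row to the event log stored inside 'st'; environments have
  # reference semantics, so the caller's 'st' is updated in place
  row <- data.frame(it = st$it, dep = st$dep, out = out,
                    stringsAsFactors = FALSE)
  st$events <- rbind(st$events, row)
  invisible(st)
}

state <- "zen"          # a user's global 'state' is never consulted
st <- new_log_state()
st$it <- 1L
st$dep <- "bmi"
log_event(st, "constant")
print(st$events)
```

Renaming the internal object, as suggested above, is the smaller change; passing the state explicitly removes the name lookup altogether, at the cost of threading one extra argument through the internal calls.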

stefvanbuuren commented 1 year ago

Thanks for noting.

mice uses a list named "state" for logging. I never realised that its name could clash with a variable named "state", which is quite common. Renaming it to something less common could be a quick and practical fix, but I need to think about the side effects that renaming may have.

dannychu1108 commented 2 months ago

Hi, I was trying to use mice() on my dataset, which has no variable named "state", and I am still having this issue. How should I solve it? It ran fine before, but today it suddenly throws an error.

stefvanbuuren commented 2 months ago

I would expect that running rm(state) and avoiding attach() (or attach-like operations) should sidestep the problem, as suggested in the original post.
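As a sketch of that workaround, the base-R snippet below lists where a binding called state is visible on the search path and clears it before mice() would be called. The small mynhanes data frame is a stand-in (an assumption, not mice::nhanes), and mice() itself is not called here.

```r
# Sketch of the workaround from the thread, without calling mice() itself.
# 'mynhanes' is a small stand-in data frame (assumption), not mice::nhanes.
mynhanes <- data.frame(age = c(1, 2, 3), state = c("WA", "OR", "CA"))
state <- "zen"
attach(mynhanes)

# utils::find() lists every search-path entry with a binding named "state"
where <- utils::find("state")
print(where)

# Drop the global binding
if (".GlobalEnv" %in% where) rm(list = "state", envir = globalenv())

# Detach attached data frames that supply 'state' (packages are left alone)
attached <- where[where != ".GlobalEnv" & !startsWith(where, "package:")]
for (nm in attached) detach(nm, character.only = TRUE)

print(utils::find("state"))  # character(0): nothing left to clash with
```

If find("state") is already empty and the error persists, that points to the new problem case mentioned below rather than to this name clash.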

If this does not work for you, you might have hit a new problem case. I would then need a few more details to reproduce.