amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

mice::ampute() not working properly when adding character variables #623

Closed imazubi closed 7 months ago

imazubi commented 9 months ago

I was using mice::ampute() under MAR with a very simple dataset. The missingness probabilities are based on a continuous distribution.

set.seed(123)
age <- rnorm(50, 40, 10)
out <- 20 + 2 * age 
anl <- data.frame(SUBJID = subject, AGE = age, OUT = out)

Only the OUT variable is at risk of being amputed.

pattern_missing <- matrix(c(1, 1, 0), nrow = 1)

Missingness will depend entirely on the AGE variable.

weights <- matrix(c(0, 1, 0), nrow = 1)

Now I run the amputation under MAR:

result <- ampute(
    anl, prop = 0.4, 
    patterns = pattern_missing, 
    mech = "MAR", 
    weights = weights,
    type = "RIGHT")

When I checked missingness as a function of age, to see whether it had been generated under MAR, I got the following unexpected result (I tried several missing proportions): the missingness is not generated under MAR.

[image: missingness as a function of AGE for several missing proportions]

The issue comes from the following line under sumscores. Because the function converts the SUBJID column to NA, this matrix multiplication returns NA where it should return the weighted sum scores.

scores <- apply(candidates, 1, function(x) weights[i, ] %*% x)

See the scores output full of NAs.

[image: scores output, all NA]
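The NA propagation can be reproduced in base R without mice. This is a minimal sketch, assuming a single hypothetical candidate row whose SUBJID value has already become NA; the vectors `x` and `w` are illustrative and are not objects taken from ampute(). A zero weight does not neutralise the NA, because 0 * NA is NA in R.

```r
# Hypothetical candidate row after a failed numeric conversion:
# SUBJID is NA, AGE and OUT are still numeric.
x <- c(SUBJID = NA, AGE = 40, OUT = 100)
w <- c(0, 1, 0)  # weight only on AGE, as in the weights matrix above

# The zero weight on SUBJID does not help: 0 * NA is NA, so the sum is missing.
score_na <- drop(w %*% x)

# Same weights with the NA column dropped give the expected weighted sum (40).
score_ok <- drop(w[-1] %*% x[-1])

is.na(score_na)
score_ok
```

With every score missing, the scores carry no information about AGE, which matches the missingness pattern I observed.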

Could the function return an error instead of an incorrect result, so that the user is prevented from adding character variables such as a subject id? This issue is somewhat related to this one. I thought that adding character variables did not harm the process, but I see that my assumption was wrong.
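A minimal sketch of the kind of guard I have in mind, written as a user-side check in base R. The helper name `assert_all_numeric` is hypothetical and is not part of mice; it only shows how non-numeric columns could be rejected before the data reach ampute().

```r
# Hypothetical helper (not part of mice): stop if any column is non-numeric,
# instead of relying on a conversion that may produce NA.
assert_all_numeric <- function(data) {
  is_num <- vapply(data, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric columns: ",
         paste(names(data)[!is_num], collapse = ", "),
         call. = FALSE)
  }
  invisible(data)
}

# Small illustrative data set with a character identifier.
anl_small <- data.frame(SUBJID = as.character(1:3), AGE = c(35, 40, 52))

# Capture the error message rather than aborting the script.
msg <- tryCatch(assert_all_numeric(anl_small),
                error = function(e) conditionMessage(e))
msg
```

Calling the check right before ampute() would turn the silent NA scores into an explicit error that names the offending columns.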

stefvanbuuren commented 7 months ago

I tried to run your code, but subject is not defined. Here's an adapted version that runs.

library(mice)
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

set.seed(123)
age <- rnorm(50, 40, 10)
out <- 20 + 2 * age 
anl <- data.frame(SUBJID = as.character(1:50), AGE = age, OUT = out)
sapply(anl, class)
#>      SUBJID         AGE         OUT 
#> "character"   "numeric"   "numeric"

pattern_missing <- matrix(c(1, 1, 0), nrow = 1)
weights <- matrix(c(0, 1, 0), nrow = 1)

result <- ampute(
  anl, prop = 0.4, 
  patterns = pattern_missing, 
  mech = "MAR", 
  weights = weights,
  type = "RIGHT")
#> Warning: Data is made numeric internally, because the calculation of weights
#> requires numeric data

Created on 2024-04-17 with reprex v2.1.0

The documentation states: "Values should be numeric.", and ampute() throws a warning that it converts the data to numeric.
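One way to stay within that requirement is to ampute only the numeric columns and attach the identifier afterwards. The following is a sketch under assumptions, not a recommended API: the two-column patterns and weights matrices are adapted from the three-column ones above, and it assumes the rows of `result$amp` come back in the input order, which the code checks using the never-amputed AGE column.

```r
library(mice)

set.seed(123)
age <- rnorm(50, 40, 10)
anl <- data.frame(SUBJID = as.character(1:50), AGE = age, OUT = 20 + 2 * age)

# Ampute only the numeric columns; patterns and weights now have two columns.
num <- anl[, c("AGE", "OUT")]
result <- ampute(
  num, prop = 0.4,
  patterns = matrix(c(1, 0), nrow = 1),  # only OUT can become missing
  mech = "MAR",
  weights = matrix(c(1, 0), nrow = 1),   # missingness driven by AGE
  type = "RIGHT")

# AGE is never amputed, so it can confirm that the row order is unchanged
# before the identifier is attached again.
stopifnot(isTRUE(all.equal(result$amp$AGE, anl$AGE)))
amputed <- data.frame(SUBJID = anl$SUBJID, result$amp)
```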

Closing because this is not a bug.