Correct me if I'm wrong, but your categories are sampled randomly with equal proportions and with no relation to the features. Then, your "mispredictions" are again randomly sampled from the other two categories, but since the categories are in no way related to the predictors, the joint distribution of your data is not affected by this change. Hence, your "prediction-as-imputation" approach deals with a complete sample from a distribution equal to your population distribution, so it makes sense that this approach yields confidence-valid inferences.
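For what it's worth, this can be checked numerically. A minimal sketch (hypothetical code reusing the reprex's column names, with a slightly simplified misprediction scheme that draws uniformly from all three categories): the marginal distribution of `dx_predicted` matches that of `dx`, and `dx_predicted` stays unrelated to a noise feature.

```r
library(data.table)

set.seed(1)
n <- 1e5
d <- data.table(dx = sample(c("A", "B", "C"), n, replace = TRUE),
                x_null = rnorm(n))

# mispredict ~30% of rows by drawing a category at random
d[, dx_predicted := dx]
d[runif(n) < 0.3, dx_predicted := sample(c("A", "B", "C"), .N, replace = TRUE)]

prop.table(table(d$dx))               # ~1/3 in each category
prop.table(table(d$dx_predicted))     # also ~1/3 in each category
d[, mean(x_null), by = dx_predicted]  # ~0 in every category
```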
The question then is why `mice` yields confidence-invalid results. To be honest, I don't know, and I cannot reproduce your example, as I got the following error:
Error in fifelse(set_to_miss < 0.6, NA, dx) :
'yes' is of type logical but 'no' is of type integer. Please make sure that both arguments have the same type.
After changing the `fifelse()` call to `ifelse()`, I got the following results, which don't seem to indicate any problems. (Note that I could also not reproduce your final results table, but I didn't bother to solve that; the output below provides the same information.)
suppressPackageStartupMessages({
library(tidyverse)
library(data.table)
library(nnet)
library(mice)
})
simulation_run_reprex <- function() {
dx_levels <- c("A", "B", "C")
n_obs <- 5000
data_iter <- data.table(
dx = factor(sample(x = dx_levels,
size = n_obs,
replace = TRUE))
)
# disagree column is used to make some of the predicted
# classes disagree with the true classes.
# set_to_miss column is used to ampute the true dx
data_iter[, `:=`(dx_predicted = dx,
disagree = runif(n_obs),
set_to_miss = runif(n_obs))]
data_iter[
disagree < 0.3, # prediction model is right 70% of the time
dx_predicted := fcase(
dx == "A", sample(dx_levels[-1], size = .N, replace = TRUE),
dx == "B", sample(dx_levels[-2], size = .N, replace = TRUE),
dx == "C", sample(dx_levels[-3], size = .N, replace = TRUE)
)
]
data_iter[
# true dx is missing 60% of the time
, dx := ifelse(set_to_miss < 0.6, NA, dx)
]
data_iter[
, `:=`(
# noise variables - should get type 1 error of 5%
x_null = rnorm(.N),
z_null = rnorm(.N),
w_null = rnorm(.N)
)
]
# a single imputation approach using predictions
fit_pred <- multinom(dx_predicted ~ x_null + z_null + w_null,
data = data_iter, trace = FALSE)
z <- summary(fit_pred)$coefficients/summary(fit_pred)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
results_pred <- tibble(pval_B = p['B', -1L],
pval_C = p['C', -1L],
variable = colnames(p)[-1L],
method = 'impute_pred')
# multiple imputation
imp <- data_iter %>%
select(dx, ends_with("null")) %>%
mice(print = FALSE)
fit_imp <- with(imp, multinom(dx ~ x_null + w_null + z_null,
model = TRUE, trace = FALSE))
results_imp <- summary(pool(fit_imp)) %>%
filter(term != '(Intercept)') %>%
select(variable = term, y.level, p.value) %>%
pivot_wider(names_from = y.level,
names_prefix = 'pval_',
values_from = p.value) %>%
mutate(method = 'impute_mice')
bind_rows(
results_pred,
results_imp
)
}
set.seed(321)
results <- replicate(
n = 100,
expr = {simulation_run_reprex()},
simplify = FALSE
)
results %>%
bind_rows() %>%
group_by(method) %>%
summarize(across(.cols = starts_with("pval"),
.fns = ~mean(.x < 0.05)))
#> # A tibble: 2 × 5
#> method pval_B pval_C pval_2 pval_3
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 impute_mice NA NA 0.0167 0.0567
#> 2 impute_pred 0.04 0.0433 NA NA
results %>%
bind_rows() %>%
group_by(method, variable) %>%
summarize(across(.cols = starts_with("pval"),
.fns = ~mean(.x < 0.05)))
#> `summarise()` has grouped output by 'method'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 6
#> # Groups: method [2]
#> method variable pval_B pval_C pval_2 pval_3
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 impute_mice w_null NA NA 0.04 0.08
#> 2 impute_mice x_null NA NA 0 0.03
#> 3 impute_mice z_null NA NA 0.01 0.06
#> 4 impute_pred w_null 0.05 0.06 NA NA
#> 5 impute_pred x_null 0.05 0.02 NA NA
#> 6 impute_pred z_null 0.02 0.05 NA NA
Created on 2024-08-14 with reprex v2.0.2
Thank you!
You're correct about the prediction model in the simulated example. In the real simulation the scenario is more nuanced, but I'm not authorized to share the data it uses.
Sorry about `fifelse()` throwing an error on your side. After changing to `ifelse()` on my side, I see the `dx` column was coerced into an integer instead of a factor. I added one extra command in the reprex below to coerce `dx` back into a factor. I think this will make the result reproducible on your side. Can you confirm?
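(As an aside, a hypothetical way to avoid the coercion altogether is to subassign `NA` directly in `data.table`, which keeps `dx` a factor; the reprex below keeps the `ifelse()` route plus the re-conversion for comparability.)

```r
# sketch: ampute dx by subassignment; dx stays a factor, no ifelse() needed
data_iter[set_to_miss < 0.6, dx := NA]
```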
suppressPackageStartupMessages({
library(tidyverse)
library(data.table)
library(nnet)
library(mice)
})
simulation_run_reprex <- function() {
dx_levels <- c("A", "B", "C")
n_obs <- 5000
data_iter <- data.table(
dx = factor(sample(x = dx_levels,
size = n_obs,
replace = TRUE))
)
# disagree column is used to make some of the predicted
# classes disagree with the true classes.
# set_to_miss column is used to ampute the true dx
data_iter[, `:=`(dx_predicted = dx,
disagree = runif(n_obs),
set_to_miss = runif(n_obs))]
data_iter[
disagree < 0.3, # prediction model is right 70% of the time
dx_predicted := fcase(
dx == "A", sample(dx_levels[-1], size = .N, replace = TRUE),
dx == "B", sample(dx_levels[-2], size = .N, replace = TRUE),
dx == "C", sample(dx_levels[-3], size = .N, replace = TRUE)
)
]
data_iter[
# true dx is missing 60% of the time
# (this coerces it into an integer)
, dx := ifelse(set_to_miss < 0.6, NA, dx)
]
data_iter[
# convert dx back into a factor
, dx := factor(dx, labels = dx_levels)
]
data_iter[
, `:=`(
# noise variables - should get type 1 error of 5%
x_null = rnorm(.N),
z_null = rnorm(.N),
w_null = rnorm(.N)
)
]
# a single imputation approach using predictions
fit_pred <- multinom(dx_predicted ~ x_null + z_null + w_null,
                     data = data_iter, trace = FALSE)
z <- summary(fit_pred)$coefficients/summary(fit_pred)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
results_pred <- tibble(pval_B = p['B', -1L],
pval_C = p['C', -1L],
variable = colnames(p)[-1L],
method = 'impute_pred')
# multiple imputation
imp <- data_iter %>%
select(dx, ends_with("null")) %>%
mice(print = FALSE)
fit_imp <- with(imp, multinom(dx ~ x_null + w_null + z_null,
                              model = TRUE, trace = FALSE))
results_imp <- summary(pool(fit_imp)) %>%
filter(term != '(Intercept)') %>%
select(variable = term, y.level, p.value) %>%
pivot_wider(names_from = y.level,
names_prefix = 'pval_',
values_from = p.value) %>%
mutate(method = 'impute_mice')
bind_rows(
results_pred,
results_imp
)
}
set.seed(321)
results <- replicate(
n = 100,
expr = {simulation_run_reprex()},
simplify = FALSE
)
results %>%
bind_rows() %>%
group_by(method) %>%
summarize(across(.cols = starts_with("pval"),
.fns = ~mean(.x < 0.05)))
#> # A tibble: 2 × 3
#> method pval_B pval_C
#> <chr> <dbl> <dbl>
#> 1 impute_mice 0.147 0.127
#> 2 impute_pred 0.0633 0.0733
Created on 2024-08-13 with reprex v2.0.2
In your code:
# a single imputation approach using predictions
fit_pred <- multinom(dx_predicted ~ x_null + z_null + w_null,
data = data_iter)
`dx_predicted` has no missing data, and no imputations are calculated.
I suggest changing your code so that `multinom()` predicts values that are then used for single imputation. I expect that the analysis of that singly imputed dataset will show P-values that are too low.
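A minimal sketch of that suggestion, assuming the `data_iter` object from inside the reprex function above (the object names `fit_si`, `si`, and `fit_ana` are illustrative):

```r
# fit the prediction model on the complete cases
fit_si <- multinom(dx ~ x_null + z_null + w_null,
                   data = data_iter[!is.na(dx)], trace = FALSE)

# single imputation: fill in the missing dx with predicted classes
si <- copy(data_iter)
miss <- is.na(si$dx)
si$dx[miss] <- predict(fit_si, newdata = si[miss])

# analyze the singly imputed data as if it were complete;
# the expectation stated above is that these P-values come out too low
fit_ana <- multinom(dx ~ x_null + z_null + w_null,
                    data = si, trace = FALSE)
```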
No worries, @bcjaeger! I can imagine that the irreproducibility has something to do with different package versions, or something like that.
Now I can indeed reproduce your results, and I see that `mice` yields $p$-values that are too small on average. I'm not entirely sure why this happens. Also, if you change the imputation method to `pmm`, the results will be confidence-valid, but when using `cart` or `rf`, you also get too many small $p$-values. I agree with @stefvanbuuren that the single imputation approach will probably also yield invalid $p$-values here, but this does not explain why `mice` also yields invalid $p$-values. The issue seems to be that the variance estimates are too small (since the parameter estimates appeared to be unbiased), but I don't know enough about polytomous regression to quickly identify why this occurs.
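For readers trying this at home: swapping the imputation method only requires the `method` argument of `mice()`. A sketch, assuming the `data_iter` object from the reprex (`rf` additionally needs a random-forest backend installed):

```r
# rerun the pooled analysis under different imputation methods
dat <- data_iter[, .(dx, x_null, z_null, w_null)]
for (m in c("polyreg", "pmm", "cart", "rf")) {
  imp <- mice(dat, method = m, print = FALSE)
  fit <- with(imp, multinom(dx ~ x_null + z_null + w_null, trace = FALSE))
  print(summary(pool(fit)))
}
```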
@thomvolker I thought that 0.147 was the average P-value found for `mice`, so aren't these larger (not smaller) than 0.05?
@stefvanbuuren, the 0.147 is the type 1 error rate (the proportion of $p$-values smaller than 0.05, given that the predictors are independent of the outcome).
@thomvolker Ah thanks. So the confidence interval is too short, which shouldn't be the case.
The first explanation that springs to mind is that the neural network is overfitting the data. A simpler model with fewer parameters should be more robust against this.
I deleted the BUG label since this is a methodological issue rather than a programming bug.
That could be, but the model has only 8 parameters (two intercepts and six regression slopes) on 5000 observations, which does not seem problematic. I also checked whether there is some implicit regularization (e.g., due to the data augmentation approach) that reduced the between-imputation variance, but this did not seem to be the case. And finally, I checked whether the `method = 'sample'` approach, which should be in line with the population model, yields confidence-valid results, but this was also not the case, although here the confidence intervals were too wide.
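For reference, the between/within split can be read off the pooled object directly; under Rubin's rules the total variance is $T = \bar{U} + (1 + 1/m)B$, so a too-short interval means $T$ is too small. A sketch, using the `fit_imp` object from the reprex:

```r
# inspect the Rubin's-rules variance components per term:
# ubar = mean within-imputation variance, b = between-imputation
# variance, t = ubar + (1 + 1/m) * b (the total variance)
pooled <- pool(fit_imp)$pooled
pooled[, c("term", "estimate", "ubar", "b", "t", "fmi")]
```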
By default, `mice` uses `nnet.MaxNWts = 1500` to set the `nnet::nnet` argument `MaxNWts`. Perhaps reducing it to its default of 1000 could help?

EDIT: This parameter is a safety valve, so changing it will probably have no impact on the statistical properties.
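For completeness, extra arguments to `mice()` are forwarded to the univariate imputation functions, so this is a one-line experiment (a sketch; `dat` as in the earlier snippet):

```r
# forward nnet.MaxNWts through mice() to mice.impute.polyreg / multinom
imp <- mice(dat, method = "polyreg", nnet.MaxNWts = 1000, print = FALSE)
```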
Not really a software issue, so closing.
Thank you @stefvanbuuren and @thomvolker for the help!
Hello,

Thank you for all you do with the `mice` package. I use it frequently and appreciate its succinct syntax and the depth of its features.

**Context of this issue**
A colleague of mine wants to do single imputation with a prediction model, and I feel nervous about that because the "Imputation is not prediction" sections of the `mice` textbook make a very strong case against doing it. So I wrote a simulation catered to my colleague's analysis to help illustrate that `mice` would be the safer option, but the results of the simulation aren't what I expected.

**Describe the bug**
When using `mice` with `nnet` to do multinomial regression, the type 1 error appears to be a little higher than expected. This could be `nnet`'s problem. I'm happy to try running the simulation with logistic regression instead if you'd like me to test that hypothesis. If it is a `mice` problem, I am sorry I can't point to exactly where in `mice` the problem might be at the moment.

**Reproducible example**
In the reprex below, I simulate a three-category outcome, `dx`, and three variables that have no association with `dx`. I model `dx` two ways:

1. single imputation using `dx_predicted`
2. multiple imputation using `mice`
Created on 2024-08-13 with reprex v2.0.2
Here is my session info
Created on 2024-08-13 with reprex v2.0.2