Closed GBA19 closed 1 year ago
Thanks for noting. I wasn't aware of the repetitive nature of the imputes in this case.
If I understand correctly, the problem occurs when no seed
argument is given to mice(...)
. In that case, withr::local_preserve_seed()
initialises from the parent environment. If nothing has changed there, then we will get the same seed over and over again.
Perhaps a simple fix would be to add kick <- runif(1L)
in the mice()
function, just to kick the random generator, e.g.,
# set local seed, reset random state generator after function aborts
if (is.na(seed)) {
kick <- runif(1L)
withr::local_preserve_seed()
} else {
withr::local_seed(seed)
}
Would that solve your case? Could be any unfortunate side effects?
An added note. My simulations usually set the seed
explicitly, e.g.
for (i in 1:5) {
imp <- mice(nhanes, m = 1, printFlag = FALSE, seed = i)
cat(complete(imp)$bmi |> mean(), "\n")
}
This is perhaps a bit safer.
I was looking into this just now and thougth that the following might be the solution. Forgive me if I am totally wrong since I do not fully understand the withr::local_preserve_seed()
. Nevertheless, I am just returning from the man-page.
if (is.na(seed)) {
withr::local_preserve_seed() # restores .Random.seed on exiting scope
set.seed(NULL) # reinitialize .Random.seed
} else {
withr::local_seed(seed)
}
If this is totally wrong, so be it; if not, this appears to be more elegant.
Yes thanks, that is better as that will restore .Random.seed
after returning from mice()
.
I just pushed an update.
Thanks a lot for your contribution.
Pretty late to the party, but this new implementation also results in unexpected, and potentially unwanted, behavior.
Namely, setting a seed in the global environment is reversed within the mice
call, such that the results are always unique if no seed is specified within the mice()
call. Setting a seed in the global environment thus no longer ensures reproducibility of the results, which is something that I think would be preferable over the current solution.
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
library(magrittr)
set.seed(123)
imp1 <- mice(boys, print = F)
imp1 %$%
lm(hgt ~ bmi + tv) %>%
pool() %>%
summary()
#> term estimate std.error statistic df p.value
#> 1 (Intercept) 45.750588 7.8251969 5.846573 173.68503 2.431348e-08
#> 2 bmi 2.990458 0.4926343 6.070341 92.33349 2.795360e-08
#> 3 tv 3.738712 0.2099027 17.811641 26.77784 2.220446e-16
set.seed(123)
imp2 <- mice(boys, print = F)
imp2 %$%
lm(hgt ~ bmi + tv) %>%
pool() %>%
summary()
#> term estimate std.error statistic df p.value
#> 1 (Intercept) 42.567502 10.7210055 3.970477 12.82719 1.638943e-03
#> 2 bmi 3.165119 0.6533964 4.844103 12.59753 3.503675e-04
#> 3 tv 3.746043 0.2230530 16.794411 19.05428 7.056578e-13
Created on 2022-07-08 by the reprex package (v2.0.1)
I am actually unsure whether the local_preserve_seed()
argument is something you would want. What is wrong with the fact that a call to mice adjust the state of the random number generator, as it actually uses this randomness?
Ah... what a can of worms this is...
We have in mice.R:
# set local seed, reset random state generator after function aborts
if (is.na(seed)) {
# restores .Random.seed on exiting scope
withr::local_preserve_seed()
# reinitialize .Random.seed
set.seed(NULL)
} else {
# take specified seed to set local seed
withr::local_seed(seed)
}
We probably need some alternative to set.seed(NULL)
The function umap::umap
has preserve.seed as a parameter, also discussed in a vignette.
I see three usage scenarios, always assuming the default seed = NA
on the mice
function call:
mice
without changing the seed in a parent or ancestor environment: We would expect a randomly different result on each call. mice
and a parent or ancestor environment to replicate exactly. mice
as requested in #426: I do not fully grasp the importance of this scenario. As I understand the present state mice
fulfills items 1 and 3 but fails on 2.
Maybe a preserve.seed= FALSE
as a default parameter is a way to fulfill 1 and 2 and setting preserve.seed= TRUE
could fulfill item 3. This would entail restoring the previous behavior and make item 3 a special case.
Please note: I do not feel strongly about this at all, just naively trying to be of some use.
I am no expert on random seeds, but my intuition says that these three aspects are incompatible, because if after every mice()
you return to the stage of the random number generator before this call, any attempt to reproducibility (specified in the global environment) will likely lead to identical results over different calls.
Similarly to @GBA19, I do not really understand why you would like to preserve the global seed when you called mice. To me, this seems a bit like asking rnorm()
to not affect the state of the random number generator. Still, there is apparently some interest in it, and a preserve.seed
parameter might provide a solution.
Since the local seed gave various unforeseen problems, mice 3.14.12
reverted seed setting behaviour to mice 3.13.10
. Local seeds are gone. Trust this solves the issue.
The change requested in #426 and implemented in #432 has the side-effect of having
mice
return the exactly same set of imputed values on each instantiation of the function.The following code example illustrates this (demonstrated with the mean for simplicity):
replicate(5, {imp <- mice(nhanes, m= 1, printFlag = FALSE); complete(imp)$bmi |> mean()} )
# [1] 27.144 27.144 27.144 27.144 27.144
Given the stochastic nature of multiple imputation this seems unreasonable. This may break some existing scripts---it certainly broke one of mine.
There is a simple workaround, adding a
runif(1)
, but it is preferable to havemice
work reasonably 'out of the box':replicate(5, {runif(1); imp <- mice(nhanes, m= 1, printFlag = FALSE, seed= 12345); complete(imp)$bmi |> mean()} )
# [1] 27.032 26.984 26.900 26.900 26.900
If indeed someone needs exact replications on different instantiations, the present behavior in
3.14.0
, the solution is as simple as the following:replicate(5, {runif(1); imp <- mice(nhanes, m= 1, printFlag = FALSE, seed= 12345); complete(imp)$bmi |> mean()} )
# [1] 26.408 26.408 26.408 26.408 26.408
Hopefully, this is something you are able to solve.
"R version 4.1.2 (2021-11-01)"
Package Version
"mice" "3.14.0"
"withr" "2.4.3"