DavisVaughan / furrr

Apply Mapping Functions in Parallel using Futures
https://furrr.futureverse.org/

Getting the same random result across `purrr::map()` and `furrr::future_map()` #251

Open DanChaltiel opened 1 year ago

DanChaltiel commented 1 year ago

Hi,

I am running simulations where some computing should be parallelized and some should not, and I am trying to figure out how to ensure that purrr::map() and furrr::future_map() yield the same result for a given seed.

Reading the help of furrr_options(), I'm trying to figure out how to pass the seeds to future_map(), but this problem is not addressed specifically.

For instance, consider the following code:

library(purrr)
library(furrr)

set.seed(42)
rnorm(2)
#> [1]  1.3709584 -0.5646982

set.seed(42)
map_dbl(1:2, ~rnorm(1))
#> [1]  1.3709584 -0.5646982

set.seed(42)
future_map_dbl(1:2, ~rnorm(1), .options=furrr_options(seed = list(.Random.seed, .Random.seed)))
#> [1] 1.370958 1.370958

set.seed(42)
future_map_dbl(1:2, ~rnorm(1), .options=furrr_options(seed = list(.Random.seed[1:10], .Random.seed[2:11])))
#> Error in sample.int(n = 1L, size = 1L, replace = FALSE): '.Random.seed' has wrong length

Created on 2023-02-22 with reprex v2.0.2

Using seed=FALSE, seed=TRUE, seed=NULL, or even seed=42L yielded different results (or errors), but none was right.

Is there a way to pass the right seed to each parallel iteration of furrr::future_map() so that it yields the same result as purrr::map()?

If yes, it might be worth clarifying in the documentation, and if not, could this be considered a new feature?

HenrikBengtsson commented 1 year ago

This question was also asked at https://stackoverflow.com/q/75521357/1072091. I've answered it at https://stackoverflow.com/a/75543379/1072091.

DanChaltiel commented 1 year ago

Thank you very much for your answer on SO, this was very helpful and interesting.

I still think this should be mentioned in the documentation though.

DavisVaughan commented 1 year ago

The docs in ?furrr::furrr_options() under "Reproducible random number generation (RNG)" mention that random numbers are the same regardless of the parallel backend, but don't mention that the results would be different from purrr::map(), so I can add a sentence about that.
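
For readers landing here, a minimal sketch of the backend-independence described above. It assumes the furrr and future packages are installed and that two local workers can be started; the names opts, res_seq and res_par are purely illustrative. It does not make future_map() match purrr::map(); it only shows that a fixed seed in furrr_options() is meant to give the same draws whether the plan is sequential or parallel.

```r
library(future)
library(furrr)

# One fixed seed, reused for both runs (illustrative choice of 42L)
opts <- furrr_options(seed = 42L)

# Run 1: no parallelism at all
plan(sequential)
res_seq <- future_map_dbl(1:2, ~ rnorm(1), .options = opts)

# Run 2: two background R sessions
plan(multisession, workers = 2)
res_par <- future_map_dbl(1:2, ~ rnorm(1), .options = opts)

# Shut the background workers down again
plan(sequential)

# Should be TRUE if the RNG streams are backend-independent
identical(res_seq, res_par)
```

So one way to keep a simulation reproducible, under these assumptions, is to use future_map() for every step and switch only the plan() between the parts that should and should not run in parallel, rather than mixing purrr::map() and furrr::future_map().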