HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

BiocParallel DoParam: RNG warnings #58

Closed HenrikBengtsson closed 3 years ago

HenrikBengtsson commented 3 years ago

I'm creating this issue to track downstream packages that either do not use doRNG::%dorng% to generate parallel-safe random numbers, or that do use it but still trigger an RNG warning from the future framework.

Examples

Package 'plyr'

doFuture::registerDoFuture()
y <- plyr::llply(1:2, rnorm, .parallel = TRUE)

Warning messages:
1: In setup_parallel() : No parallel backend registered
2: UNRELIABLE VALUE: One of the foreach() iterations ('doFuture-1') unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, use '%dorng%' from the 'doRNG' package instead of '%dopar%'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, set option 'future.rng.onMisuse' to "ignore".

Conclusion: This is a true RNG mistake. It happens because plyr does not use doRNG, and its source code shows no other attempt to use parallel-safe RNG.

Workaround: After doFuture::registerDoFuture(), call doRNG::registerDoRNG(), which automatically makes every %dopar% call behave like %dorng%, e.g.

doFuture::registerDoFuture()
doRNG::registerDoRNG()
y <- plyr::llply(1:2, rnorm, .parallel = TRUE)
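For code that calls foreach() directly, the same fix can be applied at the call site by using %dorng% in place of %dopar%. The following is a minimal sketch, assuming the foreach, doRNG, doFuture, and future packages are installed; the loop body is illustrative and not taken from plyr.

```r
library(foreach)
library(doRNG)                 # provides the %dorng% operator

doFuture::registerDoFuture()   # foreach iterations are evaluated via futures
future::plan("sequential")     # any future backend could be used here

# With %dorng%, each iteration gets its own L'Ecuyer-CMRG stream
# derived from the current seed, so the same seed gives the same draws.
set.seed(42)
y1 <- foreach(n = 1:2) %dorng% rnorm(n)

set.seed(42)
y2 <- foreach(n = 1:2) %dorng% rnorm(n)

stopifnot(identical(y1, y2))
```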

To disable the RNG warnings (this only silences them; it does not make the random numbers parallel-safe), set:

options(future.rng.onMisuse = "ignore")

## Slightly better: in doFuture (>= 0.12.0) [next release]
options(doFuture.rng.onMisuse = "ignore")

Package 'BiocParallel'

doFuture::registerDoFuture()
BiocParallel::register(BiocParallel::DoparParam())
y <- BiocParallel::bplapply(1:2, rnorm)

Warning message:
UNRELIABLE VALUE: One of the foreach() iterations ('doFuture-1') unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, use '%dorng%' from the 'doRNG' package instead of '%dopar%'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, set option 'future.rng.onMisuse' to "ignore".

Conclusion: This is a false-positive because BiocParallel deploys L'Ecuyer-CMRG seeds internally. Although they're not invariant to the number of parallel workers(*), they're statistically sound. (*) There is work in progress for this, cf. https://github.com/Bioconductor/BiocParallel/pull/130.
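To illustrate what "invariant to the number of workers" means, here is a minimal base-R sketch that pre-assigns one L'Ecuyer-CMRG stream per iteration using parallel::nextRNGStream(). This is not BiocParallel's or doRNG's actual implementation; draw() is a hypothetical helper, and the iterations run sequentially in two different orders to stand in for different worker schedules.

```r
# Note: this switches the session's RNG kind to L'Ecuyer-CMRG.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# Pre-generate one independent RNG stream per iteration.
streams <- vector("list", 3L)
s <- get(".Random.seed", envir = globalenv())
for (i in seq_along(streams)) {
  streams[[i]] <- s
  s <- parallel::nextRNGStream(s)
}

# Hypothetical helper: run iteration 'i' using its own stream.
draw <- function(i, stream) {
  assign(".Random.seed", stream, envir = globalenv())
  rnorm(i)
}

# Evaluate iterations in two different orders, as different
# worker counts or schedules might.
y_forward <- lapply(1:3, function(i) draw(i, streams[[i]]))
y_reverse <- rev(lapply(3:1, function(i) draw(i, streams[[i]])))

# Each iteration's draws depend only on its own stream.
stopifnot(identical(y_forward, y_reverse))
```

Because the stream is tied to the iteration and not to the worker, the result does not depend on which worker evaluates which iteration.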

To disable the false warnings, use:

options(future.rng.onMisuse = "ignore")

## Slightly better: in doFuture (>= 0.12.0) [next release]
options(doFuture.rng.onMisuse = "ignore")
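Setting the option globally also silences warnings for unrelated code that may have a real RNG problem. A possible alternative is to set it only around the call known to be a false positive. The sketch below uses base R only; with_rng_check_ignored() is a hypothetical helper name, not part of future or doFuture.

```r
# Hypothetical helper: evaluate 'expr' with the RNG check disabled,
# then restore the previous option value, even on error.
with_rng_check_ignored <- function(expr) {
  old <- options(future.rng.onMisuse = "ignore")
  on.exit(options(old), add = TRUE)
  expr  # lazy argument: evaluated here, while the option is set
}

# Inside the wrapper the option is "ignore" ...
inside <- with_rng_check_ignored(getOption("future.rng.onMisuse"))
stopifnot(identical(inside, "ignore"))
```

In practice, the wrapped expression would be the call itself, e.g. with_rng_check_ignored(BiocParallel::bplapply(1:2, rnorm)).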

Workaround: No workaround needed.

However, if you want parallel RNG that is invariant to the number of workers, call doRNG::registerDoRNG() after doFuture::registerDoFuture(), which automatically makes every %dopar% call behave like %dorng%, e.g.

doFuture::registerDoFuture()
doRNG::registerDoRNG()
BiocParallel::register(BiocParallel::DoparParam())

future::plan("sequential")
set.seed(42)
y1 <- BiocParallel::bplapply(1:3, rnorm)

future::plan("multisession", workers = 2L)
set.seed(42)
y2 <- BiocParallel::bplapply(1:3, rnorm)
stopifnot(identical(y2, y1))

future::plan("multisession", workers = 3L)
set.seed(42)
y3 <- BiocParallel::bplapply(1:3, rnorm)
stopifnot(identical(y3, y2))