HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

how to use plan() in package #49

Closed jkurle closed 4 years ago

jkurle commented 4 years ago

First of all thank you so much for providing this package, @HenrikBengtsson!

I am currently writing my own package and want to use the doFuture (future), foreach, and doRNG packages in a foreach loop. I have read a few of the issues on this topic (#375, HenrikBengtsson/future#274, HenrikBengtsson/future#327, https://github.com/HenrikBengtsson/doFuture/issues/42, https://github.com/HenrikBengtsson/doFuture/issues/41) but am still unsure whether I understood it correctly. I would be grateful for any feedback. I have put my questions and reasoning as comments into the code:

## a function in my package
monte_carlo_function <- function(replications, seed, true_parameters, model_setting) {
  ## register parallel adapter here?
  registerDoFuture()
  ## set L'Ecuyer-CMRG to avoid warning when user selects plan(sequential), see HenrikBengtsson/doFuture#42
  RNGkind("L'Ecuyer-CMRG")
  ## using %dorng% and .options.RNG to set seed instead of registerDoRNG(), see HenrikBengtsson/doFuture#41
  results <- foreach(m = (1:replications), .combine = "rbind", .options.RNG = seed) %dorng% {
      data <- generate_random_data(true_parameters)
      model <- run_model(model_setting)
      interesting_stuff <- calculate_statistics(model)
      data.frame(interesting_stuff)
  }
  return(results)
}

The user would then have to specify a plan before calling monte_carlo_function() from my package, giving them complete control over which backend to use. Example:

plan(cluster, workers = availableCores())
results <- monte_carlo_function(100, 123, params, settings)

This would mean that the user first specifies a plan and the function then registers the parallel adapter via registerDoFuture(). This seems to work, but the example in the doFuture documentation uses the reverse order (which makes sense). Should registerDoFuture() also be called manually by the user rather than inside the function?

HenrikBengtsson commented 4 years ago

Hi. So, this becomes a bit philosophical, and the answer depends a bit on how foreach and %dopar% are really meant to be used.

A conservative approach that I would take is to leave the registration of the foreach adaptor also to the end-user. That way they can also use, say, doParallel or doMC. That approach would require you to make sure all globals and packages are explicitly declared in the foreach() call.
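A minimal sketch of what explicit declaration could look like, assuming a hypothetical helper simulate_mean() in place of the package's real functions. The loop uses plain %dopar% and never registers an adaptor itself; the registerDoSEQ() call at the end stands in for whatever adaptor the end-user would register.

```r
library(foreach)

## Hypothetical helper standing in for the package's own functions
simulate_mean <- function(n, mu) mean(rnorm(n, mean = mu))

monte_carlo_means <- function(replications, n, mu) {
  foreach(
    m = seq_len(replications),
    .combine = "c",
    .export = "simulate_mean",  # functions defined outside this function, declared by name
    .packages = "stats"         # packages the loop body needs attached on the workers
  ) %dopar% {
    simulate_mean(n, mu)
  }
}

## The end-user, not the package, picks the adaptor; the sequential one is used here
registerDoSEQ()
res <- monte_carlo_means(5, n = 10, mu = 0)
```

Inside a real package, the helpers usually live in the package namespace, so listing the package itself in .packages may be the simpler route; that choice is an assumption to verify against the adaptors you want to support.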

OTOH, if your package is targeting futures only, e.g. you might switch to code using future.apply or furrr later or elsewhere in the code, then I think it is ok to use registerDoFuture() like you do. What's missing, though, is a way to undo your registration. For instance, imagine that the user has already registered doMC outside for other purposes. They would be confused when all of a sudden that is undone. Please reach out to the foreach maintainer for this question/feature request (https://github.com/RevolutionAnalytics/foreach/issues).
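As a rough sketch of how a package function could avoid replacing an adaptor the user has already chosen, it can inspect the current registration instead of overwriting it. The helper name require_foreach_adaptor() is hypothetical; getDoParRegistered() and getDoParName() are existing foreach functions. This does not provide the missing "undo", it only leaves the user's registration untouched.

```r
library(foreach)

## Hypothetical helper: reports the registered adaptor without changing it
require_foreach_adaptor <- function() {
  if (!getDoParRegistered()) {
    stop("No foreach adaptor is registered; register one first, ",
         "e.g. with doFuture::registerDoFuture().")
  }
  getDoParName()
}

## Stand-in for the user's own registration (sequential adaptor from foreach)
registerDoSEQ()
adaptor <- require_foreach_adaptor()
```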

Regarding doFuture issue HenrikBengtsson/doFuture#42: doRNG has made changes that allow me to work around that inside doFuture, meaning that hack will not be needed in the next release.

jkurle commented 4 years ago

Thank you very much for your explanations! I will use your conservative approach and explain what the user should do in a little vignette. It seems sensible to allow for other adaptors as well.

HenrikBengtsson commented 4 years ago

I migrated this issue to doFuture in case others search for this over there.

HenrikBengtsson commented 4 years ago

FYI, doFuture 0.10.0 is now on CRAN removing the need for:

## set L'Ecuyer-CMRG to avoid warning when user selects plan(sequential), see HenrikBengtsson/doFuture#42
RNGkind("L'Ecuyer-CMRG")