HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org
84 stars 6 forks source link

Adverb to make other foreach-based functions futurized #15

Closed richierocks closed 5 years ago

richierocks commented 6 years ago

Suppose you have a function that uses foreach.

foreach_rnorm <- function(n, mu, sigma) {
  registerDoFuture()
  plan(multiprocess)
  foreach(i = seq_len(n), .export = c("mu", "sigma")) %dopar% {
    rnorm(i, mean = mu, sd = sigma)
  }
}

This forces the user to use the multiprocess strategy. It would be nice if the function let the user pass the strategy to plan(). The proper logic is likely to be even more complicated than this but this hopefully communicates the idea.

foreach_rnorm2 <- function(n, mu, sigma, plan_strategy = c("sequential", "transparent", "multisession", "multicore", "cluster", "remote"), n_workers = 1) {
  plan_strategy <- match.arg(plan_strategy)
  registerDoFuture()
  if(plan_strategy == "cluster") {
  cl <- makeCluster(n_workers)
    plan(plan_strategy, substitute = FALSE, workers = cl)
  } else {
    plan(plan_strategy, substitute = FALSE)
  }
  plan(plan_strategy, substitute = FALSE, workers = cl)
  foreach(i = seq_len(n), .export = c("mu", "sigma")) %dopar% {
    rnorm(i, mean = mu, sd = sigma)
  }
}

The big problem with foreach_rnorm2() is that now there is a lot of future boilerplate code mixed in with the function logic. Ideally this would be separated out. I think that an adverb, similar to purrr::safely(), is the cleanest option. That is, you want the user to be able to write

new_fn <- futurize(fn)

Or in this specific case

futurized_foreach <- futurize(
  foreach_rnorm <- function(n, mu, sigma) {
    foreach(i = seq_len(n), .export = c("mu", "sigma")) %dopar% {
      rnorm(i, mean = mu, sd = sigma)
    }
  }
)

I think the implementation is a more complex version of

futurize <- function(.f) {
  .f <- purrr::as_mapper(.f)
  function(..., plan_strategy = c("sequential", "transparent", "multisession", "multicore", "cluster", "remote"), n_workers = 1) {
    plan_strategy <- match.arg(plan_strategy)
    registerDoFuture()
    if(plan_strategy == "cluster") {
      cl <- makeCluster(n_workers)
      plan(plan_strategy, substitute = FALSE, workers = cl)
    } else {
      plan(plan_strategy, substitute = FALSE)
    }
    .f(...)
  }
}

Does this sound like something worth pursuing?

HenrikBengtsson commented 6 years ago

The philosophy of the future package/Future API is that as a developer you cannot make any assumptions on what type of computational resources the end user will have access to. Because of this, it should be up to the end user to control the future strategy. This basically also means one shouldn't limit what type of futures can be used by hard coding it into, say, a package. Who know, there might be a new amazing future backend showing up years from now and then it's a petty if it cannot be used.

With all that said, is there a reason why you don't want to use?

# End user controls this
registerDoFuture()
plan(multiprocess)

# Developer controls this
foreach_rnorm <- function(n, mu, sigma) {
  foreach(i = seq_len(n), .export = c("mu", "sigma")) %dopar% {
    rnorm(i, mean = mu, sd = sigma)
  }
}

There might be valid reason for incorporating plan() calls internally in the code, but avoiding it should be the goal. From my experience, this is often possible to do. Also, if setting plan() internally, it's important to undo any such changes because the code must also work if already called by other code than might already be running in parallel and/or rely on the future framework.

HenrikBengtsson commented 5 years ago

I'm closing this one because I don't think it belongs here in doFuture. It might fight over at the future package - please reopen there if you want.