futureverse / future.apply

R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org

Clear up global options usage with nested plans #116

Closed maxim-h closed 11 months ago

maxim-h commented 1 year ago

Hi. And thank you for a great package!

The future.globals.maxSize option limits the total size of the global objects that can be exported to a worker process. A very useful feature.

But obviously sometimes you need to increase it like so:

options(future.globals.maxSize = [bigger number])
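
For illustration: the limit is given in bytes, and the documented default is 500 MiB (500 * 1024^2). A minimal sketch that raises it to 2 GiB follows; the 2 GiB figure is an arbitrary assumption for this example, not a recommendation.

```r
# Sizes are in bytes. 2 GiB is an arbitrary value chosen for illustration.
gib <- 1024^3
old <- options(future.globals.maxSize = 2 * gib)

# Read the option back to confirm the new limit
max_size <- getOption("future.globals.maxSize")
print(max_size)

# Restore the previous setting when it is no longer needed:
# options(old)
```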

This breaks once you define a nested plan. For example:

plan(list(tweak(multisession, workers = 4), tweak(multisession, workers = 8)))
options(future.globals.maxSize = [bigger number])

The future.globals.maxSize option seems to apply only to the outer plan, not to the inner one. This was reported here. I couldn't find anything in the documentation on how to deal with such a situation.

The only workaround I could find is to set the option twice: once globally, and once inside the outer function that is being run. Like this, for example:

plan(list(tweak(multisession, workers = 4), tweak(multisession, workers = 8)))
options(future.globals.maxSize = [bigger number])

future_lapply(
  outer_arg_list,
  function(x) {
    # Set the option again on the outer worker so the inner call sees it
    options(future.globals.maxSize = [bigger number])

    future_lapply(
      inner_arg_list,
      function(y) {
        ...
      }
    )
  }
)

This seems inconvenient and error-prone. Is there a better way to define these options for nested futures? If not, perhaps there should be.
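
One way to make the workaround above less repetitive is to wrap the outer function so that the option is re-applied on the worker before any nested call runs. This is a minimal sketch in base R; `with_max_size()` is a hypothetical helper, not part of future.apply, and it assumes the option value is captured in the calling session when the wrapper is created.

```r
# Hypothetical helper (not part of future.apply): returns a wrapper around FUN
# that sets future.globals.maxSize before calling FUN and restores it afterwards.
with_max_size <- function(FUN, max_size = getOption("future.globals.maxSize")) {
  # Evaluate both arguments now so the wrapper carries concrete values
  force(FUN)
  force(max_size)
  function(...) {
    old <- options(future.globals.maxSize = max_size)
    on.exit(options(old), add = TRUE)
    FUN(...)
  }
}

# Base-R demonstration: the wrapped function sees the captured value
report <- with_max_size(
  function(x) getOption("future.globals.maxSize"),
  max_size = 2 * 1024^3
)
seen <- report(1)
print(seen)
```

Passing `with_max_size(function(x) { ... })` as the function argument of the outer `future_lapply()` should then take the place of the inline `options()` call.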

HenrikBengtsson commented 1 year ago

Hi,

Thanks for this. This looks specific to the future.apply package, for which I can reproduce it. I cannot reproduce it with bare-bone futures, furrr, or doFuture.

Bare-bone futures [OK]

library(future)
plan(list(
  outer = tweak(multisession, workers = 2),
  inner = tweak(multisession, workers = 2)
))

options(future.globals.maxSize = 1234000)

f <- future({
  outer <- data.frame(
    label   = "outer",
    pid     = Sys.getpid(),
    maxSize = getOption("future.globals.maxSize", NA_real_)
  )

  f <- future({
    data.frame(
      label   = "inner",
      pid     = Sys.getpid(),
      maxSize = getOption("future.globals.maxSize", NA_real_))
  })
  inner <- value(f)

  rbind(outer, inner)
})
v <- value(f)
print(v)

gives

  label    pid maxSize
1 outer 778647 1234000
2 inner 778763 1234000

which shows that the R option is indeed carried down (as it should be, by design):

stopifnot(all(v$maxSize == getOption("future.globals.maxSize")))

future.apply [FAIL]

library(future.apply)

v <- future_lapply(1:2, FUN = function(x) {
  outer <- data.frame(
    label   = "outer",
    idx     = x,
    pid     = Sys.getpid(),
    maxSize = getOption("future.globals.maxSize", NA_real_)
  )

  inner <- future_lapply(3:4, FUN = function(x) {
    data.frame(
      label   = "inner",
      idx     = x,
      pid     = Sys.getpid(),
      maxSize = getOption("future.globals.maxSize", NA_real_))
  })
  inner <- do.call(rbind, inner)
  rbind(outer, inner)
})
v <- do.call(rbind, v)
print(v)

gives:

  label idx    pid maxSize
1 outer   1 778647      NA
2 inner   3 778763      NA
3 inner   4 778764      NA
4 outer   2 778646      NA
5 inner   3 779485      NA
6 inner   4 779484      NA
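
The same kind of check used for bare-bone futures can be applied to this result. A minimal sketch, where `option_propagated()` is a hypothetical helper and the data frame reproduces only the `label` and `maxSize` columns printed above:

```r
# Hypothetical helper: TRUE only if every row reports the expected value.
# NA (option unset on the worker) counts as a failure.
option_propagated <- function(v, expected) {
  !anyNA(v$maxSize) && all(v$maxSize == expected)
}

# maxSize column as printed above for future.apply: unset on every worker
v_fail <- data.frame(
  label   = c("outer", "inner", "inner", "outer", "inner", "inner"),
  maxSize = NA_real_
)

propagated <- option_propagated(v_fail, expected = 1234000)
print(propagated)
```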

furrr [OK]

library(furrr)

v <- future_map(1:2, function(x) {
  outer <- data.frame(
    label   = "outer",
    idx     = x,
    pid     = Sys.getpid(),
    maxSize = getOption("future.globals.maxSize", NA_real_)
  )

  inner <- future_map(3:4, function(x) {
    data.frame(
      label   = "inner",
      idx     = x,
      pid     = Sys.getpid(),
      maxSize = getOption("future.globals.maxSize", NA_real_))
  })
  inner <- do.call(rbind, inner)
  rbind(outer, inner)
})
v <- do.call(rbind, v)
print(v)

gives

  label idx    pid maxSize
1 outer   1 778647 1234000
2 inner   3 778763 1234000
3 inner   4 778764 1234000
4 outer   2 778646 1234000
5 inner   3 779485 1234000
6 inner   4 779484 1234000

doFuture [OK]

library(doFuture)

v <- foreach(x = 1:2) %dofuture% {
  outer <- data.frame(
    label   = "outer",
    idx     = x,
    pid     = Sys.getpid(),
    maxSize = getOption("future.globals.maxSize", NA_real_)
  )

  inner <- foreach(x = 3:4) %dofuture% {
    data.frame(
      label   = "inner",
      idx     = x,
      pid     = Sys.getpid(),
      maxSize = getOption("future.globals.maxSize", NA_real_))
  }
  inner <- do.call(rbind, inner)
  rbind(outer, inner)
}
v <- do.call(rbind, v)
print(v)

gives

  label idx    pid maxSize
1 outer   1 778647 1234000
2 inner   3 778763 1234000
3 inner   4 778764 1234000
4 outer   2 778646 1234000
5 inner   3 779485 1234000
6 inner   4 779484 1234000

HenrikBengtsson commented 11 months ago

This has been fixed in the next version of future.apply. It was due to a typo; changing length(chunk) to length(chunks) in a few places fixed it. Doh!

HenrikBengtsson commented 10 months ago

FYI, future.apply 1.11.1, fixing this, is now on CRAN.
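
For scripts that must run against both older and newer installations, the workaround could be applied only when needed. A minimal sketch; `needs_maxsize_workaround()` is a hypothetical helper, and the 1.11.1 threshold is taken from the comment above.

```r
# Hypothetical helper: TRUE if the given future.apply version predates the fix.
needs_maxsize_workaround <- function(version) {
  package_version(version) < package_version("1.11.1")
}

# Versions are passed explicitly so the sketch runs without future.apply installed
old_needs <- needs_maxsize_workaround("1.11.0")
new_needs <- needs_maxsize_workaround("1.11.1")
print(c(old = old_needs, new = new_needs))

# With the package installed, one would pass the installed version instead:
# needs_maxsize_workaround(as.character(utils::packageVersion("future.apply")))
```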

maxim-h commented 10 months ago

Thank you! Checked the new CRAN version. All works as expected.