HenrikBengtsson / future.apply

:rocket: R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org
210 stars 16 forks source link

Fetching globals assigned as function arguments fails when run in parallel #36

Open arunsrinivasan opened 5 years ago

arunsrinivasan commented 5 years ago

The example shown below should be quite straightforward to follow. The parallel case doesn't seem to be able to find the global var x_a.

packageVersion("future")        # [1] ‘1.11.1.1’
packageVersion("future.apply")  # [1] ‘1.1.0’

require(future)
require(future.apply)

foo <- function(x=getOption("x_a", 2L), y=getOption("x_b", 5L)) x * y

plan(sequential)
future_lapply(1:2, function(i) foo())
# [[1]]
# [1] 10

# [[2]]
# [1] 10

plan(multisession, workers=2L)
future_lapply(1:2, function(i) foo())
# [[1]]
# [1] 10

# [[2]]
# [1] 10

options(x_a=5L)
plan(sequential)
future_lapply(1:2, function(i) foo())
# [[1]]
# [1] 25

# [[2]]
# [1] 25

plan(multisession, workers=2L)
future_lapply(1:2, function(i) foo())
# [[1]]
# [1] 10

# [[2]]
# [1] 10

I came across this behaviour while trying to understand the issue in my original scenario, which is slightly different. I've a package say, PKG, which has a function say, FOO as follows:

FOO <- function(x=MY_GLOBLALS$x, y=MY_GLOBALS$y) x * y

Then, the sequential case works fine but multisession / parallel version fails to find 'x' and 'y' defaults when run with the following code:

require(future)
require(future.apply)
require(PKG) # has function FOO
MY_GLOBALS <- list(x=2L, y=5L)
plan(multisession, workers=2L)
future_lapply(1:2, function(i) FOO()) # has to extract the default values from the variable MY_GLOBALS in the global environment.
# Error in FOO() : object 'MY_GLOBALS' not found

session info:

> sessionInfo()
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows Server >= 2012 x64 (build 9200)

# locale:
# [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
# [4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] future.apply_1.1.0 future_1.11.1.1   

# loaded via a namespace (and not attached):
# [1] parallel_3.3.2   tools_3.3.2      listenv_0.7.0    codetools_0.2-15 digest_0.6.14    globals_0.12.4  
arunsrinivasan commented 5 years ago

Passing future.globals="MY_GLOBALS" works. But I've a feeling that this is a case where it should work out-of-the-box..

HenrikBengtsson commented 5 years ago

There a few things going on here. First, using R options within futures is not (yet) support - not even manually. Example:

> library(future)
> plan(multisession, workers = 2)
> options(my_option = "yo")
> a <- getOption("my_option")
> a
[1] "yo"
> b %<-% getOption("my_option")
> b
NULL

It's on the wishlist (https://github.com/HenrikBengtsson/future/issues/172). The main problem of supporting that automatically is that some R options are read-only and other are meant only for the local R session, i.e. it's not obvious what options should be exported to the future and not.

HenrikBengtsson commented 5 years ago

Second, the design pattern where a package PKG provides:

FOO <- function(x=MY_GLOBLALS$x, y=MY_GLOBALS$y) x * y

where

MY_GLOBALS <- list(x=2L, y=5L)

should live in the users global environment is a bit unusual. The exteme example of this would be a package providing:

foo <- function(x = 1) {
  a * x
}

where it expects the user to have a in the global environment. I don't think I've attempted to cover it (if I would then this is should have been asserted by a package test). BTW, did this code example mean to reflect R options and .Options? If so, it makes a bit more sense to me.

HenrikBengtsson commented 5 years ago

Forgot to say about R options: See also https://github.com/HenrikBengtsson/future.apply/issues/16 ('options("test"=5) are not global'). It provides a link to https://github.com/HenrikBengtsson/future/issues/134#issue-214818468 ('R options: options() are not passed down to futures - should they? (document either way)') which gives a few examples on how to manually "export" R options.

HenrikBengtsson commented 5 years ago

@arunsrinivasan, have you had a chance to digest/think more about my follow-up comments? I'd like to find which of your use cases are already on the future roadmap and which are new.

I'd also like to see if there's anything specific to future.apply here - I don't think so - my understanding is that all your needs will have to be taken care of by the core future framework.

arunsrinivasan commented 5 years ago

Apologies for the late reply. I agree the design of function argument with default like MY_GLOBALS$x is very unusual. And I'm working on fixing such code that I inherited. It'd still be nice that things work in those weird cases, as it should though. No I didn't mean it to reflect .options. This is really a code that I inherited and took me a while to figure out why code using future wasn't working. Then I tried a quick fix in the most reasonable way I could think of - using getOption and found it didn't work as well.

I'd be happy if globals could work as intended.

I agree this has more to do with future than future.apply. Feel free to migrate the issue there, if necessary.