HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org
956 stars 83 forks source link

R options: options() are not passed down to futures - should they? (document either way) #134

Open HenrikBengtsson opened 7 years ago

HenrikBengtsson commented 7 years ago

I'm adding this as a reminder to myself to document this (next release), but also to consider more automated solutions (a future release):

R's global options() are not passed down to futures. This has to be done manually right now, e.g.

fopts <- options()[c("digits", "repos")]
f <- future({
  options(fopts)
  [...]
})

This works, but it would be nice to have an automated version of this as well (cf. globals and packages) to better capture the state of the calling R process. Doing this might be a bit tricky since some options are read only / shouldn't be changed and others are specific the local machine, which may not work on a remote machine. Maybe it's possible to use static-code inspection to identify what options are used (similar to how globals are identified).

HenrikBengtsson commented 6 years ago

Some ideas:

Ignoring any "automatic" passing down of options, this could be done by adding an argument options analogue to globals, e.g.

f <- future(..., options = c("digits", "repos"))

That construct provides the developer control over options. For the end user to control options, there would need to be a similar argument that can be specified when setting the plan(), e.g.

plan(batchtools_sge, workers = 100L, options = c("digits", "repos"))

Options specified via plan() will always be set/passed down and additional ones specified via future() would be done in addition to what's in plan().

rubenarslan commented 6 years ago

So, future.global.maxSize is also not passed down. That leaves me in a pickle. I send jobs to a Torque cluster from my local. The jobs call brms, which has a future = TRUE argument, so that each of its chains is run according to plan. But these chains are bigger than 500Mb so this fails and I cannot change it, because I cannot alter options in the futures that brms creates. I think all future options should always be passed down.

HenrikBengtsson commented 6 years ago

Good point, I agree that future.globals.maxSize should be passed down - this is an oversight. As a workaround for now, you can override the default in the ~/.Rprofile on the machine in the chain that complains, e.g.

options(future.globals.maxSize = 1.0 * 1024^3)
HenrikBengtsson commented 6 years ago

Moving the discussion of future.* options to Issue #223 because it's a much more narrow problem than considering any/all of R's options.

HenrikBengtsson commented 6 years ago

FYI/@rubenarslan, in the next release (develop branch right now), we're also passing down:

HenrikBengtsson commented 6 years ago

FYI, future 1.9.0, which passes down several more future.globals.* options, is now on CRAN.

Rloveeee commented 6 years ago

I don't see any difference, did I do something wrong ?

sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale: [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] future.apply_1.0.0 future_1.9.0

loaded via a namespace (and not attached): [1] compiler_3.5.0 parallel_3.5.0 tools_3.5.0 listenv_0.7.0 codetools_0.2-15 digest_0.6.15 globals_0.12.1

options("test"=10)
FUN=function(x) return(getOption("test"))
xx=future.apply::future_lapply(X=list(0),FUN=FUN)
xx

[[1]] NULL

HenrikBengtsson commented 6 years ago

Nah, please read the above comments again. future 1.9.0 only pass down some specific future.globals.* options. The task on passing done options automagically is still an open question.

Rloveeee commented 6 years ago

Ok, I have read the comment but I just understand it now. And I couldn't understand the manual fix: fopts <- options()[c("digits", "repos")] f <- future({ options(fopts) [...] }). I tried f <- future({ options(fopts) },FUN=FUN,X=list(0)) and several others stuff, but I can't make it.

HenrikBengtsson commented 6 years ago

Here's an example:

library(future)
plan(multisession) # Use background R sessions

## My personal option
options(test = 5)

## Use only 3 digits
options(digits = 3L)

## Options that I want to pass down to the futures/workers
fopts <- options()[c("digits", "test")]

str(fopts)
# List of 2
#  $ digits: int 3
#  $ test  : num 5

# Create a future using those "frozen" 'fopts' options
> f <- future({
+   ## Set 'fopts' options (here fopts is a global variable)
+   options(fopts)
+   ## Show that those options are set
+   cat(sprintf("test = %s\n", getOption("test")))
+   print(pi)
+   2 * getOption("test")
+ })

> v <- value(f)  ## requires future 1.9.0 to see output
test = 5
[1] 3.14

> v
[1] 10

I tried f <- future({ options(fopts) },FUN=FUN,X=list(0)) and several others stuff, but I can't make it.

Are you asking how to do the same with future.apply::future_lapply()?

Rloveeee commented 6 years ago

Are you asking how to do the same with future.apply::future_lapply()? YES ! I understand your example but I don't see how I could transfer it to future_lapply.

Thanks

HenrikBengtsson commented 6 years ago

The same way, via global fopts, e.g.

> library(future.apply)
> plan(multiprocess)

> options(test = 5)
> options(digits = 3L)
> fopts <- options()[c("digits", "test")]
> y <- future_lapply(1:3, FUN = function(x) {
+   ## Set 'fopts' options (here fopts is a global variable)
+   options(fopts)
+   ## Show that those options are set
+   cat(sprintf("test = %s\n", getOption("test")))
+   print(pi)
+   getOption("test") * x
+ })
test = 5
[1] 3.14
test = 5
[1] 3.14
test = 5
[1] 3.14

> unlist(y)
[1]  5 10 15
> 
Rloveeee commented 6 years ago

Thanks a lot. I got another one:

library("future.apply") plan(multiprocess, workers = 12) test=function(...){ UseMethod("test") } test.numeric=function(x){ return(x) } test.character=function(x){ return(x) } FUN=function(x) { test(x) } xx=future.apply::future_lapply(FUN=FUN,X=c(2,3))

I also have another one but I can't (yet) reproduce it on small code. (It can't find roll_sd from RcppRoll).

HenrikBengtsson commented 6 years ago

Please create a separate issue if you're changing topics

adamaltmejd commented 4 years ago

Hello! Just stumbled upon this when my code started behaving strangely and realized after a few hours of debugging that the "memory_saving_mode" option had defaulted to TRUE because future wasn't passing options on. Maybe could be suitable with a warning about this?

I'm using future version 1.11.1.1 through Drake, calling future:plan(future::multiprocess). What is the recommended way to pass options to workers?

Rloveeee commented 4 years ago

Hello Adam,

I have used a workaround. I am reloading each worker independently. More CPU/bandwith, less work :)

Best

Le mer. 30 oct. 2019 à 12:23, Adam Altmejd notifications@github.com a écrit :

Hello! Just stumbled upon this when my code started behaving strangely and realized after a few hours of debugging that all the "memory_saving_mode" options had defaulted to TRUE because future wasn't passing options on. Maybe could be suitable with a warning about this?

I'm using future version 1.11.1.1 through Drake https://github.com/ropensci/drake, calling future:plan(future::multiprocess). What is the recommended way to pass options to workers?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/HenrikBengtsson/future/issues/134?email_source=notifications&email_token=AEGAXTQ4PX2LR6AVQR6TKATQRFVDXA5CNFSM4DD7FPPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECTZU4Q#issuecomment-547854962, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEGAXTUMAV3NL23YANAD45TQRFVDXANCNFSM4DD7FPPA .

HenrikBengtsson commented 4 years ago

... the "memory_saving_mode" option had defaulted to TRUE because future wasn't passing options on. Maybe could be suitable with a warning about this?

So, I'm objecting that a warning could be useful but it's unfortunately not feasible. How would future know what options to warn about and not? Feel free to clarify how you want it to work.

I'm using future version 1.11.1.1 ...

I highly recommend that you update - you are way behind and you're missing out on bug fixes and features. Is there a reason why you're not updating?

... What is the recommended way to pass options to workers?

You can do it via a global variable, e.g.

my_opts <- options(c("memory_saving_mode", "digits"))
f <- future({
  options(my_opts)
  ...
})
v <- value(f)
adamaltmejd commented 4 years ago

Tack :).

You are probably right. But also package options could create problems if they are changed from their defaults. Same for base R options that are changed. One possibility is to issue a warning when the package is loaded, but that might be too extreme. I guess the best would be to pass options on to works by default and then allowing the user to shut that down in the function call.

As I was saying I'm calling future via drake so am not explicitly creating any "future" objects. Only setting plan(multiprocess) and then run drake::make(parallelism = "future") so not sure how to add global variables like you suggested - but I guess thats a question for Drake.

tdeenes commented 4 years ago

Maybe the problem of environment variables could be added to this thread.

tdeenes commented 4 years ago

Identifying automagically which options and environment variables should be passed down to workers is hard, indeed.

I would like to add a further idea as well - not a silver bullet, but sometimes useful:

  1. In .onLoad(), try to figure out what the default options/environment variables in the host and in the worker are, and store them in .options_at_startup. Flag those options which a) do not exist in the worker, but exist in the host, and b) those which do differ between the worker and the host.

  2. When a future is created, check the actual options and environment variables, compare with .options_at_startup, and inject the relevant options and env variables to the future.

This is of course a rude approximation, can result in both false positives and false negatives.

bakaburg1 commented 3 months ago

Probably silly question for @HenrikBengtsson: why not just pass all the options down at worker creation? it should not be much of overhead isn't it?

scottkosty commented 3 months ago

side note (not related to previous comment's question): I wonder if a useful comparison is to Bash options and sub-shells. For example, in bash if you run set -x (or set -e), that option is not respected in the subshells unless the user explicitly does something like export SHELLOPTS, which somehow causes the subshells to all have the same options.

Reference.

HenrikBengtsson commented 3 months ago

Probably silly question for @HenrikBengtsson: ...

You broke Rule No 1; there are no silly questions - just silly answers.

why not just pass all the options down at worker creation? it should not be much of overhead isn't it?

For a few different reasons, but one issue is "discussed" in https://github.com/HenrikBengtsson/parallelly/issues/70.

Until then, you can use argument rscript_startup to launch PSOCK workers using any R code you'd like, e.g.

> cl <- parallelly::makeClusterPSOCK(2, rscript_startup = quote(options(hello = "world")))
> str(parallel::clusterEvalQ(cl, getOption("hello")))
List of 2
 $ : chr "world"
 $ : chr "world"

Or similarly,

> library(future)
> plan(multisession, workers = 2, rscript_startup = quote(options(hello = "world")))
> f <- future(getOption("hello"))
> value(f)
[1] "world"

A more important design reason is: If your code depends on getOption("hello") being available on parallel workers, then it would fail on backends that don't support setting options on the workers. It's not clear to me how to handle this, so I take a very conservative approach in Futureverse and say "try to write your code such that you assume fresh, vanilla R workers".

A safer design is probably better to pass down R options via the future(...) - that is the background behind this issue. One reason for not yet implementing that is that I want to introduce "resource specifications", where passing down options would be more natural (I think). OTH, that is a complex problem, so maybe I should just add support for an future(..., options = ...) argument until that is in place.