HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

Functions in list not found in future #248

Open burchill opened 6 years ago

burchill commented 6 years ago

Hi, I've been having a problem when I pass in a named list of functions to a complicated set of nested functions in a relatively complicated future topology. At some point down in the future hierarchy, I iterate over the list of functions, applying them to some data.

However, when I do so, I generally encounter errors telling me that the functions in the list (or functions called by functions in the list that have been defined in the global environment) are not found. Below is a simple demonstration of what I'm talking about:

library(future)
f2 <- function(x) "f2 function"
f1 <- function(x) paste0(f2(x), " ", x)
l <- list("first"=f1, "second"=f2)

plan(multisession)
output %<-% { l[["first"]]("a") }
output

I get the error: Error in paste0(f2(x), " ", x) : could not find function "f2". I'd really like to be able to programmatically use functions like this, as I do in other areas of R. If there are no easy fixes to the general problem and this is just a limitation of future, is there anything else I can do to get around it?

For example, I might be able to just specify them as globals (and keep passing them down through the hierarchy?), but the problem with that is that it seems that users can either let future decide the globals automatically or manually decide all globals themselves, which would be infeasible to me. If I can't do anything about the general problem, is there a way of adding globals to the automatically determined ones?

HenrikBengtsson commented 6 years ago

> For example, I might be able to just specify them as globals (and keep passing them down through the hierarchy?), but the problem with that is that it seems that users can either let future decide the globals automatically or manually decide all globals themselves, which would be infeasible to me. If I can't do anything about the general problem, is there a way of adding globals to the automatically determined ones?

For this part of the problem, please see https://github.com/HenrikBengtsson/future/issues/227#issuecomment-416799033

> However, when I do so, I generally encounter errors telling me that the functions in the list (or functions called by functions in the list that have been defined in the global environment) are not found. Below is a simple demonstration [...]

This use case comes up from time to time. People have often found workarounds, and it is usually not a problem when the code and those functions live in a package. But I agree, it would be nice for this to work out of the box. The solution would require scanning the already-identified globals for further globals, recursively. It can be done, but it will be very expensive and should most likely not be enabled by default. I'll add it to the to-do list of features to consider.
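To make the idea of a recursive scan concrete, here is a minimal sketch that uses `codetools::findGlobals()` to walk the functions stored in a list. It is not how future or the globals package implements anything; `find_nested_globals()` is a hypothetical helper, and it assumes the helper functions are defined in the same environment as the functions that call them.

```r
# Sketch only: find_nested_globals() is a hypothetical helper, not part of
# future. It assumes helpers live in the defining environment of their callers.
f2 <- function(x) "f2 function"
f1 <- function(x) paste0(f2(x), " ", x)
l <- list("first" = f1, "second" = f2)

find_nested_globals <- function(fn, seen = character()) {
  env <- environment(fn)
  for (name in codetools::findGlobals(fn)) {
    # Skip names already visited and names not defined next to `fn`
    # (for example base functions such as paste0).
    if (name %in% seen || !exists(name, envir = env, inherits = FALSE)) next
    seen <- c(seen, name)
    value <- get(name, envir = env, inherits = FALSE)
    # Recurse into user-defined closures to pick up their globals as well.
    if (typeof(value) == "closure") seen <- find_nested_globals(value, seen)
  }
  seen
}

# Scan every function in the list and collect the names it depends on.
extra <- unique(unlist(lapply(Filter(is.function, l), find_nested_globals),
                       use.names = FALSE))
```

Here `extra` contains the name `f2`, which is the function the worker could not find. The cost mentioned above shows up in this sketch too: every function reachable from a global has to be walked, whether or not the future expression ever calls it.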

Related: There is already an optional framework for scanning for futures among globals; see options future.globals.resolve and future.resolve.recursive in ?future::future.options. This is not enabled by default because it adds a big overhead even when not needed.

HenrikBengtsson commented 6 years ago

UPDATE: In the next release (future 1.10.0), you will be able to manually specify an additional set of globals, e.g.

library(future)
plan(multisession, workers = 2)

f2 <- function(x) "f2 function"
f1 <- function(x) paste0(f2(x), " ", x)
l <- list("first"=f1, "second"=f2)

output %<-% { l[["first"]]("a") } %globals% structure(TRUE, add = "f2")
output
## [1] "f2 function a"

To test the development version, install it with:

remotes::install_github("HenrikBengtsson/future@develop")

UPDATE 2018-10-26: future 1.10.0 is now on CRAN.
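For code that creates futures with `future()` rather than `%<-%`, the same `add` attribute can, as far as I know, be passed through the `globals` argument. The following is a sketch under that assumption, requiring future 1.10.0 or later; the variable names `fut` and `result` are only illustrative.

```r
library(future)
plan(multisession, workers = 2)

f2 <- function(x) "f2 function"
f1 <- function(x) paste0(f2(x), " ", x)
l <- list("first" = f1, "second" = f2)

# Keep automatic globals detection (TRUE) and additionally export f2,
# which the automatic scan does not discover inside the list `l`.
fut <- future(l[["first"]]("a"), globals = structure(TRUE, add = "f2"))
result <- value(fut)

# Return to sequential processing and shut down the background workers.
plan(sequential)
```

With this, `result` should hold the same string as the `%<-%` example above.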