HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org
946 stars 82 forks source link

do.call and future::plan #639

Closed gitdemont closed 1 year ago

gitdemont commented 1 year ago

It seems calling future::plan inside do.call does not allow to pass extra parameters to strategy through ... Here is a minimal reprex considering that you have at least more than 2 cores

## call with workers inside multisession, OK
future::plan(strategy = future::multisession(workers = 2))
future::nbrOfWorkers() # 2

## call with workers as extra ... argument of plan, OK
future::plan(strategy = future::multisession, workers = 2)
future::nbrOfWorkers() # 2

## call within do.call, not working
do.call(what = future::plan, args = list(strategy = future::multisession, workers = 2))
future::nbrOfWorkers() # future::availableCores() instead of 2

## call within do.call, produces error (which is OK)
do.call(what = future::plan, args = list(strategy = future::multisession(workers = 2)))

sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.1.3    parallelly_1.32.0 parallel_4.1.3    tools_4.1.3       listenv_0.8.0     codetools_0.2-18  digest_0.6.29     globals_0.15.1   
[9] future_1.27.0 
HenrikBengtsson commented 1 year ago

Thanks for reporting on this. It's been fixed for the next release. Until then, one can use either of:

do.call(what = future::plan, args = list(strategy = "future::multisession", workers = 2))
do.call(what = future::plan, args = list(strategy = quote(future::multisession), workers = 2))
gitdemont commented 1 year ago

Thanks for the tips. I just add it and I can confirm that it solved the issue with worker. However, I think it's worth reopening here because it generates issues with other extra parameters

do.call(what = future::plan, args = list(strategy = quote(future::multisession), workers = 2, lazy = TRUE))
# Error: Detected arguments that must not be set via plan() or tweak(): ‘lazy’

PS: if it can help, in my use case I got error with: ‘envir’, ‘packages’, ‘seed’, ‘lazy’, ‘globals’ lazy and envir are explicitely parameters from future::multisession whereas the others should be passed thanks to ...

This happens with former version of {future} using the quote trick and also the current one just installed with remotes::install_github("HenrikBengtsson/future")

HenrikBengtsson commented 1 year ago

PS: if it can help, in my use case I got error with: ‘envir’, ‘packages’, ‘seed’, ‘lazy’, ‘globals’ lazy and envir are explicitely parameters from future::multisession whereas the others should be passed thanks to ...

That's intentional by design, e.g.

> plan(multisession, workers = 2, lazy = TRUE)
Error: Detected arguments that must not be set via plan() or tweak(): 'lazy'

None of those arguments must be used by plan(). plan() is controlled by the end-user, and if the end user sets any of those, they will change the behavior of the code that the developer had in mind. So, those arguments may only be passed to future() and alike.

gitdemont commented 1 year ago

I should say that you are right and what I am trying to achieve is close to what is described in https://github.com/HenrikBengtsson/progressr/issues/78

However, the documentation of future::plan says that ... (if I understand well) could be used to pass argument to the evaluation function i.e. strategy.

Therefore, as lazy and envir are arguments of future::multisession (alike workers), the reasoning according to the documentation is that they should not induce error when called such a way.

# the folowing works:
future::multisession(workers = 2, lazy = TRUE, envir = new.env())
# but despite the doc, the folowing produces an error
do.call(what = future::plan, args = list(strategy = future::multisession, workers = 2, lazy = TRUE, envir = new.env()))

By extension, the documentation of future::multisession says that ... should allow to pass arguments to future::Future, so extra args passed from future::plan should go to future::multissesion and then future::Future (unless prior matched)

As a consequence, while doable with workers, both functions (future::plan and future::mutisession) are incompatible when called within do.call and with other arguments