HenrikBengtsson / parallelly

R package: parallelly - Enhancing the 'parallel' Package
https://parallelly.futureverse.org

`rscript_startup` doesn't appear to do anything #68

Closed yogat3ch closed 2 years ago

yogat3ch commented 2 years ago

Hi @HenrikBengtsson, would you be willing to take a look at the `rscript_startup` argument when passed via `makeClusterPSOCK()`? It doesn't appear to do anything.

A reprex is here:

cl <- parallelly::makeClusterPSOCK(
  1,
  port = 5302,
  autoStop = TRUE,
  rscript_startup = rlang::expr({
    test = 2
  })
)
future::plan(future::cluster, workers = cl)
promises::future_promise(message(ls(all.names = TRUE)))

sessionInfo

```
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods
[7] base

loaded via a namespace (and not attached):
 [1] compiler_4.1.0    parallelly_1.28.1 fastmap_1.1.0
 [4] magrittr_2.0.1    R6_2.5.1          later_1.2.0
 [7] promises_1.2.0.1  parallel_4.1.0    tools_4.1.0
[10] listenv_0.8.0     Rcpp_1.0.7        codetools_0.2-18
[13] digest_0.6.27     globals_0.14.0    rlang_0.4.11
[16] renv_0.14.0       future_1.22.1
```

HenrikBengtsson commented 2 years ago

It should definitely work, e.g.

> cl <- parallelly::makeClusterPSOCK(1L, rscript_startup = "x <- 42")
TRACKER: loadedNamespaces() changed: 1 package loaded ('rstudioapi')
> parallel::clusterEvalQ(cl, { 2*x })
[[1]]
[1] 84

> cl <- parallelly::makeClusterPSOCK(1L, rscript_startup = quote(x <- 3.14))
> parallel::clusterEvalQ(cl, { 2*x })
[[1]]
[1] 6.28

Now, when using futures, it's a different story. Futures, including cluster futures, wipe the global environment of the workers. This already happens when plan() is set up, e.g.

> cl <- parallelly::makeClusterPSOCK(1L, rscript_startup = quote(x <- 3.14))
> parallel::clusterEvalQ(cl, { 2*x })
[[1]]
[1] 6.28

> future::plan("cluster", workers = cl)  ## wipes the global environment
> parallel::clusterEvalQ(cl, { 2*x })
Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  one node produced an error: object 'x' not found

Also, each use of future() will wipe the global environment of the worker. This is intentional and by design of the Future API. It's a core, essential feature. A future must not make assumptions about the worker it happens to end up on. If it did, we would break the requirement that the end-user should be able to switch plan() as they'd like.
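
As an illustration of that design, here is a minimal sketch (not taken from the thread) of the usual alternative to worker-side state: define the object in the calling session and let the future carry it along as a global. It assumes the future and parallelly packages are installed; the variable names `v_seq` and `v_cluster` are only illustrative.

```r
library(future)

# Define the object in the calling session, not on the worker.
x <- 3.14

# Resolve the future in the current R session.
plan(sequential)
v_seq <- value(future({ 2 * x }))

# Resolve the same expression on a PSOCK worker; 'x' is identified as a
# global of the future expression and exported to the worker with it.
cl <- parallelly::makeClusterPSOCK(1L)
plan(cluster, workers = cl)
v_cluster <- value(future({ 2 * x }))

# Both backends give the same result, so the code does not depend on
# whatever happened to be defined on the worker at startup.
print(v_seq)
print(v_cluster)

# Clean up: reset the plan and shut down the worker.
plan(sequential)
parallel::stopCluster(cl)
```

Because the value travels with the future, the same code runs unchanged under any plan(), which is the property the wiping of the worker's global environment is meant to protect.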

Having said this, it's on the road map to allow for "sticky" globals and/or to cache globals on workers. See https://github.com/HenrikBengtsson/future/issues?q=is%3Aopen+is%3Aissue+label%3Afeature%2Fsticky-globals. Many of those requests are related to what I think are your expectations here.

yogat3ch commented 2 years ago

Ahh, that is helpful to know! Thank you for clarifying what was going on there.

yogat3ch commented 2 years ago

Yes, sticky globals are indeed what I erroneously inferred `rscript_startup` was doing (or should be doing)!