HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

automatic export fails when variable is only used in a glue expression #50

Closed jakejh closed 3 years ago

jakejh commented 3 years ago

The issue affects doFuture, but not doParallel. Here is a reprex:

library(doFuture)
#> Loading required package: foreach
#> Loading required package: future
library(doParallel)
#> Loading required package: iterators
#> Loading required package: parallel
library(glue)

w = 'thanks'
x = c('ben', 'jerry')

registerDoParallel()
y = foreach(i = x) %dopar% {glue('{w}, {i}')}

registerDoFuture()
plan(multisession)
y = foreach(i = x) %dopar% {glue('{w}, {i}')}
#> Error in {: task 1 failed - "object 'w' not found"

y = foreach(i = x, .export = 'w') %dopar% {glue('{w}, {i}')}

Created on 2020-10-02 by the reprex package (v0.3.0)

HenrikBengtsson commented 3 years ago

Hi. First, that is not a valid comparison. From your claim, I conclude you're on Linux or macOS, where using:

registerDoParallel()

will use forked parallel workers. Forked parallel processing is a special case (and not always stable). If you do the same with futures:

registerDoFuture()
plan(multicore)

then

y <- foreach(i = x) %dopar% {glue('{w}, {i}')}

will work.

For users on MS Windows,

registerDoParallel()

will use PSOCK cluster workers, which corresponds to using:

registerDoFuture()
plan(multisession)

You can get the same behavior on Linux and macOS by specifying:

cl <- parallel::makeCluster(parallel::detectCores())
registerDoParallel(cl)

Using that will give:

> y <- foreach(i = x) %dopar% {glue('{w}, {i}')}
Error in { : task 1 failed - "could not find function "glue""

which is because the glue package is not attached on the worker. Attaching it via .packages fixes that error, but exposes the next one:

> y <- foreach(i = x, .packages = "glue") %dopar% {glue('{w}, {i}')}
Error in { : task 1 failed - "object 'w' not found"

As you can see, the global 'w' is not found either. Importantly, there is no reasonable way to infer that w and i are objects needed in the parallel code; they are only mentioned inside strings, in a form that only the glue package knows how to interpret. This is explained in the future package vignette 'Common Issues with Solutions'.
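
To illustrate why w cannot be inferred, here is a minimal sketch using codetools::findGlobals() as a stand-in for static code inspection. This is an assumption for illustration only; the globals detection used by the future framework may differ in detail. The function names f_string, f_symbol and f_hinted are hypothetical and exist only for this example.

```r
library(codetools)

# 'w' appears only inside a string literal, so a code walker never sees it
f_string <- function(i) glue('{w}, {i}')

# 'w' appears as a symbol in the code, so it is reported as a global
f_symbol <- function(i) paste0(w, ", ", i)

# Mentioning 'w' as a bare symbol before the glue() call makes it visible
f_hinted <- function(i) {
  w
  glue('{w}, {i}')
}

g_string <- findGlobals(f_string)
g_symbol <- findGlobals(f_symbol)
g_hinted <- findGlobals(f_hinted)

print(g_string)
print(g_symbol)
print(g_hinted)
```

Only the last two functions report w among their globals. In the foreach() call this suggests two ways out: name the variable explicitly with .export = 'w', as in the original report, or mention w as a symbol inside the expression so that code inspection has something to find.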

jakejh commented 3 years ago

Ah sorry, I didn't see the vignette. Thanks.