Closed koenniem closed 3 years ago
Thanks for reporting. Can you please provide a minimal reproducible example for this that I can work with?
Sure thing.
Consider we have two functions:
foreach_fun
which implements foreach
and does 10 loops over a list of 10 letters.collapse
takes a list of letters from foreach_fun
and uses paste
to collapse it. In reality this function can do anything, the error persists.foreach_fun <- function() {
foreach::foreach(i = 1:10, .noexport = "data") %dopar% {
data <- letters[1:10]
collapse(data)
}
}
collapse <- function(data) {
data <- paste0(data, collapse = ",")
data
}
registerDoFuture()
plan(future::multisession)
foreach_fun()
plan(sequential)
And this works fine. Now, suppose we have a variable in our global environment called data
for an entirely unrelated reason.
# Make a very large data frame, 763 MB in size
# Not used for anything
data <- as.data.frame(matrix(nrow = 1e6, ncol = 200))
If you run foreach_fun
again, you get the following error
Error in getGlobalsAndPackages(expr, envir = globals_envir, globals = TRUE) : The total size of the 3 globals that need to be exported for the future expression (‘{; doFuture::registerDoFuture(); lapply(seq_along(...future.x_ii), FUN = function(jj) {; ...future.x_jj <- ...future.x_ii[[jj]]; i <- NULL; ...future.env <- environment(); ...; }, error = identity); }); }’) is 762.95 MiB. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). There are three globals: ‘data’ (762.95 MiB of class ‘list’), ‘collapse’ (5.81 KiB of class ‘function’) and ‘...future.x_ii’ (0 bytes of class ‘NULL’).
For reasons that are beyond me, doFuture
tries to export data
as well because it sees that it is used by collapse
, even though this is not the same variable of course. In the process, the .noexport
argument is ignored even though it should prevent data
from being exported. Moreover, export collapse
explicitly using .export
does not prevent this either.
Two adapters, doParallel
and doSNOW
don't have this problem, probably because they look for variables to export automatically like doFuture
does.
cl <- parallel::makeCluster(parallel::detectCores())
doParallel::registerDoParallel(cl)
# doSNOW::registerDoSNOW(cl) # also works
foreach_fun <- function() {
foreach::foreach(i = 1:10, .export = "collapse") %dopar% {
data <- letters[1:10]
collapse(data)
}
}
foreach_fun()
doParallel::stopImplicitCluster()
registerDoSEQ()
parallel::stopCluster(cl)
My intuition is that .noexport
is ignored by doFuture
. For example, in the function getGlobalsAndPackages_doFuture
in both globals.R
and doFuture.R
, the .noexport
function is taken as an argument but then nothing is done with it.
Thanks again for reporting on this. Indeed, the doFuture adapter completely ignored the .noexport
argument. This has been fixed in the develop branch, cf. https://github.com/HenrikBengtsson/doFuture/commit/974e61fb5bcfefe933c9a7262c5160cb337b7f25. Until next release is on CRAN, you can grab the develop branch version using:
remotes::install_github("HenrikBengtsson/doFuture", ref="develop")
Now, it turns out there's another problem too unrelated to the doFuture package. The data
object is incorrectly identified as a global variable, which it shouldn't. This needs to be fixed in the globals package - tracking it in https://github.com/HenrikBengtsson/globals/issues/69. When fixed there, you should no longer have to turn to .noexport
as a workaround. The bug there is that despite argument data
, the globals package still thinks data
in the function body is a global variable;
collapse <- function(data) {
data <- paste0(data, collapse = ",")
data
}
If you switch to, say,
collapse <- function(data) {
data2 <- paste0(data, collapse = ",")
data2
}
then data
is no longer picked up as a global variable and you don't need that .noexport = "data"
. There have been a few other globals reports related to this problem in the past.
FYI, globals 0.14.0 on CRAN. Fixes the problem where data
is incorrectly picked up as a global and exported. So, with the new globals version there's no need for .noexport
.
doFuture 0.11.0, which acknowledges argument .noexport
, is now on CRAN.
As far as I can see, the
.noexport
argument offoreach::foreach
is ignored bydoFuture
.This issue was brought to my attention when it automatically tried to export a variable called
data
in my global environment to a function which uses a variable calleddata
(but is not the same). While you could technically let it export it without serious consequences, this variable is 700Mb and is thus inefficient. Using the.noexport = "data"
did not have any effect when using thedoFuture
adapter, but does work withdoParallel
anddoSNOW
.I have no clue how to fix it, but in the code of
doFuture
andglobals
I can see.noexport
is taken as an argument but never used elsewhere.