Closed renkun-ken closed 1 year ago
However, using future.callr::callr
as the backend is not perfect. Although the following case shows that the .N
is successfully masked in the future,
library(data.table)
library(future)
source("./future-source1.R")
plan(future.callr::callr, workers = 1, gc = TRUE, earlySignal = TRUE)
res <- future.apply::future_lapply(1:2, function(date) {
.N
})
res
[[1]]
[1] 100
[[2]]
[1] 100
The following example still fails:
library(data.table)
library(future)
source("./future-source1.R")
plan(future.callr::callr, workers = 1, gc = TRUE, earlySignal = TRUE)
res <- future.apply::future_lapply(1:5, function(date) {
address(date)
.N
})
res
[[1]]
NULL
[[2]]
NULL
It seems as long as the future contains a function call from the package, in this case, data.table::address()
, the .N
will not be masked. This occurs no matter how many workers are specified.
The long story has to wait until after the holidays, but the short story:
source()
is peculiar in the sense that it's default source(..., local = FALSE)
because it evaluates in the global environments. Evaluating and assigning to the global environment is never a good idea, unless you call source()
from the prompt, when you're basically already in the global environment. So, yes, I'd say, source(..., local = TRUE)
is always better (regardless of parallel processing or not). I would not use local = FALSE
. I'll add this to the to-do to document.
Also, plan(multisession, workers = 1)
falls back to plan(sequential)
, which explains that difference. If you want a single background worker, plan(cluster, workers = 1)
provides that.
For the other things, there might be some funny things going on with identification of globals; I'll look into that later.
I've finally got around to look at this more. The odd behavior is a combination of using the default source(..., local = FALSE)
and the fact that the global environment affects parallel backends and sequential processing differently. The latter is on the roadmap to harmonize (it has to do with the long outstanding feature request for caching globals, sticky globals, and other things; each update moves one step closer to fix that).
So the following is as expected when using sequential processing:
library(data.table)
library(future)
plan(sequential)
cat(file = "./future-source1.R", ".N <- 100")
## Case 1: The future picks up data.table:::.N = NULL as a global
## find(".N") => "package:data.table"
message("Case 1")
rm(list = c("fread", ".N"))
f <- future({
str(.N) # global .N
.N
})
stopifnot(".N" %in% names(f$globals))
v <- value(f)
# NULL
stopifnot(is.null(v))
## Case 2: Similar to Case 1, but we now use source() with the
## default local = FALSE
message("Case 2")
rm(list = c("fread", ".N"))
f <- future({
str(.N) # global .N
source("./future-source1.R")
str(.N) # still global .N, because source(... local = FALSE)
.N
})
stopifnot(".N" %in% names(f$globals))
v <- value(f)
# NULL
# NULL
stopifnot(is.null(v))
## Case 3: Similar to Case 2, but we now use source() with local = TRUE
message("Case 3")
rm(list = c("fread", ".N"))
f <- future({
str(.N) # global .N
source("./future-source1.R", local = TRUE)
str(.N) # local .N, because source(... local = TRUE)
.N
})
stopifnot(".N" %in% names(f$globals))
v <- value(f)
# NULL
# num 100
stopifnot(v == 100)
However, what is odd is that if we take Case 2 and add a dummy fread
(e.g. at the top of the future expression), it causes .N
to be dropped from the future globals, e.g.
## Case 4: Similar to Case 1, but with a dummy global 'fread()'
message("Case 4")
rm(list = c("fread", ".N"))
f <- future({
fread
str(.N) # Weird; .N is not part of the globals, but find(".N")
.N
})
## stopifnot(".N" %in% names(f$globals)) ## ODD
v <- value(f)
# NULL
The recommendation of using source(..., local = TRUE)
is now documented in https://future.futureverse.org/articles/future-4-issues.html#using-source-in-a-future
However, what is odd is that if we take Case 2 and add a dummy
fread
(e.g. at the top of the future expression), it causes.N
to be dropped from the future globals, e.g.## Case 4: Similar to Case 1, but with a dummy global 'fread()' message("Case 4") rm(list = c("fread", ".N")) f <- future({ fread str(.N) # Weird; .N is not part of the globals, but find(".N") .N }) ## stopifnot(".N" %in% names(f$globals)) ## ODD v <- value(f) # NULL
Ah, slow morning and a slow brain here: Of course .N
is not a global variable here, because it's an exported object of data.table and therefore not part of the globals;
library(data.table)
library(future)
f <- future::future({
fread
.N
})
message(sprintf("Globals: [n=%d] %s", length(f$globals), paste(sQuote(names(f$globals)), collapse = ", ")))
message(sprintf("Packages: [n=%d] %s", length(f$packages), paste(sQuote(f$packages), collapse = ", ")))
#> Globals: [n=0]
#> Packages: [n=1] 'data.table'
What's peculiar though is that we get:
library(data.table)
library(future)
f <- future::future({
.N
})
message(sprintf("Globals: [n=%d] %s", length(f$globals), paste(sQuote(names(f$globals)), collapse = ", ")))
message(sprintf("Packages: [n=%d] %s", length(f$packages), paste(sQuote(f$packages), collapse = ", ")))
#> Globals: [n=1] '.N'
#> Packages: [n=0]
This should be no different from the above case. There's something funny going on with future::getGlobalsAndPackages()
in the second case:
> str(future::getGlobalsAndPackages(quote({ fread; .N }), globals = TRUE)[c("globals", "packages")])
List of 2
$ globals : Named list()
..- attr(*, "where")= Named list()
..- attr(*, "class")= chr [1:3] "FutureGlobals" "Globals" "list"
..- attr(*, "resolved")= logi TRUE
..- attr(*, "total_size")= num 0
$ packages: chr "data.table"
> str(future::getGlobalsAndPackages(quote({ .N }), globals = TRUE)[c("globals", "packages")])
List of 2
$ globals :List of 1
..$ .N: NULL
..- attr(*, "where")=List of 1
.. ..$ .N:<environment: R_EmptyEnv>
..- attr(*, "class")= chr [1:3] "FutureGlobals" "Globals" "list"
..- attr(*, "resolved")= logi FALSE
..- attr(*, "total_size")= num 0
$ packages: chr(0)
The latter (https://github.com/HenrikBengtsson/future/issues/575#issuecomment-1229443818) was due to a bug in globals::packagesOf()
, which has been fixed for the next release of globals.
Thanks for reporting on this.
source()
does not seem to work consistently inside the future expression withworkers=1
andworkers>1
for non-function variables in the sourced file. Following is a minimal example:test.R
:Following is
future-source1.R
wherefread
(previously a functiondata.table::fread
) and.N
(previously a symboldata.table::.N
which isNULL
) are redefined.Then running
test.R
produces the following result:which indicates that
source()
does not work as expected since we still getdata.table::.N
instead of.N <- 100
.However, if we replace returning
.N
with afread(...)
call, then it works as expected:If we replace the number of workers with more than 1, then source always works.
One walk around for now is to use
source(..., local = TRUE)
but it would definitely be nice to have this fixed for consistency. Also, usingfuture.callr::callr
as the backend works too.