Closed matthiasgomolka closed 6 years ago
My immediate guess is that the eval(parse(text = ...)) code makes it very hard for the automatic identification of globals to work, but it's not unlikely that there's a simple fix/workaround. Could you please provide me with a minimal toy example where I can reproduce the above?
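To illustrate why eval(parse(text = ...)) can hide variables from static code inspection, here is a minimal sketch using globals::findGlobals(). The two expressions are made up for illustration, and this is not necessarily how the future framework calls the globals package internally.

```r
# A sketch: compare what static inspection sees with and without eval(parse()).
# Assumes the 'globals' package is installed.
visible <- globals::findGlobals(quote(x$Sepal.Length < 5))
hidden  <- globals::findGlobals(quote(eval(parse(text = "x$Sepal.Length < 5"))))

# 'x' appears as a symbol in the first expression, so it can be detected.
print("x" %in% visible)

# In the second expression, 'x' only exists inside a character string,
# which static inspection does not parse.
print("x" %in% hidden)
```

The first check should report that x is found and the second that it is not, since the code walker only sees the calls to eval() and parse().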
In this case, it has nothing to do with the usage of eval(parse(...)). A minimal example is:
## Create a list of two fst_table objects - adopted from example("fst")
library(fst)
path <- paste0(tempfile(), ".fst")
write_fst(iris, path)
ft <- fst(path)
fts <- list(ft, ft)
foo <- function(x) {
keep <- eval(parse(text = "x$Sepal.Length < 5"))
x[keep, ]
}
# Works
y0 <- lapply(fts, FUN = foo)
# Fails
y1 <- future.apply::future_lapply(fts, FUN = foo)
### Error in .subset2(x, i, exact = exact) : subscript out of bounds
with traceback:
> traceback()
13: (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x,
i, exact = exact))(x, ..., exact = exact)
12: `[[.data.frame`(res$resTable, 1)
11: res$resTable[[1]]
10: data.table::setattr(res$resTable, "row.names", 1:length(res$resTable[[1]]))
9: read_fst(meta_info$path, j, old_format = .subset2(x, "old_format"))
8: `[.fst_table`(expr, keep)
7: expr[keep]
6: FUN(X[[i]], ...)
5: lapply(expr, FUN = findGlobals, envir = envir, ..., tweak = tweak,
dotdotdot = dotdotdot, substitute = FALSE, unlist = FALSE)
4: findGlobals(expr, envir = envir, ..., method = method, tweak = tweak,
substitute = FALSE, unlist = unlist)
3: globalsOf(expr, envir = envir, substitute = FALSE, tweak = tweak,
dotdotdot = "return", method = globals.method, unlist = TRUE,
mustExist = mustExist, recursive = TRUE)
2: getGlobalsAndPackages(X_ii, envir = envir, globals = TRUE)
1: future.apply::future_lapply(fts, FUN = foo)
This turns out to be a bug in the globals package (https://github.com/HenrikBengtsson/globals/issues/44). The future framework uses the globals package to identify global variables and packages that need to be exported and the latter currently chokes on fst::fst_table objects.
There's really nothing to fix in the future.apply package, but I'll keep this issue open until it is fixed in globals and I have verified that the above code snippet works here.
This has now been fixed in the develop version of globals (0.12.1-9000). To install, use:
remotes::install_github("HenrikBengtsson/globals@develop")
Now, we get:
## Create a list of two fst_table objects - adopted from example("fst")
library(fst)
path <- paste0(tempfile(), ".fst")
write_fst(iris, path)
ft <- fst(path)
fts <- list(ft, ft)
foo <- function(x) {
keep <- eval(parse(text = "x$Sepal.Length < 5"))
x[keep, ]
}
# Works
y0 <- lapply(fts, FUN = foo)
# Fails
library(future.apply)
plan(multisession, workers = 2)
y1 <- future_lapply(fts, FUN = foo)
### Error in x[keep, ] : incorrect number of dimensions
This is a completely different error and indeed expected. What happens is that the future framework fails to identify that the 'fst' package needs to be loaded on the worker. This type of error is discussed in section 'Missing packages (false negatives)' of vignette 'A Future for R: Common Issues with Solutions'.
To work around this, we need to use:
y1 <- future_lapply(fts, FUN = foo, future.packages = "fst")
and we indeed have that:
stopifnot(identical(y1, y0))
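The 'incorrect number of dimensions' error can be reproduced in base R without fst or futures. The sketch below uses a made-up S3 class, toy_table, as a stand-in for fst_table. The assumption is that on a worker where fst is not loaded, no `[` method for the class is available, so R falls back to default list subsetting.

```r
# 'toy_table' is a hypothetical stand-in for 'fst_table':
# a plain list with a class attribute.
obj <- structure(list(path = "placeholder"), class = "toy_table")
keep <- TRUE

# No `[.toy_table` method is defined yet, so the internal default for lists
# is used, and two-index subsetting of a list without dimensions is an error.
msg_without_method <- tryCatch(
  obj[keep, ],
  error = function(e) conditionMessage(e)
)
print(msg_without_method)

# Once a method is available (analogous to the package being loaded),
# the same call dispatches to it instead.
`[.toy_table` <- function(x, i, j, ...) {
  "dispatched to the toy_table method"
}
res_with_method <- obj[keep, ]
print(res_with_method)
```

This is why attaching the package on the worker is enough: the object itself is exported correctly, but its subsetting method only exists once the package namespace is loaded.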
FYI, globals 0.12.2 that fixes this problem is rolling out on CRAN right now.
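As a sketch, one way to check whether the installed globals is at least the fixed release (the 0.12.2 threshold is taken from the comment above; the variable name has_fix is only illustrative):

```r
# Compare the installed 'globals' version, if any, with the fixed release.
if (requireNamespace("globals", quietly = TRUE)) {
  has_fix <- utils::packageVersion("globals") >= "0.12.2"
} else {
  has_fix <- FALSE  # package is not installed at all
}
print(has_fix)
```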
I have ~3000 fst files. These are organized as fst objects in the list fst_objs. I want to subset all of these objects using the following function:
Using lapply(fst_objs, filter_select, filter, selection), where filter = 'fst_obj$INSTRUMENT == "DE0009652669"' and selection = 1:20, works fine and returns a list of small data.frames.
Replacing lapply() by future_lapply() returns
Error in .subset2(x, i, exact = exact) : subscript out of bounds
which is an error from fst().
I suspect this is related to the future package since a similar problem occurs with map() and future_map() from the furrr package. Parallel execution works with foreach.