thiesben closed this issue 3 years ago
The issue here is the same as https://github.com/HenrikBengtsson/globals/issues/46 and won't be fixed by furrr.
The problem is that the underlying {globals} package, which looks for the globals and packages to "export" to your workers, can't find anything that is specific to data.table until you call `setDT()`. It isn't the act of "setting" the object as a data.table that fixes things. It's just the fact that the function is there, so {globals} now sees that data.table is a required package for that function to run. Without data.table loaded on the worker, `[` dispatches to `[.data.frame` instead of the data.table method, which is why the column can't be found.
The easiest way to fix this is to require data.table to be loaded on the workers with `furrr_options(packages = "data.table")`:
```r
library(data.table)
library(furrr)

# nothing in here is "data.table specific"
fun1 <- function(x) {
  x[, y]
}

fun2 <- function(x) {
  # do something stupid that clearly requires data table
  data.table(1)
  x[, y]
}

df <- data.frame(x = c(1, 2), y = c(1.2, 3.4))
dt <- setDT(df)
lst <- list(dt)

plan(multisession, workers = 2)

future_map(lst, fun1)
#> Error in `[.data.frame`(x, , y): object 'y' not found

future_map(lst, fun2)
#> [[1]]
#> [1] 1.2 3.4

future_map(lst, fun1, .options = furrr_options(packages = "data.table"))
#> [[1]]
#> [1] 1.2 3.4
```
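If you want to confirm what is going on for yourself, you can ask the workers whether the data.table namespace is loaded. This is only a rough sketch and not something furrr provides: `check_loaded` is a made-up helper name, and it assumes fresh multisession workers that don't load data.table on startup (for example through an `.Rprofile`).

```r
library(furrr)

plan(multisession, workers = 2)

# Hypothetical helper: reports whether data.table's namespace is loaded
# in the R session that evaluates it (here, a worker). The argument is unused.
check_loaded <- function(i) {
  isNamespaceLoaded("data.table")
}

# Nothing in check_loaded() tells furrr that data.table is needed,
# so the workers are not asked to load it.
without_pkg <- future_map(1:2, check_loaded)

# Explicitly request data.table on the workers before the function runs.
with_pkg <- future_map(
  1:2, check_loaded,
  .options = furrr_options(packages = "data.table")
)

str(without_pkg)
str(with_pkg)

# Shut the workers down again
plan(sequential)
```

Run the call without the `packages` option first, since workers are reused and stay in whatever state an earlier call left them in.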
I've come across a bug when working with data.tables and furrr. Check out this reprex:
I'm encountering the error "Error in '[.data.frame'(two, , a) : object 'a' not found" when trying to access columns the way the function in the example does. However, when I (redundantly!) call `setDT()` inside the function, it works without problems. I really don't know where to report this; the behaviour is very weird.
Also, this affects not only indexing with data.tables, but also filtering etc.