Closed: kforner closed this issue 1 year ago
I don't understand the root issue, but I do remember that I've had problems with 'multisession' and `devtools::load_all()` in the past, and I think it's due to the way that `devtools::load_all()` emulates attaching an installed package (that is not really installed). Thus, I can't comment on the underlying behavior, and you mention you already have a workaround, but here is a different workaround in case it helps.

It relies on `devtools::dev_mode()` and `remotes::install_local()`. I had to make one change to the body of your code, which is to change your call to `MyR6` to `r6bug:::MyR6`, since `install_local()` has no argument like `export_all`.
Below is the full example, and all plans give the same output for me.
```r
library(usethis)
create_package('r6bug')  # N.B.: changes the working directory
writeLines('MyR6 <- R6::R6Class("MyR6")', 'R/r6.R')
PKG_PATH <- normalizePath('.')

#devtools::load_all(PKG_PATH, export_all = TRUE)
temp_d <- tempfile()
dir.create(temp_d)
devtools::dev_mode(on = TRUE, path = temp_d)
remotes::install_local(PKG_PATH, upgrade = "never")
library("r6bug")

fun <- function(x) {
  devtools::load_all(PKG_PATH, export_all = TRUE)
  ns <- parent.env(environment(my_r6$clone))
  if (isNamespace(ns)) getNamespaceName(ns) else NA
}

my_r6 <- r6bug:::MyR6$new()
getNamespaceName(parent.env(environment(my_r6$clone)))

library(future)
library(future.apply)
plan(sequential)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
plan(multicore)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
plan(multisession)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
```
Hi @scottkosty, and thanks for your quick reply. Your work-around is not really applicable for me. This is a reprex, but as you can imagine my real problem is a bit more complex and involves a dozen source packages. It would take more time to install them than to actually do the computations in sequential mode.
@kforner Makes sense!
Hi. There's no need to involve future.apply for a minimal example. Here's the same with just future:
```r
opwd <- getwd()
usethis::create_package("r6bug")  # N.B.: changes the working directory
writeLines("MyR6 <- R6::R6Class('MyR6')", "R/r6.R")
PKG_PATH <- normalizePath(".")
devtools::load_all(PKG_PATH, export_all = TRUE)

fun <- function(x) {
  devtools::load_all(PKG_PATH, export_all = TRUE)
  parent.env(environment(my_r6$clone))
}

my_r6 <- MyR6$new()
penv <- parent.env(environment(my_r6$clone))
print(penv)
## <environment: namespace:r6bug>
print(getNamespaceName(penv))
##     name
## "r6bug"

library(future)
globals <- list(fun = fun, my_r6 = my_r6, PKG_PATH = PKG_PATH)
packages <- "devtools"

plan(sequential)
f <- future(fun(1), globals = globals, packages = packages)
v <- value(f)
print(v)
## <environment: namespace:r6bug>

plan(multisession, workers = I(1))
f <- future(fun(1), globals = globals, packages = packages)
v <- value(f)
print(v)
## <environment: R_GlobalEnv>
```
Before anything else, are you using `devtools::load_all()` here to create a reproducible example, or do you actually use that in "production"? Have you checked the behavior when `MyR6` lives in a proper R package?
FWIW, reconstructing environment hierarchies on parallel R processes is really hard, especially if one wants to cover all cases. It basically requires emulating how the R run-time does it. So this might not be an issue if you use a proper R package that is available to all parallel R processes.
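As a base-R illustration of that last point (my own sketch, not from the thread): a proper package's namespace can be recovered purely from its name in any R process, which is what makes re-attaching serialized, package-backed functions safe. The built-in `stats` package stands in for any installed package here.

```r
## A package namespace is canonical per R process and can be looked up by
## name alone, so a worker can re-bind functions to its *local* copy of
## the same namespace.
ns1 <- getNamespace("stats")
ns2 <- asNamespace("stats")
identical(ns1, ns2)            # TRUE: one namespace object per name
unname(getNamespaceName(ns1))  # "stats"
```

A `load_all()`-style environment, by contrast, is an anonymous environment with no registered name to look up, so no such re-binding rule can apply.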
PS. Next, I'll migrate this issue over to the future issue tracker.
Hi Henrik
Yes, I use (indirectly) `devtools::load_all()` in production. It's not just for the reprex.

I understand that. The thing is, I cannot pass the related packages using `future.packages=` since they are not loaded in a "standard" way. If it were somehow possible to "customize" the loading of `future.packages=` in the sub-processes, prior to sending/serializing the globals, I suppose it would work.
I can work around that issue. I posted it in case you can see a way to generally improve the code.
Thanks for all your work. Much appreciated.
I see. No, I think you need to reconstruct the environment hierarchy yourself. The "namespace" environment created by `devtools::load_all(PKG_PATH, export_all = TRUE)` in the main R session is different from the one in the parallel worker. The only thing that would hint that they are related is the name attribute, but basically only you as a developer know they're the same. In contrast, with proper R packages, we assume that a package namespace in the main R session is the same as the package namespace in a parallel worker if they share the same package name.

It would require almost mind-reading skills to know that the parent environment of the object `my_r6` that was created in the main R session should be bound to a similar "namespace" environment on the parallel worker.
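Here is a small base-R sketch of why the name attribute is only a hint (hypothetical environments, not from the thread): two environments can carry the same name yet be entirely distinct objects.

```r
## Two anonymous environments with the same "name" attribute, mimicking two
## independent load_all() "namespaces": one in the main session, one in a
## worker after deserialization.
e_main   <- new.env(); attr(e_main, "name") <- "r6bug"
e_worker <- new.env(); attr(e_worker, "name") <- "r6bug"

attr(e_main, "name") == attr(e_worker, "name")  # TRUE: names match...
identical(e_main, e_worker)                     # FALSE: ...but they are
                                                # different environments
```

Only the developer knows these are "the same" namespace; R itself has no basis for unifying them.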
Regarding:

> I use (indirectly) `devtools::load_all()` in production.

My only remaining recommendation is to take a deep dive into that and see if you instead could at least build a proper package on the fly. R doesn't have in-memory package building/namespace creation, and `devtools::load_all()` is really a hack that I doubt can fully emulate the real scenario.
Thank you. I'll think about it.
When an R6 instance, from an R6 class defined in a source package (i.e. loaded using `devtools::load_all()`), is passed using `future.globals`, the instance is modified in a way where the parent environment of its methods is no longer the namespace they are defined in.

Reproducible example

Expected behavior

`parent.env(environment(my_r6$clone))` in multisession should be the "r6bug" namespace.

Work-around

Currently, I back up the prior, correct methods namespace and attach it as an attribute, then restore it in the `future_lapply` `FUN` function.

Session information
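For completeness, a minimal base-R sketch of the work-around described in this thread (my own illustration, with a plain closure standing in for an R6 method so it runs without devtools, R6, or future): back up the method's enclosing environment as an attribute before the object is exported, then re-bind it at the top of `FUN` on the worker.

```r
## A closure stands in for an R6 method; its enclosure plays the role of the
## load_all() "namespace" that multisession fails to restore.
make_instance <- function() {
  ns <- new.env()
  ns$secret <- 42
  clone <- function() secret
  environment(clone) <- ns
  list(clone = clone)
}

my_r6 <- make_instance()

## 1. Back up the method's namespace as an attribute before exporting.
attr(my_r6, "methods_ns") <- environment(my_r6$clone)

## 2. Simulate what multisession does: the method ends up bound to the
##    global environment instead of its namespace.
environment(my_r6$clone) <- globalenv()

## 3. Restore the namespace at the top of FUN on the worker.
environment(my_r6$clone) <- attr(my_r6, "methods_ns")
my_r6$clone()  # 42: the method sees its namespace again
```

In the real setting, step 3 would instead re-bind to the environment freshly created by `devtools::load_all()` on each worker.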