HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org
951 stars 83 forks source link

global r6 instances are modified and unusable in multisession #691

Closed kforner closed 1 year ago

kforner commented 1 year ago

when a R6 instance, from a R6 class defined in a source package (i.e. loaded using devtools::load_all()) is passed using future.globals, the instance is modified in a way where the parent environment of methods is no longer the namespace they are defined in.

Reproducible example

library(usethis)
create_package('r6bug') # N.B: change the wd
writeLines('MyR6 <- R6::R6Class("MyR6")', 'R/r6.R')
PKG_PATH <- normalizePath('.')
devtools::load_all(PKG_PATH, export_all = TRUE)

fun <- function(x) {
  devtools::load_all(PKG_PATH, export_all = TRUE)
  ns <- parent.env(environment(my_r6$clone))
  if (isNamespace(ns)) getNamespaceName(ns) else NA
}
my_r6 <- MyR6$new()
getNamespaceName(parent.env(environment(my_r6$clone)))

library(future)
library(future.apply)
plan(sequential)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
# ℹ Loading r6bug
# ℹ Loading r6bug
# [[1]]
#    name 
# "r6bug" 

# [[2]]
#    name 
# "r6bug" 

plan(multicore)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
# ℹ Loading r6bug
# ℹ Loading r6bug
# [[1]]
#    name 
# "r6bug" 

# [[2]]
#    name 
# "r6bug" 

plan(multisession)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
# ℹ Loading r6bug
# ℹ Loading r6bug
# [[1]]
# [1] NA

# [[2]]
# [1] NA

Expected behavior parent.env(environment(my_r6$clone)) in multisession should be the "r6bug" namespace

work-around

Currently, I backup the prior correct methods namespace, and attach it as an attribute, then restore it in the future_lappy FUN function.

Session information

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] r6bug_0.0.0.9000   devtools_2.4.3     future.apply_1.8.1 future_1.23.0     
[5] usethis_2.1.5     

loaded via a namespace (and not attached):
 [1] pillar_1.7.0      compiler_4.1.3    remotes_2.4.2     prettyunits_1.1.1
 [5] tools_4.1.3       digest_0.6.29     testthat_3.1.3    pkgbuild_1.3.1   
 [9] pkgload_1.2.4     jsonlite_1.8.0    memoise_2.0.1     lifecycle_1.0.1  
[13] tibble_3.1.6      pkgconfig_2.0.3   rlang_1.0.2       cli_3.2.0        
[17] rstudioapi_0.13   parallel_4.1.3    xfun_0.30         fastmap_1.1.0    
[21] withr_2.5.0       stringr_1.4.0     roxygen2_7.1.2    xml2_1.3.3       
[25] knitr_1.38        globals_0.14.0    desc_1.4.1        fs_1.5.2         
[29] vctrs_0.4.1       rprojroot_2.0.3   glue_1.6.2        listenv_0.8.0    
[33] R6_2.5.1          processx_3.5.3    parallelly_1.30.0 fansi_1.0.3      
[37] sessioninfo_1.2.2 purrr_0.3.4       callr_3.7.0       magrittr_2.0.3   
[41] whisker_0.4       codetools_0.2-18  ps_1.6.0          ellipsis_0.3.2   
[45] utf8_1.2.2        stringi_1.7.6     cachem_1.0.6      crayon_1.5.1     
[49] brio_1.1.3    
scottkosty commented 1 year ago

I don't understand the root issues, but I do remember that I've had issues with 'multisession' and devtools::load_all() in the past, and I think it's due to the way that devtools::load_all() emulates attaching an installed package (that is not really installed). Thus, I can't comment on the underlying behavior, and you mention you already have a workaround, but here is a different workaround in case it helps.

It relies on devtools::dev_mode() and remotes::install_local(). I had to make one change to the body of your code which is to change your call to MyR6 to be r6bug:::MyR6, since there is no argument to install_local() like export_all.

Below is the full example, and all plans give the same output for me.

library(usethis)
create_package('r6bug') # N.B: change the wd
writeLines('MyR6 <- R6::R6Class("MyR6")', 'R/r6.R')
PKG_PATH <- normalizePath('.')
#devtools::load_all(PKG_PATH, export_all = TRUE)

temp_d <- tempfile()
dir.create(temp_d)
devtools::dev_mode(on = TRUE, path = temp_d)
remotes::install_local(PKG_PATH, upgrade = "never")
library("r6bug")

fun <- function(x) {
  devtools::load_all(PKG_PATH, export_all = TRUE)
  ns <- parent.env(environment(my_r6$clone))
  if (isNamespace(ns)) getNamespaceName(ns) else NA
}
my_r6 <- r6bug:::MyR6$new()
getNamespaceName(parent.env(environment(my_r6$clone)))

library(future)
library(future.apply)
plan(sequential)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')

plan(multicore)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')

plan(multisession)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
kforner commented 1 year ago

Hi @scottkosty , and thanks for your quick reply. Your work-around is not really applicable for me. Here it is a reprex, but as you can imagine my real problem is a bit more complex, and involves a dozen of source packages. It will take more time to install them than to actually do the computations in sequential mode.

scottkosty commented 1 year ago

@kforner Makes sense!

HenrikBengtsson commented 1 year ago

Hi. There's no need to involved future.apply for a minimal example. Here's the same with just future:

opwd <- getwd()
usethis::create_package("r6bug") # N.B: change the wd
writeLines("MyR6 <- R6::R6Class('MyR6')", "R/r6.R")
PKG_PATH <- normalizePath(".")
devtools::load_all(PKG_PATH, export_all = TRUE)

fun <- function(x) {
  devtools::load_all(PKG_PATH, export_all = TRUE)
  parent.env(environment(my_r6$clone))
}

my_r6 <- MyR6$new()
penv <- parent.env(environment(my_r6$clone))
print(penv)
## <environment: namespace:r6bug>
print(getNamespaceName(penv))
##    name 
## "r6bug" 

library(future)
globals <- list(fun = fun, my_r6 = my_r6, PKG_PATH = PKG_PATH)
packages <- "devtools"

plan(sequential)
f <- future(fun(1), globals = globals, packages = packages)
v <- value(f)
print(v)
## <environment: namespace:r6bug>

plan(multisession, workers = I(1))
f <- future(fun(1), globals = globals, packages = packages)
v <- value(f)
print(v)
## <environment: R_GlobalEnv>
  1. Before anything else, are you using devtools::load_all() here to create a reproducible example, or do you actually use that in "production"? Have you checked the behavior when MyR6 lives in a proper R package?

  2. FWIW, reconstructing environment hierarchies on parallel R processes are really hard, especially if one wants to cover all cases. It basically requires emulating how the R run-time does it. So, this might not be an issue if you use a proper R package that is available to all parallel R processes.

PS. Next, I'll migrate this issue over to the future issue tracker.

kforner commented 1 year ago

Hi Henrik

  1. Yes, I use (indirectly) devtools::load_all() in production. It's not just for the reprex.

    • I have not checked in a proper R package, but I suspect it should work. I understand it is a corner case.
  2. I understand that. The thing is, I can not pass the related packages using future.packages= since there are not loaded in a "standard" way. If it were possible somehow to "customize" the loading of future.packages= to sub-processes prior to sending/serializing the globals, I suppose it would work.

  3. I can work-around that issue. I posted it in case you can see a way to generally improve the code.

Thanks for all your work. Much appreciated.

HenrikBengtsson commented 1 year ago

I see. No, I think you need to reconstruct the environment hierarchy yourself. The "namespace" environment created by devtools::load_all(PKG_PATH, export_all = TRUE) in the main R session is different from that in the parallel worker. The only thing that would hint that they are related is the name attribute, but it's basically only you as a developer that know they're the same. In contrast, with proper R packages, we make assumptions that a package namespace in the main R session is the same as the package namespace in a parallel worker if they share the same package name.

It would require almost mind-reading skills to know that the parent environment of object my_r6 that was created in the main R session, should be bound to a similar "namespace" environment on the parallel worker.

Regarding:

I use (indirectly) devtools::load_all() in production.

My only remaining recommendation is to take a deep dive into that and see you if instead could at least build a proper package on-the-fly. R doesn't have in-memory package building/namespace creation and devtools::load_all() is really a hack that I doubt can fully emulate the real scenario.

kforner commented 1 year ago

Thank you. I'll think about it.