HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org
946 stars 82 forks source link

Global options become unset when using plan(multisession) #609

Closed ismirsehregal closed 2 years ago

ismirsehregal commented 2 years ago

Describe the bug

After updating library(future) to the latest CRAN version (1.25.0) one of my shiny apps (was running fine for a few years already) which is making use of futures is broken.

It seems that global options become unset on the second run of the future (but library(data.table) requires them to be set). I can work around this issue by setting the required options inside the future (see the comment in the below example code), however, the same happens with global variables (or variables generated in the parent.frame respectively) as described here leading to "argument is of length zero" errors in several places.

The NEWS mentions:

R options and environment variables are now reset on the workers after future is resolved so that any changes to them by the future expression have no effect on following futures.

I'm wondering how I can get back the old behaviour. Do I have to set all options and globals manually for each future?

Reproduce example

library(shiny)
library(data.table)
library(promises)
library(future)

plan(multisession)

ui <- fluidPage(
  actionButton("go", "go")
)

server <- function(input, output, session) {
  observeEvent(input$go, {
    testFuture <- future({
      # options(datatable.verbose = FALSE, datatable.alloccol = 1024) # working fine
      data.table(1:5)
    })
    print(value(testFuture))
  })
}

shinyApp(ui, server)

Animation

Expected behavior

Keeping options and globals after a future is resolved

Session information

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8    LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.25.0     promises_1.2.0.1  data.table_1.14.2 shiny_1.7.1      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3      parallelly_1.31.1 magrittr_2.0.3    xtable_1.8-4      R6_2.5.1          rlang_1.0.2       fastmap_1.1.0     httr_1.4.2       
 [9] globals_0.14.0    tools_4.2.0       parallel_4.2.0    DT_0.22           cli_3.2.0         jquerylib_0.1.4   withr_2.5.0       htmltools_0.5.2  
[17] ellipsis_0.3.2    digest_0.6.29     lifecycle_1.0.1   later_1.3.0       sass_0.4.1        htmlwidgets_1.5.4 codetools_0.2-18  cachem_1.0.6     
[25] mime_0.12         bslib_0.3.1       compiler_4.2.0    jsonlite_1.8.0    httpuv_1.6.5      listenv_0.8.0

> future::futureSessionInfo()
*** Package versions
future 1.25.0, parallelly 1.31.1, parallel 4.2.0, globals 0.14.0, listenv 0.8.0

*** Allocations
availableCores():
system 
     4 
availableWorkers():
$system
[1] "localhost" "localhost" "localhost" "localhost"

*** Settings
- future.plan=<not set>
- future.fork.multithreading.enable=<not set>
- future.globals.maxSize=<not set>
- future.globals.onReference=<not set>
- future.resolve.recursive=<not set>
- future.rng.onMisuse=<not set>
- future.wait.timeout=<not set>
- future.wait.interval=<not set>
- future.wait.alpha=<not set>
- future.startup.script=<not set>

*** Backends
Number of workers: 4
List of future strategies:
1. multisession:
   - args: function (..., workers = availableCores(), lazy = FALSE, rscript_libs = .libPaths(), envir = parent.frame())
   - tweaked: FALSE
   - call: plan(multisession)

*** Basic tests
  worker   pid     r sysname release     version    nodename machine    login     user effective_user
1      1  9144 4.2.0 Windows  10 x64 build 19044 TEST-PC  x86-64 tester tester       tester
2      2 17300 4.2.0 Windows  10 x64 build 19044 TEST-PC  x86-64 tester tester       tester
3      3    60 4.2.0 Windows  10 x64 build 19044 TEST-PC  x86-64 tester tester       tester
4      4  4364 4.2.0 Windows  10 x64 build 19044 TEST-PC  x86-64 tester tester       tester
Number of unique PIDs: 4 (as expected)
HenrikBengtsson commented 2 years ago

Thanks for the report and sorry to hear that your tool broke all of a sudden.

So, the update in future 1.25.0 to undo any changes to R options and environment variables done by a future expression is very much intentional. It was done to minimize the risk for a future expression to have side effects that affect R elsewhere - I'm basically pushing for futures to be evaluated as if they were evaluated in a sandboxed environment. Now, despite running tons of checks that this did not break anything, I missed this behavior of data.table.

A minimal reproducible example without shiny is:

library(data.table)
#> data.table 1.14.2 using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
library(future)
plan(multisession, workers = 2)

f <- future(data.table())
v <- value(f)
print(v)
#> Null data.table (0 rows and 0 cols)

f <- future(data.table())
v <- value(f)
#> Error in setalloccol(ans) : verbose must be TRUE or FALSE

This is a serious regression that needs to be fixed asap. This might be a design bug in data.table, it might be a bug in future, or I might have to reconsider the undoing of R options in future - I'll investigate.

HenrikBengtsson commented 2 years ago

Workaround

A workaround, until this has been resolved, is to run the following immediately after setting the future plan;

## May 'datatable.*' R options created by the data.table package stick
dummy <- value(lapply(seq_len(nbrOfWorkers()), FUN = function(ii) {
  future(NULL, packages = "data.table", earlySignal = TRUE)
}))

FWIW, I've reported the underlying problem to https://github.com/Rdatatable/data.table/issues/5375, because I argue data.table should handle this. Having said that, I also think I can avoid this in future, so I'll fix/workaround it there too. This problem is unique to cluster/multisession futures.

HenrikBengtsson commented 2 years ago

I've fixed this in future (>= 1.25.0-9005). Please install the develop version and retry. See https://future.futureverse.org/#pre-release-version for how to install it.

PS. There are corner cases where it still can break, e.g. when someone calls future(library(data.table)) before actually using any data.table functions.

ismirsehregal commented 2 years ago

@HenrikBengtsson thank you for your swift investigation! I can confirm that the issue is gone after installing the dev version of library(future). From my side this issue can be closed - Cheers!

HenrikBengtsson commented 2 years ago

Thanks for confirming.

HenrikBengtsson commented 2 years ago

FYI, future 1.26.1 fixing this problem just hit CRAN.

leungi commented 2 years ago

FYI, future 1.26.1 fixing this problem just hit CRAN.

Just bumped into this issue today with version 1.25.0, and glad that there's a fix in place.

Appreciate your work @HenrikBengtsson 🙏