futureverse / future.apply

:rocket: R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org
211 stars 16 forks source link

multisession does not work with dots but multicore does #61

Closed tdhock closed 3 years ago

tdhock commented 4 years ago

hi @HenrikBengtsson hope all is well. Just noticed an inconsistency between the results of multicore and multisession when using future_lapply on a function that refers to dots ... in its code/body. I expected that multicore and multisession should always give the same results (result should be independent of plan) and I did not see this in the documentation nor in the issues, so I'm filing a new one. Here is a MRE

compute <- function(a, x_vec){
  a+x_vec
}
compute_some_buggy <- function(..., x_vec=1:2){
  compute_one_buggy <- function(x){
    compute(..., x_vec=x)
  }
  future.apply::future_sapply(x_vec, compute_one_buggy)
}
future::plan("multicore")
compute_some_buggy(0)

sessionInfo()

future::plan("multisession")
compute_some_buggy(0)

The output that I get on my system is

(base) tdhock@maude-MacBookPro:~/R$ R --vanilla < future_some_buggy.R

R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> compute <- function(a, x_vec){
+   a+x_vec
+ }
> compute_some_buggy <- function(..., x_vec=1:2){
+   compute_one_buggy <- function(x){
+     compute(..., x_vec=x)
+   }
+   future.apply::future_sapply(x_vec, compute_one_buggy)
+ }
> future::plan("multicore")
> compute_some_buggy(0)
[1] 1 2
> 
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /home/tdhock/lib/R/lib/libRblas.so
LAPACK: /home/tdhock/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.0     tools_4.0.0        parallel_4.0.0     future.apply_1.5.0
[5] listenv_0.8.0      codetools_0.2-16   digest_0.6.25      globals_0.12.5    
[9] future_1.17.0     
> 
> future::plan("multisession")
> compute_some_buggy(0)
Error in ...future.FUN(...future.X_jj, ...) : unused argument (0)
Calls: compute_some_buggy ... resolve.list -> signalConditionsASAP -> signalConditions
Execution halted
(base) tdhock@maude-MacBookPro:~/R$ 

I expected the second call to compute_some_buggy to return the same thing as the first. (1 2)

This seems to be a problem with finding the right place to evaluate the dots, because actually this works:

compute <- function(a, x_vec){
  a+x_vec
}
compute_some_works <- function(..., x_vec=1:2){
  arg.list <- list(...)
  compute_one_works <- function(x){
    arg.list[["x_vec"]] <- x
    do.call(compute, arg.list)
  }
  future.apply::future_sapply(x_vec, compute_one_works)
}
future::plan("multicore")
compute_some_works(0)
future::plan("multisession")
compute_some_works(0)
HenrikBengtsson commented 4 years ago

Thanks, I can reproduce. This somewhat seems familiar to me ... from many years back. I tried to reproduce this using plain futures without future.apply;

library(future)

compute <- function(a, x_vec) {
  a + x_vec
}

compute_some_buggy <- function(..., xvec = 1:2) {
  compute_one_buggy <- function(x) {
    compute(..., x_vec = x)
  }
  value(future(compute_one_buggy(xvec[1])))
}

plan(cluster, workers = 1L)
x <- compute_some_buggy(2)
print(x)

but this works just fine, which makes me think it's a problem with how the future.apply package handles .... I'll investigate.

HenrikBengtsson commented 4 years ago

Oh, forgot to say, the "clean" approach to solve this, is to not rely on global objects (as the inner ... arguments are in your example), and instead pass them along as arguments. The following works:

compute_some_buggy <- function(..., x_vec = 1:2) {
  compute_one_buggy <- function(x, ...) {
    compute(..., x_vec = x)
  }
  future.apply::future_sapply(x_vec, FUN = compute_one_buggy, ...)
}

BTW, I think we tend to be more forgiving about leaving ... implicit than named arguments - it's not uncommon to see your approach. However, it's always safer to make sure every function doesn't operate on globals - even if they function is a local function.

tdhock commented 4 years ago

Hi thanks for the feedback and suggestion about passing the dots through / avoiding references to globals.

HenrikBengtsson commented 3 years ago

This has been fixed in the develop branch, where I also added a package test based on your example, cf. commit ebda4df.

tdhock commented 3 years ago

great thanks

HenrikBengtsson commented 3 years ago

... and future.apply 1.8.1 is on CRAN now