futureverse / future.apply

R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org

A list of data.table - result of future_lapply - lost side effect (reference semantic) #28

Closed ThoDuyNguyen closed 5 years ago

ThoDuyNguyen commented 6 years ago

I want to apply a function to a list of data.tables for its side effect (modifying each table by reference). My example code works perfectly:

add_new_column <-
  function(df, init_value) {
    df[, new_column = init_value]
  }

l_df <-
  list(as.data.table(iris),
       as.data.table(iris),
       as.data.table(iris))

mapply(
  add_new_column,
  df = l_df,
  init_value = 1:3,
  SIMPLIFY = FALSE
)

And the result is correct:

l_df
[[1]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species new_column
  1:          5.1         3.5          1.4         0.2    setosa          1
  2:          4.9         3.0          1.4         0.2    setosa          1
  3:          4.7         3.2          1.3         0.2    setosa          1
  4:          4.6         3.1          1.5         0.2    setosa          1
  5:          5.0         3.6          1.4         0.2    setosa          1
 ---                                                                       
146:          6.7         3.0          5.2         2.3 virginica          1
147:          6.3         2.5          5.0         1.9 virginica          1
148:          6.5         3.0          5.2         2.0 virginica          1
149:          6.2         3.4          5.4         2.3 virginica          1
150:          5.9         3.0          5.1         1.8 virginica          1

[[2]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species new_column
  1:          5.1         3.5          1.4         0.2    setosa          2
  2:          4.9         3.0          1.4         0.2    setosa          2
  3:          4.7         3.2          1.3         0.2    setosa          2
  4:          4.6         3.1          1.5         0.2    setosa          2
  5:          5.0         3.6          1.4         0.2    setosa          2
 ---                                                                       
146:          6.7         3.0          5.2         2.3 virginica          2
147:          6.3         2.5          5.0         1.9 virginica          2
148:          6.5         3.0          5.2         2.0 virginica          2
149:          6.2         3.4          5.4         2.3 virginica          2
150:          5.9         3.0          5.1         1.8 virginica          2

[[3]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species new_column
  1:          5.1         3.5          1.4         0.2    setosa          3
  2:          4.9         3.0          1.4         0.2    setosa          3
  3:          4.7         3.2          1.3         0.2    setosa          3
  4:          4.6         3.1          1.5         0.2    setosa          3
  5:          5.0         3.6          1.4         0.2    setosa          3
 ---                                                                       
146:          6.7         3.0          5.2         2.3 virginica          3
147:          6.3         2.5          5.0         1.9 virginica          3
148:          6.5         3.0          5.2         2.0 virginica          3
149:          6.2         3.4          5.4         2.3 virginica          3
150:          5.9         3.0          5.1         1.8 virginica          3

But if my list is the result of a future_lapply call, for example:

l_df <- list(as.data.table(iris), as.data.table(iris), as.data.table(iris))
l_df_test <- future_lapply(l_df, function(x){return(x)})
l_df_test 
[[1]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica

[[2]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica

[[3]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica

When I call mapply on this list, the reference semantics are lost (no column is added):

mapply(
  add_new_column,
  df = l_df_test,
  init_value = 1:3,
  SIMPLIFY = FALSE
)

l_df_test
[[1]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica

[[2]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica

[[3]]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica

My session info:

Session info ---------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 system   x86_64, linux-gnu           
 ui       RStudio (1.1.453)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       Asia/Ho_Chi_Minh            
 date     2018-09-18                  

Packages -------------------------------------------------------------------------------------------------------------------
 package      * version date       source        
 assertthat     0.2.0   2017-04-11 CRAN (R 3.5.0)
 backports      1.1.2   2017-12-13 CRAN (R 3.5.0)
 base         * 3.5.0   2018-05-14 local         
 callr          2.0.4   2018-05-15 CRAN (R 3.5.0)
 codetools      0.2-15  2016-10-05 CRAN (R 3.5.0)
 compiler       3.5.0   2018-05-14 local         
 crayon         1.3.4   2017-09-16 CRAN (R 3.5.0)
 data.table   * 1.11.4  2018-05-27 CRAN (R 3.5.0)
 datasets     * 3.5.0   2018-05-14 local         
 devtools       1.13.5  2018-02-18 CRAN (R 3.5.0)
 digest         0.6.15  2018-01-28 CRAN (R 3.5.0)
 evaluate       0.10.1  2017-06-24 CRAN (R 3.5.0)
 fs           * 1.2.3   2018-06-08 CRAN (R 3.5.0)
 future       * 1.9.0   2018-07-23 CRAN (R 3.5.0)
 future.apply * 1.0.0   2018-06-20 CRAN (R 3.5.0)
 globals        0.12.0  2018-06-12 CRAN (R 3.5.0)
 graphics     * 3.5.0   2018-05-14 local         
 grDevices    * 3.5.0   2018-05-14 local         
 htmltools      0.3.6   2017-04-28 CRAN (R 3.5.0)
 knitr          1.20    2018-02-20 CRAN (R 3.5.0)
 listenv        0.7.0   2018-01-21 CRAN (R 3.5.0)
 lubridate    * 1.7.4   2018-04-11 CRAN (R 3.5.0)
 magrittr       1.5     2014-11-22 CRAN (R 3.5.0)
 memoise        1.1.0   2017-04-21 CRAN (R 3.5.0)
 methods      * 3.5.0   2018-05-14 local         
 parallel       3.5.0   2018-05-14 local         
 processx       3.1.0   2018-05-15 CRAN (R 3.5.0)
 R6             2.2.2   2017-06-17 CRAN (R 3.5.0)
 Rcpp           0.12.17 2018-05-18 CRAN (R 3.5.0)
 reprex       * 0.1.2   2018-01-26 CRAN (R 3.5.0)
 rmarkdown      1.10    2018-06-11 CRAN (R 3.5.0)
 rprojroot      1.3-2   2018-01-03 CRAN (R 3.5.0)
 rstudioapi     0.7     2017-09-07 CRAN (R 3.5.0)
 stats        * 3.5.0   2018-05-14 local         
 stringi        1.2.3   2018-06-12 CRAN (R 3.5.0)
 stringr      * 1.3.1   2018-05-10 CRAN (R 3.5.0)
 tools          3.5.0   2018-05-14 local         
 utils        * 3.5.0   2018-05-14 local         
 whisker        0.3-2   2013-04-28 CRAN (R 3.5.0)
 withr          2.1.2   2018-03-15 CRAN (R 3.5.0)
 yaml           2.1.19  2018-05-01 CRAN (R 3.5.0)

HenrikBengtsson commented 6 years ago

Very quick reply: Could this be related to the 'data.table' issue described in https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html? If so, try:

l_df_test <- future_lapply(l_df, function(x){return(x)}, future.packages = "data.table")
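
Separately from the package-loading issue, a pattern that does not depend on by-reference side effects is to keep the value returned by mapply. The sketch below is only an illustration under stated assumptions: the helper round_trip() is hypothetical and uses serialize()/unserialize() to mimic objects coming back from a worker process; it is an assumption, not something confirmed in this thread, that such a round trip is why the in-place update is not visible in the list.

```r
library(data.table)

add_new_column <- function(df, init_value) {
  df[, new_column := init_value]
}

# Hypothetical helper: mimic a data.table that was sent back from a worker
# by serializing and unserializing it in the current session.
round_trip <- function(x) unserialize(serialize(x, NULL))

l_df <- list(as.data.table(iris), as.data.table(iris), as.data.table(iris))
l_df_test <- lapply(l_df, round_trip)

# Capture the return value instead of relying on the side effect;
# `:=` returns the updated data.table (invisibly), so mapply collects it.
l_df_test <- mapply(
  add_new_column,
  df = l_df_test,
  init_value = 1:3,
  SIMPLIFY = FALSE
)

# Each returned table now carries the new column.
has_new_column <- vapply(l_df_test, function(d) "new_column" %in% names(d), logical(1))
print(has_new_column)
```

Reassigning the result makes the code independent of whether the tables can still be modified in place after they have been copied between processes.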

HenrikBengtsson commented 6 years ago

FYI, your very first example gives an error in a fresh R session:

> library(data.table)
data.table 1.11.4  Latest news: http://r-datatable.com
> add_new_column <-
+   function(df, init_value) {
+     df[, new_column = init_value]
+   }
> 
> l_df <-
+   list(as.data.table(iris),
+        as.data.table(iris),
+        as.data.table(iris))
> 
> mapply(
+   add_new_column,
+   df = l_df,
+   init_value = 1:3,
+   SIMPLIFY = FALSE
+ )
Error in `[.data.table`(df, , new_column = init_value) : 
  unused argument (new_column = init_value)

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.4

loaded via a namespace (and not attached):
[1] compiler_3.5.1

ThoDuyNguyen commented 6 years ago

I forgot about this bug for a while.

My code should be:

add_new_column <-
  function(df, init_value) {
    df[, new_column := init_value]
  }

l_df <-
  list(as.data.table(iris),
       as.data.table(iris),
       as.data.table(iris))

mapply(
  add_new_column,
  df = l_df,
  init_value = 1:3,
  SIMPLIFY = FALSE
)

HenrikBengtsson commented 6 years ago

It works for me; what's the problem?

library(future.apply)
plan(multisession, workers = 2)

library(data.table)

add_new_column <-
  function(df, init_value) {
    df[, new_column := init_value]
  }

l_df <-
  list(as.data.table(iris),
       as.data.table(iris),
       as.data.table(iris))

y0 <- mapply(
  add_new_column,
  df = l_df,
  init_value = 1:3,
  SIMPLIFY = FALSE
)

y1 <- future_mapply(
  add_new_column,
  df = l_df,
  init_value = 1:3,
  SIMPLIFY = FALSE
)

print(all.equal(y1, y0))
### [1] TRUE

with

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.4  future.apply_1.0.1 future_1.9.0      

loaded via a namespace (and not attached):
[1] compiler_3.5.1     tools_3.5.1        parallel_3.5.1     listenv_0.7.0-9000
[5] codetools_0.2-15   digest_0.6.17      globals_0.12.3    
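
Note that all.equal(y1, y0) compares return values only. To also check the side effect on the inputs, something like the following minimal sketch could be used. It assumes a multisession plan, where each worker is a separate R process and therefore operates on copies of the data.tables; passing future.packages = "data.table" is a precaution taken from the earlier suggestion, not a confirmed requirement.

```r
library(future.apply)
library(data.table)
plan(multisession, workers = 2)

add_new_column <- function(df, init_value) {
  df[, new_column := init_value]
}

l_df <- list(as.data.table(iris), as.data.table(iris), as.data.table(iris))

y1 <- future_mapply(
  add_new_column,
  df = l_df,
  init_value = 1:3,
  SIMPLIFY = FALSE,
  future.packages = "data.table"
)

# The tables returned from the workers carry the new column ...
has_col_result <- vapply(y1, function(d) "new_column" %in% names(d), logical(1))
# ... while the originals in the main session were only copied to the workers.
has_col_input <- vapply(l_df, function(d) "new_column" %in% names(d), logical(1))

print(has_col_result)
print(has_col_input)

plan(sequential)  # shut down the background workers
```

With separate worker processes, the practical approach is to use the returned list (y1 here) rather than to expect l_df to be modified in place.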

HenrikBengtsson commented 6 years ago

Don't mind my listenv_0.7.0-9000; it also works with listenv_0.7.0.

HenrikBengtsson commented 6 years ago

Did you see my follow up? Can we close?

HenrikBengtsson commented 5 years ago

Didn't hear back from you. I'll assume this is resolved, but feel free to reopen if that's not the case.