HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

No need to check size of globals with multicore (fork process) #387

Open renkun-ken opened 4 years ago

renkun-ken commented 4 years ago

When using forked processes for parallel computing, as with parallel::mclapply, I'm not sure it is necessary to check whether the globals are too large, since fork's copy-on-write mechanism is designed for exactly this case.

library(future)

x <- rnorm(100000000)

plan(multicore, workers = 4, earlySignal = TRUE)
system.time(res <- future.apply::future_lapply(1:50, function(i) {
  sum(x) * i
}))
Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) : 
  The total size of the 2 globals that need to be exported for the future expression ('FUN()') is 762.94 MiB. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). There are two globals: 'x' (762.94 MiB of class 'numeric') and 'FUN' (4.72 KiB of class 'function').
Backtrace:
1: stop(msg)
2: getGlobalsAndPackages(expr, envir = envir, globals = globals)
3: getGlobalsAndPackagesXApply(FUN = FUN, args = args, MoreArgs = MoreArgs, 
4: future_xapply(FUN = FUN, nX = nX, chunk_args = X, args = list(...), 
5: future.apply::future_lapply(1:50, function(i) {
6: system.time(res <- future.apply::future_lapply(1:50, function(i) {
Timing stopped at: 0.031 0 0.031
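The 762.94 MiB figure in the error message can be reproduced with a little arithmetic, assuming 8 bytes per double and ignoring R's small per-object header. The variable names below are illustrative only:

```r
# x <- rnorm(100000000) holds 1e8 doubles at 8 bytes each.
n <- 1e8
size_mib <- 8 * n / 1024^2   # payload size in MiB (object header ignored)
limit_mib <- 500             # limit quoted in the error message

round(size_mib, 2)           # 762.94
size_mib > limit_mib         # TRUE, which is why the check fails
```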

In this case, I have to explicitly disable detection of globals to make it work:

system.time(res <- future.apply::future_lapply(1:50, function(i) {
  sum(x) * i
}, future.globals = FALSE))
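An alternative to turning off globals detection is to raise the limit named in the error message, the 'future.globals.maxSize' option. The sketch below assumes a Unix-like system where multicore is supported, and uses a much smaller vector than the report so it runs quickly. The 1 GiB value is an arbitrary choice; setting the option to +Inf skips the size check entirely.

```r
library(future)
library(future.apply)

# Smaller than the 1e8-element vector in the report, to keep the sketch fast.
x <- rnorm(1e6)

# Raise the limit to 1 GiB (in bytes) and remember the previous setting.
old_opts <- options(future.globals.maxSize = 1024^3)

plan(multicore, workers = 2)
res <- future_lapply(1:4, function(i) sum(x) * i)

# Restore the previous option value and the default plan.
options(old_opts)
plan(sequential)
```

Unlike future.globals = FALSE, this keeps automatic identification of globals, so the same code still works if the plan is later switched to a non-forking backend.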

BTW, I also noticed that future.apply::future_lapply is significantly slower than parallel::mclapply in this simple case with the same number of workers:

system.time(res <- parallel::mclapply(1:50, function(i) {
  sum(x) * i
}, mc.cores = 4))

The timings:

> plan(multicore, workers = 4, earlySignal = TRUE)

> system.time(res <- future.apply::future_lapply(1:50, function(i) {
    sum(x) * i
  }, future.globals = FALSE))
   user  system elapsed 
  4.436   0.119   3.949 

> system.time(res <- parallel::mclapply(1:50, function(i) {
    sum(x) * i
  }, mc.cores = 4))
   user  system elapsed 
  3.937   0.082   2.081 

Am I missing something?
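A single system.time() call can be noisy, so one way to check whether the gap is stable is to repeat both measurements and compare medians. This is only a sketch: it assumes a Unix-like system (mclapply with mc.cores > 1 is not available on Windows), uses a smaller vector and fewer workers than above, and time_elapsed() is a hypothetical helper, not part of either package.

```r
library(future)
library(future.apply)
library(parallel)

x <- rnorm(1e6)                 # smaller than in the report
f <- function(i) sum(x) * i

plan(multicore, workers = 2)

# Hypothetical helper: elapsed seconds for evaluating one expression.
time_elapsed <- function(expr) system.time(expr)[["elapsed"]]

reps <- 5
t_future <- replicate(reps, time_elapsed(
  future_lapply(1:50, f, future.globals = FALSE)
))
t_mc <- replicate(reps, time_elapsed(
  mclapply(1:50, f, mc.cores = 2)
))

plan(sequential)

# Median elapsed time per approach, in seconds.
c(future_lapply = median(t_future), mclapply = median(t_mc))
```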

HenrikBengtsson commented 4 years ago

I think this is basically the same as Issue #197