HenrikBengtsson / future.apply

:rocket: R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org
208 stars 16 forks source link

Nested future_lapply function results in an error for using too many cores #121

Open mcoghill opened 4 months ago

mcoghill commented 4 months ago

First of all, thank you for the amazing package - it has been incredibly useful in so many of my workflows and because it follows similar principles to regular lapply() loops it was very easy to implement. I really appreciate the work that has been poured into this. Second, I am placing this bug report here within the future.apply package; however, there are a few moving parts here and I'm not sure if this would be better suited elsewhere, so please let me know if I should move this to another package. On to the details:

Describe the bug I am trying to nest a future_lapply() loop inside another future_lapply() loop, however when I try to assign more than three sessions to the inner loop I am receiving an error message that states:

Error in checkNumberOfLocalWorkers(workers) : 
  Attempting to set up 4 localhost parallel workers with only 1 CPU cores available for this R process (per 'mc.cores'), which could result in a 400% load. The hard limit is set to 300%. Overusing the CPUs has negative impact on the current R process, but also on all other processes of yours and others running on the same machine. See help("parallelly.options", package = "parallelly") for how to override the soft and hard limits

I thought that I had my code set up correctly according to some examples that I had found from here as well as in the "Future topologies" vignette. Indeed, if I use three or fewer parallel workers within the inner loop then the error is downgraded to a warning; however, in this case I can't make use of all of my PC's cores. Below is an example of the code I used to generate the error message above:

Reproduce example

library(future.apply)

# Examples from the future_lapply help
plan(sequential)
y1 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y1)

plan(multisession)
y2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y2)

# Nested multisession, ends in the error seen above
plan(list(
  tweak(multisession, workers = availableCores() %/% 4),
  tweak(multisession, workers = 4)
))
y3 <- future_lapply(1:5, function(f1) {
  z1 <- rnorm(f1)
  z2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
  return(list(z1 = z1, z2 = z2))
}, future.seed = 0xBEEF)

Digging into the error message, I was able to override the limits by setting the environment options, but I had to do so within the first future_lapply() loop, i.e.:

y3 <- future_lapply(1:5, function(f1) {
  future:::setOption("parallelly.maxWorkers.localhost", c(+Inf, +Inf))
  z1 <- rnorm(f1)
  z2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
  return(list(z1 = z1, z2 = z2))
}, future.seed = 0xBEEF)

The function then ran without error. I am able to use this workaround for some operations, however I tend to make use of nested resampling within the mlr3 packages following their examples, and with that I am unable to modify the inner resampling loop.

library(future)
library(mlr3)
library(mlr3tuning)
plan(list(
  tweak(multisession, workers = availableCores() %/% 4),
  tweak(multisession, workers = 4)
))

lrn_rpart = lrn("classif.rpart",
                minsplit  = to_tune(2, 128))

lrn_rpart_tuned = auto_tuner(tnr("random_search", batch_size = 2),
                             lrn_rpart, rsmp("holdout"), msr("classif.ce"), 2)

rr = resample(tsk("penguins"), lrn_rpart_tuned, rsmp("cv", folds = 5))

I can "force" this to work by first running a dummy loop in order to open up the sessions with their environment options set; however, it just seems odd that this is what needs to happen if I want to parallelize both inner and outer resampling loops within mlr3 functions.

# Dummy loop to open the sessions
future_lapply(1:nbrOfWorkers(), function(...) {
  future:::setOption("parallelly.maxWorkers.localhost", c(+Inf, +Inf))
})

# Resampling in parallel now works
rr = resample(tsk("penguins"), lrn_rpart_tuned, rsmp("cv", folds = 5))

Expected behavior I would have expected that when I created the tweaked multisession plan that it would recognize that there are available cores in order to complete the operation instead of resulting in an error. I feel like having to run the dummy loop beforehand is a workaround and not really what was intended with the future_lapply() function.

Session information I have tested this on two PC's: one laptop running Windows 11 using an 8 thread Intel Core i7 processor, and on a desktop running Windows 11 using a 32 thread AMD Ryzen 9 processor with the same results:

> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

time zone: America/Vancouver
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mlr3tuning_0.20.0   paradox_0.11.1      mlr3_0.18.0         future.apply_1.11.1 future_1.33.1      

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5          crayon_1.5.2         cli_3.6.2            rlang_1.1.3          data.table_1.15.2    glue_1.7.0           listenv_0.9.1       
 [8] backports_1.4.1      mlr3measures_0.5.0   mlr3misc_0.14.0      fansi_1.0.6          tibble_3.2.1         palmerpenguins_0.1.1 lifecycle_1.0.4     
[15] RhpcBLASctl_0.23-42  compiler_4.3.3       codetools_0.2-19     pkgconfig_2.0.3      bbotk_0.8.0          rstudioapi_0.15.0    digest_0.6.35       
[22] R6_2.5.1             utf8_1.2.4           pillar_1.9.0         parallelly_1.37.1    parallel_4.3.3       magrittr_2.0.3       rpart_4.1.23        
[29] checkmate_2.3.1      uuid_1.2-0           tools_4.3.3          globals_0.16.3       lgr_0.4.4           

Thank you again for all of your hard work, and please let me know if I should direct this issue elsewhere.

scottkosty commented 4 months ago

I don't understand the issue but in case another data point helps, I ran your reproducible example and didn't get an error. Here is my output:

> library(future.apply)

# Examples from the future_lapply help
plan(sequential)
y1 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y1)

plan(multisession)
y2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y2)

# Nested multisession, ends in the error seen above
plan(list(
  tweak(multisession, workers = availableCores() %/% 4),
  tweak(multisession, workers = 4)
))
y3 <- future_lapply(1:5, function(f1) {
  z1 <- rnorm(f1)
  z2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
  return(list(z1 = z1, z2 = z2))
}, future.seed = 0xBEEF)
Loading required package: future
List of 5
 $ : num 0.682
 $ : num [1:2] 1.19 -0.474
 $ : num [1:3] -1.081 -2.412 -0.466
 $ : num [1:4] 1.143 -1.278 0.438 0.819
 $ : num [1:5] -1.2935 -0.0878 -0.3909 0.5652 0.2253
List of 5
 $ : num 0.682
 $ : num [1:2] 1.19 -0.474
 $ : num [1:3] -1.081 -2.412 -0.466
 $ : num [1:4] 1.143 -1.278 0.438 0.819
 $ : num [1:5] -1.2935 -0.0878 -0.3909 0.5652 0.2253
> 

Here is my session info:

> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future.apply_1.11.0 future_1.33.0       colorout_1.2-2     

loaded via a namespace (and not attached):
 [1] compiler_4.2.2    parallelly_1.36.0 cli_3.6.1         tools_4.2.2       parallel_4.2.2    listenv_0.9.0     codetools_0.2-19  git2r_0.32.0      digest_0.6.33     globals_0.16.2   
[11] rlang_1.1.1      
> 
HenrikBengtsson commented 4 months ago

Thanks both. This was introduced in parallelly 1.37.0 (https://cran.r-project.org/web/packages/parallelly/news/news.html). That part was intentional, but I missed this particular use case. It also slipped through all the unit tests in various packages, because they stay below the CPU-core limit where this is caught.

I'll look into what the best way is to fix this; it'll most likely be an update to the future package. In the meanwhile, one workaround (please see next comment for a better one) is to set R options:

options(parallelly.maxWorkers.localhost = c(Inf, Inf))

which can also be set automatically when the parallelly package is loaded via environment variable:

R_PARALLELLY_MAXWORKERS_LOCALHOST=Inf,Inf
HenrikBengtsson commented 4 months ago

Workaround

Actually, a better workaround is to specify the number of parallel workers using I(), e.g.


# Nested multisession, ends in the error seen above
plan(list(
  tweak(multisession, workers = availableCores() %/% 4),
  tweak(multisession, workers = I(4))    # <= force 4 workers
))

Note that it's not needed at the first layer.

mcoghill commented 4 months ago

Thank you very much @HenrikBengtsson, I was unaware of the I() function - perhaps I should brush up on the documentation a little bit more :wink: I will incorporate this into my workflows, it is a much more elegant solution than what I was trying. Thanks for handling this.