Open mcoghill opened 4 months ago
I don't understand the issue but in case another data point helps, I ran your reproducible example and didn't get an error. Here is my output:
> library(future.apply)
# Examples from the future_lapply help
plan(sequential)
y1 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y1)
plan(multisession)
y2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y2)
# Nested multisession, ends in the error seen above
plan(list(
tweak(multisession, workers = availableCores() %/% 4),
tweak(multisession, workers = 4)
))
y3 <- future_lapply(1:5, function(f1) {
z1 <- rnorm(f1)
z2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
return(list(z1 = z1, z2 = z2))
}, future.seed = 0xBEEF)
Loading required package: future
List of 5
$ : num 0.682
$ : num [1:2] 1.19 -0.474
$ : num [1:3] -1.081 -2.412 -0.466
$ : num [1:4] 1.143 -1.278 0.438 0.819
$ : num [1:5] -1.2935 -0.0878 -0.3909 0.5652 0.2253
List of 5
$ : num 0.682
$ : num [1:2] 1.19 -0.474
$ : num [1:3] -1.081 -2.412 -0.466
$ : num [1:4] 1.143 -1.278 0.438 0.819
$ : num [1:5] -1.2935 -0.0878 -0.3909 0.5652 0.2253
>
Here is my session info:
> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] future.apply_1.11.0 future_1.33.0 colorout_1.2-2
loaded via a namespace (and not attached):
[1] compiler_4.2.2 parallelly_1.36.0 cli_3.6.1 tools_4.2.2 parallel_4.2.2 listenv_0.9.0 codetools_0.2-19 git2r_0.32.0 digest_0.6.33 globals_0.16.2
[11] rlang_1.1.1
>
Thanks both. This was introduced in parallelly 1.37.0 (https://cran.r-project.org/web/packages/parallelly/news/news.html). That part was intentional, but I missed this particular use case. It also slipped through all the unit tests in various packages, because they stay below the CPU-core limit where this is caught.
I'll look into what the best way is to fix this; it'll most likely be an update to the future package. In the meanwhile, one workaround (please see next comment for a better one) is to set R options:
options(parallelly.maxWorkers.localhost = c(Inf, Inf))
which can also be set automatically when the parallelly package is loaded via environment variable:
R_PARALLELLY_MAXWORKERS_LOCALHOST=Inf,Inf
Actually, a better workaround is to specify the number of parallel workers using I()
, e.g.
# Nested multisession, ends in the error seen above
plan(list(
tweak(multisession, workers = availableCores() %/% 4),
tweak(multisession, workers = I(4)) # <= force 4 workers
))
Note that it's not needed at the first layer.
Thank you very much @HenrikBengtsson, I was unaware of the I()
function - perhaps I should brush up on the documentation a little bit more :wink: I will incorporate this into my workflows, it is a much more elegant solution than what I was trying. Thanks for handling this.
First of all, thank you for the amazing package - it has been incredibly useful in so many of my workflows and because it follows similar principles to regular
lapply()
loops it was very easy to implement. I really appreciate the work that has been poured into this. Second, I am placing this bug report here within thefuture.apply
package; however, there are a few moving parts here and I'm not sure if this would be better suited elsewhere, so please let me know if I should move this to another package. On to the details:Describe the bug I am trying to nest a
future_lapply()
loop inside anotherfuture_lapply()
loop, however when I try to assign more than three sessions to the inner loop I am receiving an error message that states:I thought that I had my code set up correctly according to some examples that I had found from here as well as in the "Future topologies" vignette. Indeed, if I use three or fewer parallel workers within the inner loop then the error is downgraded to a warning; however, in this case I can't make use of all of my PC's cores. Below is an example of the code I used to generate the error message above:
Reproduce example
Digging into the error message, I was able to override the limits by setting the environment options, but I had to do so within the first
future_lapply()
loop, i.e.:The function then ran without error. I am able to use this workaround for some operations, however I tend to make use of nested resampling within the
mlr3
packages following their examples, and with that I am unable to modify the inner resampling loop.I can "force" this to work by first running a dummy loop in order to open up the sessions with their environment options set; however, it just seems odd that this is what needs to happen if I want to parallelize both inner and outer resampling loops within
mlr3
functions.Expected behavior I would have expected that when I created the tweaked multisession plan that it would recognize that there are available cores in order to complete the operation instead of resulting in an error. I feel like having to run the dummy loop beforehand is a workaround and not really what was intended with the
future_lapply()
function.Session information I have tested this on two PC's: one laptop running Windows 11 using an 8 thread Intel Core i7 processor, and on a desktop running Windows 11 using a 32 thread AMD Ryzen 9 processor with the same results:
Thank you again for all of your hard work, and please let me know if I should direct this issue elsewhere.