I'm experiencing an issue with the load distribution of parallel jobs executed with `future_map()`. Specifically, the first time I call `future_map()`, the work runs on only 2-3 workers, whereas on subsequent runs of the same call it is spread evenly across all workers.
I tried to narrow it down to a reprex:
```r
library(future)
library(tictoc)

plan(multisession, workers = 10)

# Sequential baseline
tic()
res <- purrr::map(
  .x = 1:1e6, .f = ~ .x + 1
)
toc()
# 1.92 sec (sequential, not in parallel)

# First parallel call
tic()
furrr::future_map(
  .x = 1:1e6, .f = ~ .x + 1
)
toc()
# 3.462 sec on the first run

# Subsequent parallel calls
microbenchmark::microbenchmark(
  {
    furrr::future_map(
      .x = 1:1e6, .f = ~ .x + 1
    )
  },
  times = 20
)
# ~1.2 sec on average over consecutive runs
```
In my real-world applications, where a considerable amount of data also has to be passed to the workers, the effect is even more pronounced.
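To check how the work is actually distributed, rather than inferring it from timings, one could record the process ID of the worker that evaluates each element. This is a minimal sketch assuming the same `plan(multisession, workers = 10)` setup as in the reprex; the names `first` and `second` are illustrative only. With furrr's default scheduling, the input is split into roughly one chunk per worker, so comparing the two tables shows whether the chunks of the first call land on fewer processes than those of a later call.

```r
library(future)
library(furrr)

plan(multisession, workers = 10)

# Each element returns the PID of the R process that evaluated it
first  <- future_map_int(1:1000, ~ Sys.getpid())
second <- future_map_int(1:1000, ~ Sys.getpid())

# Number of elements handled by each worker process, per call;
# an even spread would list all 10 PIDs with similar counts
table(first)
table(second)
```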
This might be related to a previous issue: https://github.com/DavisVaughan/furrr/issues/3
I'm working on the following system:
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Any thoughts appreciated.
Best, Maximilian