DavisVaughan / furrr

Apply Mapping Functions in Parallel using Futures
https://furrr.futureverse.org/

work with a list of data frames #204

Closed: EvoLandEco closed this issue 2 years ago

EvoLandEco commented 2 years ago

I use purrr::map() to apply a function to each data frame in a list, but it seems that future_map() cannot handle this situation: make_chunks() cannot split the list into even parts.

Is there any workaround that would let me make use of parallel computing in my particular situation?
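For reference, future_map() accepts a list of data frames just as purrr::map() does: each list element is passed to .f unchanged. The minimal sketch below uses the built-in mtcars data split by cylinder count as stand-in data (not the simulation from this issue) and 2 workers as an arbitrary choice. furrr's internal chunking helper is not exported, so the sketch only uses the public furrr_options() argument chunk_size.

```r
library(future)
library(furrr)

# Start two background R sessions to act as workers
plan(multisession, workers = 2)

# A named list of three data frames (stand-in data, split by cylinder count)
dfs <- split(mtcars, mtcars$cyl)

# Each data frame is handed to the function as-is, exactly as with purrr::map()
fits <- future_map(dfs, function(df) coef(lm(mpg ~ wt, data = df)))

# chunk_size = 1 sends every data frame to a worker as its own future,
# which can help when elements take very different amounts of time
row_counts <- future_map(dfs, nrow, .options = furrr_options(chunk_size = 1L))

# Shut the workers down again
plan(sequential)
```

The names of the input list ("4", "6", "8") carry over to the results, as they would with purrr::map().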

DavisVaughan commented 2 years ago

Can you please provide a full reproducible example that works with map() but not with future_map()? Unfortunately, I am not sure what you are talking about.

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page.

You can install reprex by running the following (though you may already have it if you have the tidyverse package installed):

install.packages("reprex")

Thanks

EvoLandEco commented 2 years ago

Thanks for your advice, and sorry about the bad example. Here is an example made with reprex; I hope this is the right place to discuss it. The background is that I want to run my simulation with various sets of parameters using purrr::map(). I replaced purrr::map() with furrr::future_map() to try to make use of all 8 of my CPU cores, but the performance did not improve significantly.

What could be the problem here? Could it be because of the data structure of testcombo?
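One thing worth checking is how many elements testcombo has relative to the number of workers. Each worker receives roughly length(.x) / workers elements, so a short list leaves little work to spread out, and a single slow element can dominate the total time. The rough sketch below uses base R's parallel::splitIndices() as a stand-in for furrr's internal chunking, which is not exported and may differ in detail; the list length of 16 is hypothetical, not the actual length of testcombo.

```r
library(parallel)

n_elements <- 16L  # hypothetical length(testcombo)
n_workers  <- 8L

# Split element indices 1..n_elements into n_workers nearly equal groups
chunks <- splitIndices(n_elements, n_workers)

# Number of elements each worker would receive (here, 2 each)
lengths(chunks)

# With 10 elements, some workers get 2 elements and others get 1,
# so the busiest workers set the overall run time
uneven <- splitIndices(10L, n_workers)
lengths(uneven)
```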

devtools::install_github("rsetienne/DDD@tianjian_Rampal")
#> Skipping install of 'DDD' from a github remote, the SHA1 (11629491) has not changed since last install.
#>   Use `force = TRUE` to force installation
devtools::install_github("EvoLandEco/eve")
#> Skipping install of 'eve' from a github remote, the SHA1 (8cc5ea66) has not changed since last install.
#>   Use `force = TRUE` to force installation

testcombo <- eve::edd_combo_maker(
  la = c(0.5, 0.3),
  mu = c(0.1, 0.2),
  beta_n = -0.0001,
  beta_phi = -0.0001,
  gamma_n = 0.0001,
  gamma_phi = 0.0001,
  age = c(3, 5),
  model = "dsde2",
  metric = c("ed", "pd"),
  offset = "none"
)

future_opts <- furrr::furrr_options(seed = TRUE)

testfuna <- function(testcombo, future_opts) {
  future::plan(future::sequential)
  furrr::future_map(
    .x = testcombo,
    .f = eve::edd_wrapper,
    .options = future_opts,
    nrep = 3,
    make_plot = FALSE,
    make_stat = FALSE,
    plot_opt = NULL,
    stat_opt = NULL
  )
}

testfunb <- function(testcombo, future_opts) {
  future::plan(future::multisession, workers = 8)
  furrr::future_map(
    .x = testcombo,
    .f = eve::edd_wrapper,
    .options = future_opts,
    nrep = 3,
    make_plot = FALSE,
    make_stat = FALSE,
    plot_opt = NULL,
    stat_opt = NULL
  )
}

microbenchmark::microbenchmark(testfuna(testcombo, future_opts),
                               testfunb(testcombo, future_opts),
                               times = 5L)
#> Unit: seconds
#>                              expr      min       lq     mean   median       uq
#>  testfuna(testcombo, future_opts) 7.947545 8.496104 9.475211 9.262720 10.11764
#>  testfunb(testcombo, future_opts) 6.920686 7.159854 8.402481 7.214634 10.26396
#>       max neval
#>  11.55204     5
#>  10.45327     5

Created on 2021-09-23 by the reprex package (v2.0.1)

EvoLandEco commented 2 years ago

I tested larger simulations as well, and the performance gain seems much better for them.
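A simple fixed-overhead model (a rough approximation, not something furrr computes) shows why larger simulations scale better. Let T_s be the sequential run time, p the number of workers, and T_o the fixed cost of starting workers and moving data, and assume perfect load balance:

$$
T_{\text{par}} \approx T_o + \frac{T_s}{p}, \qquad
S = \frac{T_s}{T_{\text{par}}} \approx \frac{T_s}{T_o + T_s / p} \;\longrightarrow\; p \quad \text{as } T_s \to \infty .
$$

Plugging in the mean times from the benchmark above (T_s ≈ 9.48 s, p = 8), the ideal parallel time would be about 9.48 / 8 ≈ 1.18 s, so under this model the observed 8.40 s implies roughly 7.2 s of overhead. With T_o fixed, that overhead shrinks relative to the total as T_s grows.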

DavisVaughan commented 2 years ago

That is good to see. You also have to consider that:

  1. It takes time to actually send data to and from the workers, so with small tests that transfer often dominates the timing.
  2. Starting up the workers themselves also takes time. This line is inside your benchmark, and starting up the 8 workers probably takes 2-3 seconds on its own: future::plan(future::multisession, workers = 8)
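To keep worker start-up out of the measurement, the plan can be set once before timing, so that only the mapping itself is timed. A minimal sketch, assuming a hypothetical slow_task() in place of eve::edd_wrapper (so it runs without the eve package) and 4 workers:

```r
library(future)
library(furrr)

# Hypothetical stand-in for eve::edd_wrapper: a task that takes about 0.2 s
slow_task <- function(x) {
  Sys.sleep(0.2)
  x^2
}

xs <- 1:16

# Sequential baseline
plan(sequential)
t_seq <- system.time(res_seq <- future_map(xs, slow_task))[["elapsed"]]

# Start the workers once, before timing, so their start-up cost is excluded
plan(multisession, workers = 4)
invisible(future_map(1:4, identity))  # warm-up: make sure the workers are running

# Only the parallel map itself is timed here
t_par <- system.time(res_par <- future_map(xs, slow_task))[["elapsed"]]

# Shut the workers down again
plan(sequential)

c(sequential = t_seq, parallel = t_par)
```

In the original benchmark, the equivalent change would likely be to call future::plan(future::multisession, workers = 8) once, outside testfunb(), before microbenchmark::microbenchmark() runs.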