MUCollective / multiverse

R package for creating explorable multiverse analysis
https://mucollective.github.io/multiverse/
GNU General Public License v3.0

implementation of parallel execution using `future` and `clusterMap` #89

Closed: abhsarma closed this issue 2 years ago

abhsarma commented 3 years ago

Related to #54

Currently, the parallel execution of universes in execute_multiverse is implemented using mcmapply from the parallel package. Switch to furrr::future_map2. (This should be straightforward; it seems we just need to replace the function call.)
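As a rough illustration of that swap, the sketch below runs a toy set of universes with furrr::future_map2 in place of mcmapply. It assumes the future and furrr packages are installed. The body of execute_code_from_universe and the toy code and environment lists are hypothetical stand-ins, not the package's internals.

```r
library(future)
library(furrr)

# Hypothetical stand-in for the internal per-universe executor:
# evaluate one quoted expression in one environment and return the value.
execute_code_from_universe <- function(code, env) {
  eval(code, envir = env)
}

# Toy inputs (assumed for illustration): three universes, one environment each.
.code_list <- list(quote(x + 1), quote(x * 2), quote(x^2))
.env_list <- lapply(1:3, function(i) list2env(list(x = i), parent = globalenv()))

# The backend is chosen with plan(); multisession does not rely on forking.
plan(multisession, workers = 2)

# Roughly what mcmapply(execute_code_from_universe, .code_list, .env_list, ...)
# does, but dispatched through whatever future backend is active.
results <- future_map2(.code_list, .env_list, execute_code_from_universe)

plan(sequential)  # shut down the background workers
```

One thing to keep in mind: with multisession each worker receives a copy of the environment, so anything a universe assigns inside its environment has to be returned explicitly rather than read back from .env_list afterwards.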

Also consider how we might allow users to execute on clusters. Should there be an alternate pipeline where users write their own execution function? That might require modularising the current implementation...

abhsarma commented 3 years ago

@mjskay I remember we discussed this and decided that at some point we'd want users to be able to farm out execution to clusters.

Currently the code uses forking and is set up as:

execute_multiverse <- function(...) {
    ## ... some processing steps ...
    lapply( [list_of_code_blocks], exec_all_universe, cores = cores )
}

exec_all_universe <- function(...) {
    ## ... some processing steps ...
    mcmapply( execute_code_from_universe, .code_list, .env_list, mc.cores = cores )
}

I am thinking of modifying the second function to something like the following, where we provide some guidelines on how users should declare the function for parallel execution, and they can set up whatever cluster they like:

exec_all_universe <- function(..., parallel_fun = NULL) {
    ## ... some processing steps
    if (is.null(parallel_fun)) {
        mcmapply( execute_code_from_universe, .code_list, .env_list, mc.cores = cores )
    } else {
        ## call the parallel_fun on the objects 
    }
}
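To make the proposed hook concrete, here is a minimal sketch of what a user-supplied parallel_fun could look like when backed by a PSOCK cluster and parallel::clusterMap. The calling convention (fun, code_list, env_list), the explicit .code_list and .env_list arguments, and the toy inputs are assumptions made for illustration; the thread does not fix any of them.

```r
library(parallel)

# Hypothetical stand-in for the internal per-universe executor.
execute_code_from_universe <- function(code, env) eval(code, envir = env)

# Sketch of the hook: parallel_fun is assumed to take (fun, code_list, env_list)
# and to return a list with one result per universe.
exec_all_universe <- function(.code_list, .env_list, cores = 1L, parallel_fun = NULL) {
  if (is.null(parallel_fun)) {
    # Default path: forking via mcmapply (mc.cores > 1 is unsupported on Windows).
    mcmapply(execute_code_from_universe, .code_list, .env_list,
             mc.cores = cores, SIMPLIFY = FALSE)
  } else {
    # User-supplied backend decides where and how the universes run.
    parallel_fun(execute_code_from_universe, .code_list, .env_list)
  }
}

# A user-defined backend: two local PSOCK workers driven by clusterMap.
cl <- makeCluster(2)
cluster_backend <- function(fun, code_list, env_list) {
  clusterMap(cl, fun, code_list, env_list)
}

# Toy universes (assumed for illustration).
code_list <- list(quote(x + 1), quote(x * 2))
env_list <- list(list2env(list(x = 10)), list2env(list(x = 20)))

results <- exec_all_universe(code_list, env_list, parallel_fun = cluster_backend)
stopCluster(cl)
```

The drawback of this design is visible in the sketch: every user has to learn the argument order the package expects before they can plug in a backend.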

mjskay commented 3 years ago

I'd say if you want people to be able to swap out backends for parallelization, it might be better to use an existing API designed for this purpose, like {foreach} or {future.apply}. That way it will fit into existing parallel processing frameworks with minimal effort on users' part. Saves them having to figure out what kind of function your package expects.

e.g. see here: https://cran.r-project.org/web/packages/future.apply/vignettes/future.apply-1-overview.html
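A minimal sketch of that idea, assuming the future and future.apply packages are installed: the package-side function only calls future_mapply, and the user picks the backend with plan(). The run_universes wrapper, the executor body, and the toy inputs are hypothetical.

```r
library(future)
library(future.apply)

# Hypothetical stand-in for the internal per-universe executor.
execute_code_from_universe <- function(code, env) eval(code, envir = env)

# Package-side code: no backend is chosen here, only the mapping is expressed.
run_universes <- function(code_list, env_list) {
  future_mapply(execute_code_from_universe, code_list, env_list,
                SIMPLIFY = FALSE)
}

# Toy universes (assumed for illustration).
code_list <- list(quote(y - 1), quote(y / 2))
env_list <- list(list2env(list(y = 4)), list2env(list(y = 8)))

# User-side code: the same call runs sequentially...
plan(sequential)
seq_results <- run_universes(code_list, env_list)

# ...or on background R sessions, without touching run_universes().
plan(multisession, workers = 2)
par_results <- run_universes(code_list, env_list)
plan(sequential)
```

Moving to remote machines would, under the same assumptions, only change the plan() call on the user's side (for example plan(cluster, workers = ...) with the user's own host names), which is the "minimal effort" being argued for here.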

abhsarma commented 3 years ago

Right, but setting up future is such a pain. I feel it adds too much complexity for people who are on UNIX systems?

abhsarma commented 3 years ago

oh this is different. I'll see if this has the same issues

mjskay commented 3 years ago

If future is annoying, maybe try foreach?