Closed abhsarma closed 2 years ago
@mjskay I remember we discussed this and decided that at some point we'd want users to be able to farm out execution to clusters.
Currently the code uses forking and is set up as:
    execute_multiverse <- function(...) {
      ## ... some processing steps ...
      lapply( [list_of_code_blocks], exec_all_universe, cores = cores )
    }

    exec_all_universe <- function(...) {
      ## ... some processing steps ...
      mcmapply( execute_code_from_universe, .code_list, .env_list, mc.cores = cores )
    }
I am thinking of modifying the second function to something like the following, where we provide some guidelines on how users should declare the function for parallel execution, and they can set up whatever cluster they like:
    exec_all_universe <- function(..., parallel_fun = NULL) {
      ## ... some processing steps
      if (is.null(parallel_fun)) {
        mcmapply( execute_code_from_universe, .code_list, .env_list, mc.cores = cores )
      } else {
        ## call the parallel_fun on the objects
      }
    }
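To make the else branch concrete, here is a rough, self-contained sketch of what the contract could look like. It assumes parallel_fun takes the worker function followed by the lists to iterate over, the same shape as mapply; that contract is a guess, not something already decided. The body of execute_code_from_universe, the explicit .code_list/.env_list arguments, and the toy inputs are stand-ins for illustration only.

```r
library(parallel)

# Hypothetical stand-in for the package's per-universe executor:
# evaluate one quoted expression inside one environment.
execute_code_from_universe <- function(.code, .env) eval(.code, envir = .env)

exec_all_universe <- function(.code_list, .env_list, cores = 1L, parallel_fun = NULL) {
  if (is.null(parallel_fun)) {
    # default: forking via mcmapply (falls back to mapply when cores is 1)
    mcmapply(execute_code_from_universe, .code_list, .env_list,
             mc.cores = cores, SIMPLIFY = FALSE)
  } else {
    # assumed contract: parallel_fun(FUN, ...) iterates over the lists in parallel
    parallel_fun(execute_code_from_universe, .code_list, .env_list)
  }
}

# Toy universes: one expression and one environment each
code_list <- list(quote(x + 1), quote(x * 2))
env_list  <- list(list2env(list(x = 1)), list2env(list(x = 10)))

res_default <- exec_all_universe(code_list, env_list)

# A user-supplied backend; this one is sequential, just to show the shape
sequential_backend <- function(f, ...) Map(f, ...)
res_custom <- exec_all_universe(code_list, env_list, parallel_fun = sequential_backend)

# A cluster backend would follow the same shape, e.g.
#   cl <- makeCluster(2)
#   cluster_backend <- function(f, ...) clusterMap(cl, f, ...)
```

The downside is visible here: users have to learn that the function must accept (FUN, ...) and return a list, which is exactly the kind of package-specific convention we would need to document.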
I'd say if you want people to be able to swap out backends for parallelization, it might be better to use an existing API designed for this purpose, like {foreach} or {future.apply}. That way it will fit into existing parallel processing frameworks with minimal effort on users' part. Saves them having to figure out what kind of function your package expects.
e.g. see here: https://cran.r-project.org/web/packages/future.apply/vignettes/future.apply-1-overview.html
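A minimal sketch of what that could look like with {future.apply}, assuming the package is installed. The executor body and the toy inputs are made up for illustration, and the host names in the comments are hypothetical. The point is that the package code never mentions a backend; the user picks one with plan().

```r
library(future)
library(future.apply)

# Hypothetical stand-in for the package's per-universe executor
execute_code_from_universe <- function(.code, .env) eval(.code, envir = .env)

# Package side: no cores argument, no backend-specific code
exec_all_universe <- function(.code_list, .env_list) {
  future_mapply(execute_code_from_universe, .code_list, .env_list,
                SIMPLIFY = FALSE)
}

# Toy universes
code_list <- list(quote(x + 1), quote(x * 2))
env_list  <- list(list2env(list(x = 1)), list2env(list(x = 10)))

# User side: choose how futures are resolved
plan(sequential)                                   # no parallelism
# plan(multisession, workers = 2)                  # background R sessions on this machine
# plan(cluster, workers = c("node1", "node2"))     # hypothetical remote hosts

results <- exec_all_universe(code_list, env_list)
```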
right, but setting up future is such a pain. I feel it adds too much complexity for people who are on UNIX systems?
oh this is different. I'll see if this has the same issues
If future is annoying, maybe try foreach?
Related to #54
Currently, the implementation of execute_multiverse where the universes are executed in parallel is being done using mcmapply from the parallel package. Switch to furrr::future_map2. (This should be straightforward, seems like we just need to replace the function)
Also consider how we might allow users to execute on clusters? Should there be an alternate pipeline where the user needs to basically write their own function for execution? That might require modularising the current implementation...
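A rough sketch of the swap, assuming {furrr} is installed. The executor body and toy inputs are placeholders, not the package's real internals. One thing to watch: it is not purely a rename, since mcmapply takes the function first while future_map2 takes the two lists first and the function third, and mc.cores goes away in favour of plan().

```r
library(future)
library(furrr)

# Hypothetical stand-in for the package's per-universe executor
execute_code_from_universe <- function(.code, .env) eval(.code, envir = .env)

# Toy universes
code_list <- list(quote(x + 1), quote(x * 2))
env_list  <- list(list2env(list(x = 1)), list2env(list(x = 10)))

# before: mcmapply(execute_code_from_universe, .code_list, .env_list, mc.cores = cores)
# after:  the backend is whatever the user set with plan()
plan(sequential)   # e.g. plan(multisession, workers = 2) for local parallelism
results <- future_map2(code_list, env_list, execute_code_from_universe)
```

future_map2 always returns a list, so any downstream code that relied on mcmapply simplifying its result would need checking.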