Closed Saethox closed 1 year ago
The `parallel-irace` script does a bit more than execute multiple runs in parallel. It also takes care of the random seed, creating the exec-dirs, etc.
I would be happy to review and merge a function `irace_parallel(runs)` or maybe `multiple_runs_irace(scenario, parameters, nruns=2, parallel=TRUE)`. You could start with the parallelization provided by the `parallel` package (https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf). We already use it within irace (https://github.com/MLopez-Ibanez/irace/blob/7513fef57f845c195d15b469d1e5ccd9dd706598/R/race-wrapper.R#L561-L599).
It would be great if most of the code in `parallel-irace` could be moved to R.
If I understand this correctly, on platforms other than Windows the irace runs should just be parallelizable with `parallel::mclapply`, which, according to the documentation, allows nested calls by default (`mc.allow.recursive = TRUE`).
Apparently, nested `parallel::parLapply` does work, although it might come with considerable overhead (https://stackoverflow.com/questions/50938117/r-parallel-clusters-inside-a-cluster).
I would start even simpler. Create first a function that does sequentially what you want to do in parallel. I would be happy to review and merge that function even without the parallel part. Then make that function work in parallel.
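A minimal sketch of what such a sequential function might look like, assuming `runs` is a list of `scenario`/`parameters` pairs. The names `multi_run_sequential`, `run_fn`, `base_dir` and `global_seed` are hypothetical, and `run_fn` is only a stand-in for the actual call to irace so that the sketch does not depend on the package:

```r
# Hypothetical helper: execute `run_fn` once per (scenario, parameters) pair,
# sequentially, giving each run its own integer seed and its own exec-dir.
multi_run_sequential <- function(runs, run_fn, base_dir = tempdir(),
                                 global_seed = 42L) {
  results <- vector("list", length(runs))
  for (i in seq_along(runs)) {
    scenario <- runs[[i]]$scenario
    # Assumption: one integer seed per run, derived from the global seed.
    scenario$seed <- global_seed + i
    # One execution directory per run, created if it does not exist yet.
    scenario$execDir <- file.path(base_dir, sprintf("run_%02d", i))
    dir.create(scenario$execDir, recursive = TRUE, showWarnings = FALSE)
    results[[i]] <- run_fn(scenario, runs[[i]]$parameters)
  }
  results
}

# Usage with a dummy runner that only reports what it received.
dummy_run <- function(scenario, parameters) {
  list(seed = scenario$seed, dir_exists = dir.exists(scenario$execDir))
}
runs <- list(
  list(scenario = list(maxExperiments = 100), parameters = list()),
  list(scenario = list(maxExperiments = 200), parameters = list())
)
res <- multi_run_sequential(runs, dummy_run)
```

Passing the runner as an argument keeps the bookkeeping (seeds, directories) testable on its own before any parallel code is added.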
For the parallel part, I would suggest trying `parallel::parLapply` or `mclapply`, as irace itself does, for a first implementation. I don't even know if there are some unknown issues when running `irace` in this way, so you may want to have a working version first that does what you want, then make it faster.
If you are not happy with the performance, you could try the future package, but I haven't investigated it myself, so I don't know how easy it is to use.
There is also a third option: https://callr.r-lib.org/#multiple-background-r-processes-and-poll
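As a rough illustration of the `parLapply`/`mclapply` route, the sketch below switches between a sequential `lapply`, a socket cluster on Windows, and forking elsewhere. The helper name `parallel_lapply` and its arguments are hypothetical; this is not how irace dispatches work internally, only one way the two functions from the `parallel` package could be combined:

```r
library(parallel)

# Hypothetical helper: apply `fn` to every element of `xs`,
# sequentially or in parallel depending on the arguments and the platform.
parallel_lapply <- function(xs, fn, ncores = 2L, parallel = TRUE) {
  if (!parallel || ncores <= 1L) {
    return(lapply(xs, fn))
  }
  if (.Platform$OS.type == "windows") {
    # Forking is not available on Windows, so use a socket cluster.
    cl <- makeCluster(ncores)
    on.exit(stopCluster(cl), add = TRUE)
    return(parLapply(cl, xs, fn))
  }
  # On Unix-alikes, fork the current process.
  mclapply(xs, fn, mc.cores = ncores)
}

square <- function(x) x^2
seq_res <- parallel_lapply(1:4, square, parallel = FALSE)
par_res <- parallel_lapply(1:4, square, ncores = 2L)
```

Having the sequential path behind the same interface makes it easy to compare the two variants on identical inputs.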
I have merged but I also did a few minor changes after the merging. In particular, I have moved the function to its own file, since I expect it to keep growing when you implement the parallel version.
I also do not think the handling of random seeds is completely correct, since `gen_random_seeds(10)` returns a list of 70 values, which does not seem correct. But comparing the parallel versus sequential variants will probably shed more light on this aspect.
Looking forward to the next part!
> I also do not think the handling of random seeds is completely correct, since `gen_random_seeds(10)` returns a list of 70 values, which does not seem correct. But comparing the parallel versus sequential variants will probably shed more light on this aspect.
Yeah, looks like I'm concatenating the lists incorrectly; the seed generated by `nextRNGStream` is a list of seven values. I also should set the random seed directly and reset the `scenario$seed` to `NA`, because `irace` expects a single positive integer as seed, and not seven integers.
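To illustrate the seven-versus-seventy issue, here is a small sketch that keeps each stream seed as its own list element. The function name `gen_stream_seeds` and its arguments are hypothetical and are not the merged `gen_random_seeds`; only `RNGkind`, `set.seed` and `parallel::nextRNGStream` are real R functions:

```r
library(parallel)

# Hypothetical helper: collect n L'Ecuyer-CMRG stream seeds,
# one list element per run. Note that it resets the global RNG state.
gen_stream_seeds <- function(n, global_seed = 42L) {
  stopifnot(n >= 1L)
  old_kind <- RNGkind("L'Ecuyer-CMRG")
  # Restore the previous generator kind when the function exits.
  on.exit(RNGkind(old_kind[1L]), add = TRUE)
  set.seed(global_seed)
  seeds <- vector("list", n)
  seeds[[1L]] <- get(".Random.seed", envir = globalenv())
  for (i in seq_len(n - 1L)) {
    # Assign by index; c(seeds, new_seed) would splice in seven separate values.
    seeds[[i + 1L]] <- nextRNGStream(seeds[[i]])
  }
  seeds
}

seeds <- gen_stream_seeds(10)
n_runs <- length(seeds)          # one element per run
per_run <- lengths(seeds)        # seven integers in each element
n_flat <- length(unlist(seeds))  # flattening reproduces the 70 values
```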
How should we approach the parallel execution? The `execute.experiments` function is currently hard-coded to the `scenario` options and the global `.irace$target.runner`, but a parallel implementation of `multi_irace` would probably share 90% of the code.
Do you see any pitfalls with adapting `execute.experiments` to allow executing arbitrary functions either sequentially, with multi-threading, MPI, other clusters, etc.? Then we could use it inside both `irace` and `multi_irace`. Depending on how hard it is to get the cluster code to work, I would also be fine with just extracting the multi-threading part of `execute.experiments`.
Please duplicate/copy the parts that you need and do not modify `execute.experiments`. Once everything is working, if there is some duplication remaining, we can look at creating common functions.
> Yeah, looks like I'm concatenating the lists incorrectly; the seed generated by `nextRNGStream` is a list of seven values. I also should set the random seed directly and reset the `scenario$seed` to `NA`, because `irace` expects a single positive integer as seed, and not seven integers.
Would that allow someone to repeat individual runs knowing the seed?
I think it may be easier to generate as many integers as seeds are needed (randomly, or sequentially starting from global_seed), then set the `scenario$seed` for each of them and not change the RNGkind at all. This way each run of irace will work exactly as if the user had specified the seed that is recorded. It is true that the independent runs are not technically independent any longer because their RNG streams are related through their seeds, but I doubt that in the context of irace this effect can be measured at all. In fact, when running multiple runs of irace, I often set the seed to `42+run_i`.
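A short sketch of that integer-seed approach, under the assumption that run indices start at 1. The helper `gen_integer_seeds` and its `sequential` flag are hypothetical names introduced only for illustration:

```r
# Hypothetical helper: derive one plain integer seed per run from a global seed,
# without touching RNGkind.
gen_integer_seeds <- function(nruns, global_seed = 42L, sequential = TRUE) {
  if (sequential) {
    # Sequential variant: global_seed + 1, global_seed + 2, ...
    return(global_seed + seq_len(nruns))
  }
  # Random variant: reproducible draw of distinct positive integers.
  # Note that set.seed() changes the global RNG state.
  set.seed(global_seed)
  sample.int(.Machine$integer.max, nruns)
}

seeds <- gen_integer_seeds(3)
# Each run gets a scenario whose recorded seed is enough to repeat that run.
scenarios <- lapply(seeds, function(s) list(seed = s))
```

Because each recorded seed is a single integer, an individual run could be repeated later by passing that same value as `scenario$seed`.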
When executing `irace` from the shell, you can use `parallel-irace` to parallelize multiple runs of irace in addition to parallelizing executions of the `targetRunner` within a single run, right? Would it be possible to provide an `irace_parallel(runs)` R function that does the same, where `runs` is a list of tuples of `scenario` and `parameters`? I'm not familiar with how the parallelization in R works, so I don't know how much work something like this might be. I personally would already be happy with executing multiple runs on multiple threads; I have no need for MPI or Slurm etc.
If this is something that should not be part of this repository, I would also be happy with some pointers to try and implement it myself.