Closed rdiaz02 closed 6 years ago
The changes above would work fine (I have tried them) unless one runs a bunch of processes one after the other. In that case, one runs out of open file descriptors (the "unable to create a pipe" message), because every run of tronco.bootstrap et al. leaves behind a set of zombie R processes with their corresponding pipes, etc.
The solution is to create the cluster (and register it) once per session, not every time the functions are run (i.e., remove the cluster creation/registration/destruction from inside the functions).
In other words, if the following two lines
cl = makeCluster(cores, type = "FORK")
registerDoParallel(cl)
as well as the call to `stopCluster(cl)`, are removed from `statistics.R` and `bootstrap.R`, then everything works fine. The user creates the cluster however he or she wants, registers it, and we are done.
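For illustration, the once-per-session pattern could look roughly like the sketch below. `run_analysis` is a hypothetical stand-in (not a TRONCO function) for something like tronco.bootstrap after the cluster handling has been taken out of its body; the worker count of 2 is arbitrary, and `type = "FORK"` assumes a Unix-like system.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical stand-in for a function such as tronco.bootstrap once the
# cluster creation/registration/teardown has been removed from its body:
# it simply uses whatever parallel backend is currently registered.
run_analysis <- function(n) {
  foreach(i = seq_len(n), .combine = c) %dopar% i^2
}

# Create and register the cluster once per session.
# type = "FORK" is only available on Unix-alikes.
cl <- makeCluster(2, type = "FORK")
registerDoParallel(cl)

# Consecutive runs reuse the same workers, so no new worker processes
# (and no new pipes) are created for each run.
results <- lapply(1:5, function(run) run_analysis(4))

# Tear the cluster down once, at the end of the session.
stopCluster(cl)
```

With this layout the user decides the cluster type and size, and the package functions only need a registered backend.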
`makeCluster` is used in the code. Would it be possible to use, on Unix systems, `makeForkCluster` (or `makeCluster` with `type = "FORK"`)?

Why? Using `makeForkCluster` creates worker processes by forking. This is often considerably faster than the PSOCK approach (which calls Rscript, etc.) and, in addition, can lead to considerable savings in memory, since the child processes share the same physical memory pages with the parent until those pages are modified (copy-on-write). This last feature would allow running clusters with many more processes (maybe up to the number of cores/processors). Conditionally using one type of cluster or the other (depending on whether the OS is Windows or POSIX) is straightforward.
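A minimal sketch of the conditional choice, assuming only the base `parallel` package. The helper name `make_cluster_for_os` is made up for this example and is not part of the package's code.

```r
library(parallel)

# Hypothetical helper: pick the cluster type from the operating system.
# Forking is not available on Windows, so fall back to PSOCK there.
make_cluster_for_os <- function(cores) {
  if (.Platform$OS.type == "windows") {
    makeCluster(cores, type = "PSOCK")
  } else {
    makeForkCluster(cores)
  }
}

cl <- make_cluster_for_os(2)

# Any cluster-based call works the same way with either cluster type.
squares <- parSapply(cl, 1:4, function(x) x^2)

stopCluster(cl)
```

The rest of the code does not need to know which type was chosen, since both calls return a cluster object that `registerDoParallel` and `stopCluster` accept.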