BIMIB-DISCo / TRONCO

Repository of the TRanslational ONCOlogy library, which includes various algorithms (such as CAPRESE and CAPRI) and the Pipeline for Cancer Inference (PICNIC).
https://bimib-disco.github.io/TRONCO
GNU General Public License v3.0

Remove cluster creation/destruction from within functions (and use makeForkCluster on POSIX systems) #73

Closed (rdiaz02 closed 6 years ago)

rdiaz02 commented 8 years ago

makeCluster is used in the code. Would it be possible to use, on Unix systems, makeForkCluster (or makeCluster with type = "FORK")?

Why? makeForkCluster creates worker processes by forking. This is often considerably faster than the PSOCK approach (which calls Rscript, etc.) and, in addition, can lead to considerable savings in memory, as the child processes' virtual memory shares the same physical memory until pages are modified (copy-on-write). This last feature would allow running clusters with many more processes (maybe up to the number of cores/processors). Conditionally using one type of cluster or another (depending on whether the OS is Windows or POSIX) is straightforward.
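
To illustrate the conditional choice, here is a minimal sketch using only the parallel package. The helper name make_session_cluster is hypothetical and not part of TRONCO; it simply checks .Platform$OS.type and falls back to a PSOCK cluster on Windows, where forking is not available.

```r
library(parallel)

# Hypothetical helper (not part of TRONCO): choose the cluster type by OS.
make_session_cluster <- function(cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # Forking is not available on Windows, so use socket workers.
    makeCluster(cores, type = "PSOCK")
  } else {
    # POSIX systems: workers are forked from the current R process.
    makeForkCluster(cores)
  }
}

cl <- make_session_cluster(2L)
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

The calling code is the same for both cluster types; only the creation step differs.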

rdiaz02 commented 8 years ago

The changes above work fine (I have tried them), unless one runs a bunch of processes one after the other. In that case, one runs out of open file descriptors (the "unable to create a pipe" message), because each run of tronco.bootstrap et al. leaves behind a set of zombie R processes with their corresponding pipes, etc.

The solution is to create the cluster (and register it) once per session, not every time the functions are run (i.e., remove the cluster creation/registration/destruction from inside the functions).

In other words, if the following two lines

cl = makeCluster(cores, type = "FORK")    
registerDoParallel(cl)

as well as the call to stopCluster(cl), are removed from statistics.R and bootstrap.R, then everything works fine. The user creates the cluster however he or she wants, registers it, and we are done.
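
As a rough sketch of what the session-level workflow could look like from the user's side: the function run_replicates below is a hypothetical stand-in for something like tronco.bootstrap, assumed to rely on whatever foreach backend is currently registered instead of creating its own cluster. It requires the doParallel package (which attaches foreach).

```r
library(parallel)
library(doParallel)  # attaches foreach as a dependency

# Create and register the cluster once per session.
cl_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"
cl <- makeCluster(2L, type = cl_type)
registerDoParallel(cl)

# Hypothetical stand-in for a package function: it uses the registered
# backend and does not create, register, or stop any cluster itself.
run_replicates <- function(n) {
  foreach(i = seq_len(n), .combine = c) %dopar% {
    i * 2
  }
}

# Repeated calls reuse the same workers, so no new pipes are opened per call.
first  <- run_replicates(3)
second <- run_replicates(3)

# Tear down once, at the end of the session.
stopCluster(cl)
registerDoSEQ()  # reset foreach to the sequential backend
```

With this arrangement the number of worker processes and open pipes stays fixed for the whole session, regardless of how many times the functions are called.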