EuracBiomedicalResearch / FamAgg

This is the development version of the FamAgg Bioconductor package.
https://EuracBiomedicalResearch.github.io/FamAgg
MIT License

Parallelization support #3

Open the-x-at opened 7 years ago

the-x-at commented 7 years ago

Support for parallel simulation computation on a multiprocessor/multicore machine would be great. Limiting the number of cores used should be an optional parameter when running simulations, ideally defaulting to a single core, as many queuing systems have their own load balancing and discourage the use of multiple cores by a single job.

jorainer commented 7 years ago

Parallel random sampling might be tricky, but eventually there might be something in BiocParallel.

the-x-at commented 3 years ago

Five years have gone by and not much has happened. In the meantime, the fix for issue #22 splits the whole simulation into small chunks of short simulations, which would in theory make it possible to add parallelization. On the other hand, this kind of multi-threaded execution is not appreciated by queuing systems like SLURM, as you gain an advantage over others by using multiple cores. Unless SLURM is informed about it, it will kill the job, assuming excess CPU usage.
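
To illustrate the idea (the function and variable names here are hypothetical, not FamAgg's actual internals): once the simulation is split into chunks, the chunk loop can be handed to a parallel apply. A base-R sketch using `parallel::mclapply`, which runs serially with `mc.cores = 1`, matching the queue-friendly single-core default suggested above:

```r
library(parallel)

# Hypothetical chunked Monte Carlo: count how often a simulated statistic
# reaches an observed value, with nsim draws split into chunks.
run_chunk <- function(chunk_size, observed) {
  sum(replicate(chunk_size, mean(rnorm(10))) >= observed)
}

nsim   <- 1000
chunks <- rep(100, nsim / 100)   # 10 chunks of 100 simulations each
ncores <- 1                      # default: a single core, queue-friendly

hits <- mclapply(chunks, run_chunk, observed = 0.5, mc.cores = ncores)
pval <- (sum(unlist(hits)) + 1) / (nsim + 1)   # Monte Carlo p-value
```

Swapping `mclapply` for BiocParallel's `bplapply` would give the same structure while delegating the backend choice to the registered `BiocParallelParam`.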

jorainer commented 3 years ago

Parallel processing with SLURM works like a marvel with:

library(BiocParallel)

## Cores SLURM assigned to this job (fall back to 7 if unset),
## keeping one core free for the main process.
ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE", 7)) - 1L
register(MulticoreParam(ncores))

Any subsequent call to bplapply will then, by default, use the parallel setup with the number of cores assigned by SLURM. The main issue I see is with the random numbers: we would have to ensure that the parallel jobs do not pick up the same random numbers. Anyway, since we're running FamAgg on multiple traits in one analysis, parallelizing by trait is at present my favorite approach.
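
BiocParallel addresses the random-number concern with the `RNGseed` argument to its backend params (e.g. `MulticoreParam(ncores, RNGseed = 123)`), which hands each task an independent, reproducible stream. A base-R sketch of the underlying idea, using L'Ecuyer-CMRG streams from the `parallel` package (the chunked simulation here is a stand-in, not FamAgg code):

```r
library(parallel)

# Give each chunk its own L'Ecuyer-CMRG stream so the chunks draw
# independent yet reproducible random numbers.
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
streams <- vector("list", 4)
s <- .Random.seed
for (i in seq_along(streams)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)   # advance to the next independent stream
}

sim_chunk <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  rnorm(3)                # stand-in for one simulation chunk
}
res1 <- lapply(streams, sim_chunk)  # in practice: bplapply / mclapply
res2 <- lapply(streams, sim_chunk)  # same streams, identical results
```

Because every chunk starts from its own pre-generated stream, the results are the same whether the chunks run serially or in parallel.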

the-x-at commented 3 years ago

Looks very simple. It also looks like one has to supply the number of CPUs (cores/threads) used by a single process when submitting a job to SLURM via sbatch -c N, where N is the number of threads to use; this value then ends up in the environment variable SLURM_JOB_CPUS_PER_NODE. Anyway, at the moment we don't see much need for this, so the issue will remain open, but there are no plans to tackle it.

jorainer commented 3 years ago

Yep, exactly: the value of sbatch -c N is assigned to the environment variable SLURM_JOB_CPUS_PER_NODE (I guess the other SLURM variables will be available as well). And yes, I agree, there is no need to implement anything at present.