Status: Closed (afinneg2 closed 6 years ago)
Hi @afinneg2 ,
I haven't seen the behavior that you describe, and it might indeed be a BiocParallel issue, since the only thing that I do in zinbFit is pass the BPPARAM argument to bplapply.
I only have a couple of suggestions to try and figure out whether this is a problem with zinbwave or with BiocParallel:
1. Register the backend and run zinbFit without passing BPPARAM:
BiocParallel::register(BiocParallel::MulticoreParam(workers=cores))
zinbFit(cur.se, K=20, X="~nUMI + sample + percent.mito", verbose = TRUE)
and see if the behavior remains the same.
2. Call bplapply directly, e.g.
bplapply(seq_len(10), function(x) {
  Sys.sleep(1)
  rnorm(100)
}, BPPARAM = MulticoreParam(workers = cores))
and see if the behavior is the same.
If 2, then it's for sure BiocParallel. If 1, perhaps there's something going on in the way zinbFit passes BPPARAM to bplapply.
One heads-up on this issue: if your cluster has an optimized BLAS implementation installed (e.g. OpenBLAS or MKL), much of the matrix work may be parallelized automatically, over and above any explicit parallelization from BiocParallel. Ask your sysadmin.
@Simon-Coetzee and @drisso, thank you very much for your advice and help. I agree with @Simon-Coetzee's diagnosis. I have found the following to work in my cluster environment:
nThreads=2
export OPENBLAS_NUM_THREADS=$nThreads OMP_NUM_THREADS=$nThreads MKL_NUM_THREADS=$nThreads
Rscript run_zinb_setWorkers.R
where in run_zinb_setWorkers.R, in addition to running zinbwave, I set
BiocParallel::register(BiocParallel::MulticoreParam(workers=nWorkers))
I choose nWorkers and nThreads so that nWorkers * nThreads = (number of cores to use) - 2. This seems to work: load averages no longer explode, but zombie processes are still generated.
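The sizing rule above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name `workers_for` is hypothetical, the core budget is read from `SLURM_CPUS_PER_TASK` (set by SLURM when `--cpus-per-task` is requested) with a fallback to `nproc`, and integer division means the product nWorkers * nThreads can end up slightly below the budget rather than exactly equal to it. How nWorkers reaches the R script is not shown in this thread, so it is left out here.

```shell
# Hypothetical helper (not part of zinbwave or BiocParallel):
# print the largest worker count such that
#   nWorkers * nThreads <= cores - 2
# keeping at least one worker.
workers_for() {
    wf_cores=$1
    wf_threads=$2
    wf_n=$(( (wf_cores - 2) / wf_threads ))
    if [ "$wf_n" -lt 1 ]; then
        wf_n=1
    fi
    echo "$wf_n"
}

# Assumption: use the SLURM allocation if present, else all visible cores.
cores=${SLURM_CPUS_PER_TASK:-$(nproc)}
nThreads=2
nWorkers=$(workers_for "$cores" "$nThreads")

# Cap the implicit BLAS/OpenMP threading, as in the commands above.
export OPENBLAS_NUM_THREADS=$nThreads OMP_NUM_THREADS=$nThreads MKL_NUM_THREADS=$nThreads

echo "cores=$cores nThreads=$nThreads nWorkers=$nWorkers"
```

For a 24-core node and nThreads=2, `workers_for 24 2` prints 11, so 22 of the 24 cores are budgeted.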
Thanks again.
Hello,
Thank you for this very useful package! I am running into trouble using zinbwave in a cluster environment with a SLURM scheduler.
Briefly, my issue is that the command: zinbFit(cur.se, K=20, X="~nUMI + sample + percent.mito", verbose = TRUE, BPPARAM=MulticoreParam(workers=cores))
seems to always use all cores on the node, regardless of how many are requested through the cores variable. For example, when I set cores=1, I get load averages on the Linux cluster equal to 24 (the number of cores on the node). When I set cores to any value greater than 1, I get load averages that keep growing beyond the number of available cores, along with lots of zombie processes, and I have to kill the job or risk crashing the node.
I understand the issue could be with the BiocParallel package or specific to the setup of my cluster. Nevertheless, I am wondering if you have encountered a similar problem or if you could recommend a BiocParallel setup that works for zinbwave on clusters managed by SLURM.
I greatly appreciate any help!
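For anyone trying to reproduce the symptoms described above, the following is a minimal sketch of how to compare the load average with the core count and count zombie processes on a node. It assumes a Linux system with `/proc` mounted and `nproc` available; the variable names are illustrative only.

```shell
# Number of cores visible to this process.
cores=$(nproc)

# 1-minute load average: first field of /proc/loadavg (Linux-specific).
read -r load1 _rest < /proc/loadavg

# Count zombie processes by scanning /proc/<pid>/status for "State: Z (zombie)".
# grep -s hides errors for processes that exit during the scan;
# "|| true" keeps a zero count from being treated as a failure.
zombies=$(grep -s '^State:' /proc/[0-9]*/status | grep -c 'Z (zombie)' || true)

echo "cores=$cores load1=$load1 zombies=$zombies"
```

A 1-minute load well above the core count, or a zombie count that keeps rising while the job runs, matches the behavior reported in this post.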