HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

How to set BPPARAM to run parallel #70

Open lindsaynhayes opened 3 years ago

lindsaynhayes commented 3 years ago

How do you specify the `BPPARAM` argument to make `mmDS` run in parallel? The default is `BPPARAM = SerialParam(progressbar = verbose)`, which means it runs serially, right? When I run it I can see the iterations ticking by (&lt;3 verbose), but for one coef it is going to take over 7 hours, so I was hoping to speed it up. I've been reading the BiocParallel vignette (http://bioconductor.org/packages/release/bioc/vignettes/BiocParallel/inst/doc/Introduction_To_BiocParallel.pdf), but I'm still confused. Below is what I've tried.

```r
library(BiocParallel)
registered()  # I have MulticoreParam capabilities
options(MulticoreParam = quote(MulticoreParam(workers = 12)))  # from page 3 of the vignette
param <- MulticoreParam(workers = 12)                          # from page 5

res_MC <- mmDS(sce, coef = "group_idCoef2", method = "dream",
               verbose = TRUE, BPPARAM = MulticoreParam())

res_MC <- mmDS(sce, coef = "group_idCoef2", method = "dream",
               verbose = TRUE, BPPARAM = MulticoreParam(workers = 8))

res_MC <- mmDS(sce, coef = "group_idCoef2", method = "dream",
               verbose = TRUE, BPPARAM = param)

# they all give an error with some version of:
#   unused argument (BPPARAM = MulticoreParam())
```

Thanks for any advice. I'm still trying to get a handle on parallel processing.
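(As a sanity check that forking works at all on the machine, independent of muscat — `MulticoreParam` uses forked worker processes on Unix, much like base R's `parallel::mclapply` — a trivial parallel job can be run first. This is just an illustrative sketch, unrelated to `mmDS` itself:)

```r
library(parallel)  # base R; MulticoreParam also relies on forked workers on Unix

# trivial forked job: square 1..8 across 2 cores
# (note: forking is not available on Windows)
res <- mclapply(1:8, function(i) i^2, mc.cores = 2)
unlist(res)  # 1 4 9 16 25 36 49 64
```

If this hangs or errors, the problem is with the parallel backend or environment rather than with muscat.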

plger commented 3 years ago

There seems to be nothing wrong with what you're doing, but I can't reproduce the bug in either version 1.6.0 (Bioc) or 1.7.1 (Bioc-devel). Are you perhaps using an older version?

lindsaynhayes commented 3 years ago

You are correct. I was assuming I had the syntax wrong. I was running muscat_1.4.0; upgraded to 1.6.0.

Now I have a new one, currently googling, will get back:

```
Error in serialize(data, node$con, xdr = FALSE) : error writing to connection
```

Update: I played around with the number of workers and that seems to fix it. It spits out the error at the beginning but then proceeds with the iterations rather than quitting. It is running the iterations slower, but I will up the cores and try again. For example, I have a 12-core allocation (20 GB per core): workers = 12 produced the error, workers = 6 also produced the error, but workers = 3 was successful. I set my MulticoreParam to `log=TRUE` and this is the output of one iteration. It seems like 2 cores (40 GB) per worker should be sufficient memory, so I'm not sure why 6 workers didn't work. The "Memory used" figures mean it is using at most ~22 GB, right?

```
Testing 3221 genes across 36329 cells in cluster “0”...
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
[1] "~group_id+(1|sample_id)"
Dividing work into 100 chunks...
iteration: 1
############### LOG OUTPUT ###############
Task: 1
Node: 1
Timestamp: 2021-06-23 11:15:24
Success: TRUE

Task duration:
   user  system elapsed 
 17.071   1.872  47.308 

Memory used:
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    8721680   465.8   15711907   839.2   12541152   669.8
Vcells 1652270592 12605.9 2881445636 21983.7 2881440899 21983.7

Log messages:
INFO [2021-06-23 11:13:57] loading futile.logger package

stderr and stdout:
```
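(A rough bound on the worker count can be read off the gc figures above; this is only a back-of-the-envelope sketch, not a muscat recommendation:)

```r
# cap the worker count by peak memory per worker
total_gb <- 12 * 20   # 12 cores x 20 GB each = 240 GB allocation
peak_gb  <- 22        # max "Vcells" usage from the gc log (~22 GB)
max_workers <- floor(total_gb / peak_gb)
max_workers  # 10 in theory, though in practice only 3 workers succeeded here
```

In practice, fork-time copies and serialization of results between workers can push real usage well above what `gc()` reports in any one process, which may be why far fewer workers were usable than this estimate suggests.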
plger commented 3 years ago

On what OS are you running this? Depending on the platform, BiocParallel sometimes has issues, especially when run interactively from something like RStudio, so you might try from the CLI. Otherwise, I'm afraid dream has a large memory footprint, though I don't see why your setup wouldn't be enough...

lindsaynhayes commented 3 years ago

Short version: problems mostly solved. More details below that you might find informative. Lingering question: `DESeq2` with `vst` didn't run (point 4); is that because of the S4 data type?

1. I am using the CLI on an HPC to (try to) run in parallel. Here is the abbreviated sessionInfo (I can send more, but didn't think you really needed all the package info):

```
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

[13] muscat_1.6.0
```

2. Serial ended up being faster. I played around with 4, 6, and 8 workers and the speed was pretty much the same for each, though with `log=TRUE` I could see the different cores running each iteration. In the end I got `dream` and `vst` with `sctransform` to work in serial and stopped trying with parallel.

3. `dream2` was running slower than `dream`, so I didn't wait for it to finish. For `dream2` each cluster ran 2 rounds of 100 iterations; is that normal? `dream` ran just one set of 100 iterations per cluster.

```r
res_MC <- mmDS(sce, coef = "group_idCoef2", covs = NULL, method = "dream2",
               n_cells = 25, n_samples = 2, min_count = 5, min_cells = 20,
               verbose = TRUE, BPPARAM = param)
```

```
Skipping cluster(s) “14”, “15”, “16” due to an insufficient number of samples with a sufficient number of cells.
Testing 3221 genes across 36329 cells in cluster “0”...
[1] "~(1|sample_id)+group_id"
Memory usage to store result: >10.5 Gb
Dividing work into 100 chunks...    # first set of iterations
iteration: 24 55 100
Total:2412 s
Dividing work into 100 chunks...    # second set of iterations
iteration: 2
```


4. `DESeq2` with `vst` also wasn't working, but I'll admit I haven't done as much troubleshooting with this one yet.

```r
res_MC <- mmDS(sce, coef = "group_idCoef2", covs = NULL, method = "vst",
               vst = "DESeq2", n_cells = 25, n_samples = 2, min_count = 5,
               min_cells = 20, verbose = TRUE)
```

```
Skipping cluster(s) “14”, “15”, “16” due to an insufficient number of samples with a sufficient number of cells.
Error in mde(x) : cannot coerce type 'S4' to vector of type 'integer'
```


5. Since we're here, can I ask a bonus unrelated question? Often I get kicked off my ssh session with a `client_loop: send disconnect: Broken pipe` error. Do you have any idea how I can avoid this? I can't tell why it happens: if I look away it seems to happen, but if I babysit the script and hit enter every now and again it happens less. The best idea I found on Google was updating the `.ssh/config` file to include a `ServerAliveInterval`, but then I couldn't even connect to the ssh. If you have no idea what I'm talking about, no big deal.
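(For reference, `ServerAliveInterval` normally lives inside a `Host` stanza in `~/.ssh/config`, and ssh refuses to connect if a line is malformed, e.g. a trailing comment on the same line as a value, which may explain the failed attempt. A minimal sketch with placeholder host names:)

```
# ~/.ssh/config  -- "myhpc" and the HostName are placeholders
Host myhpc
    HostName login.hpc.example.edu
    # send a keep-alive every 60 s; give up after 3 unanswered keep-alives
    ServerAliveInterval 60
    ServerAliveCountMax 3
```

With this in place, `ssh myhpc` should keep the session alive through idle periods.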

Thanks 💯