lindsaynhayes opened this issue 3 years ago
There seems to be nothing wrong with what you're doing, but I can't reproduce the bug in either version 1.6.0 (Bioc) or 1.7.1 (Bioc-devel). Are you perhaps using an older version?
You are correct. I was assuming I had the syntax wrong. I was running muscat_1.4.0 and have upgraded to 1.6.
Now I have a new error; currently googling... will get back...
```
Error in serialize(data, node$con, xdr = FALSE) : error writing to connection
```
Update:
I played around with the number of workers and that seems to fix it: the error is still printed at the beginning, but the run proceeds with the iterations rather than quitting. It is running the iterations more slowly, so I will up the cores and try again. For example, I have a 12-core allocation (with 20 GB per core): `workers = 12` produced the error, `workers = 6` also produced the error, but `workers = 3` was successful. I set my `MulticoreParam` to `log = TRUE` and this is the output of one iteration. It seems like 2 cores (40 GB) per worker should be sufficient memory, so I'm not sure why 6 workers didn't work. The "Memory used" section means it is using at most ~22 GB, right?
```
Testing 3221 genes across 36329 cells in cluster “0”...
Error in serialize(data, node$con, xdr = FALSE) :
  error writing to connection
[1] "~group_id+(1|sample_id)"
Dividing work into 100 chunks...
iteration: 1
############### LOG OUTPUT ###############
Task: 1
Node: 1
Timestamp: 2021-06-23 11:15:24
Success: TRUE

Task duration:
   user  system elapsed
 17.071   1.872  47.308

Memory used:
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    8721680   465.8   15711907   839.2   12541152   669.8
Vcells 1652270592 12605.9 2881445636 21983.7 2881440899 21983.7

Log messages:
INFO [2021-06-23 11:13:57] loading futile.logger package

stderr and stdout:
```
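A minimal sketch of the kind of backend setup described above (the worker count, `log`, and `progressbar` values are assumptions based on the configuration that completed, not the exact code from the post):

```r
library(BiocParallel)

## 3 workers was the configuration that ran without the serialize() error
## on a 12-core / 20 GB-per-core allocation (values assumed from the post)
param <- MulticoreParam(workers = 3, log = TRUE, progressbar = TRUE)
```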
On what OS are you running this? Depending on the platform, BiocParallel sometimes has issues, especially when running it interactively from something like RStudio, so you might try from the CLI. Otherwise, I'm afraid `dream` has high memory consumption, though I don't see why your setup wouldn't be enough...
Short version: problems mostly solved. More details below that you might find informative. Lingering question: `DESeq2` with `vst` didn't run (point 4), is that because of the S4 data type?
1. `sessionInfo()`:

```
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
...
[13] muscat_1.6.0
```
2. The serial run ended up being faster. I played around with 4, 6, and 8 workers and the speed was pretty much the same for each, though with `log = TRUE` I could see the different cores running each iteration. I ended up getting `dream` and `vst` with `sctransform` to work in serial and stopped trying the parallel runs (sketch below).
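A rough sketch of those serial calls, with argument values assumed to mirror the `dream2` call in point 3 (not the exact code that was run):

```r
## serial runs (default BPPARAM) that completed; argument values are assumptions
res_dream <- mmDS(sce, coef = "group_idCoef2", covs = NULL, method = "dream",
                  n_cells = 25, n_samples = 2, min_count = 5, min_cells = 20,
                  verbose = TRUE)

res_vst <- mmDS(sce, coef = "group_idCoef2", covs = NULL, method = "vst",
                vst = "sctransform", n_cells = 25, n_samples = 2,
                min_count = 5, min_cells = 20, verbose = TRUE)
```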
3. `dream2` was running slower than `dream`, so I didn't wait for that one to finish. For `dream2`, each cluster was running 2 rounds of 100 iterations, is that normal? `dream` was just running one set of 100 iterations for each cluster.
```r
res_MC <- mmDS(sce, coef = "group_idCoef2", covs = NULL, method = "dream2",
               n_cells = 25, n_samples = 2, min_count = 5, min_cells = 20,
               verbose = TRUE, BPPARAM = param)
```

```
Skipping cluster(s) “14”, “15”, “16”
due to an insufficient number of samples with a sufficient number of cells.
Testing 3221 genes across 36329 cells in cluster “0”...
[1] "~(1|sample_id)+group_id"
Memory usage to store result: >10.5 Gb
Dividing work into 100 chunks...   # first set of iterations
iteration: 24
55
100
Total: 2412 s
Dividing work into 100 chunks...   # second set of iterations
iteration: 2
```
4. `DESeq2` with `vst` also wasn't working, but I'll admit I haven't done as much troubleshooting with this one yet.
```r
res_MC <- mmDS(sce, coef = "group_idCoef2", covs = NULL, method = "vst",
               vst = "DESeq2", n_cells = 25, n_samples = 2, min_count = 5,
               min_cells = 20, verbose = TRUE)
```

```
Skipping cluster(s) “14”, “15”, “16”
due to an insufficient number of samples with a sufficient number of cells.
Error in mde(x) : cannot coerce type 'S4' to vector of type 'integer'
```
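One thing that may be worth checking here (an assumption, not something confirmed in the thread): the `vst = "DESeq2"` path might be choking on counts stored as a sparse S4 matrix rather than a dense integer matrix. A quick diagnostic sketch:

```r
library(SingleCellExperiment)

## check how the counts are stored; a sparse S4 class such as "dgCMatrix"
## could explain the "cannot coerce type 'S4'" error (assumption, untested)
class(counts(sce))

## possible workaround (also untested): densify the counts before calling mmDS
# counts(sce) <- as.matrix(counts(sce))
```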
5. Since we're here, can I ask a bonus unrelated question? I often get kicked off my ssh session with a `client_loop: send disconnect: Broken pipe` error. Do you have any idea how I can avoid this? I can't tell why it is happening; if I look away it seems to happen, but if I babysit the script and hit enter every now and again it seems to happen less. The best idea I found on Google was updating the `.ssh/config` file to include a `ServerAliveInterval`, but then I couldn't even connect over ssh. If you have no idea what I'm talking about, no big deal.
Thanks 💯
How do you specify the `BPPARAM` argument to make `mmDS` run in parallel? The default is set as `BPPARAM = SerialParam(progressbar = verbose)`, which means it is going to run serially, right? And when I run it I see the iterations ticking by (<3 verbose). But for one coef it is going to take over 7 hours, so I was hoping to speed it up. I've been reading the [BiocParallel vignette](http://bioconductor.org/packages/release/bioc/vignettes/BiocParallel/inst/doc/Introduction_To_BiocParallel.pdf), but I'm still confused. Below is what I've tried. Thanks for any advice. I'm still trying to get a handle on parallel processing.
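As a general illustration of the pattern from the BiocParallel vignette (not the exact attempt from the post; the worker count is an assumption), swapping the default serial backend for a multicore one looks roughly like this:

```r
library(BiocParallel)

## default behaviour: serial execution with a progress bar
param <- SerialParam(progressbar = TRUE)

## parallel alternative on a multi-core machine (worker count is an assumption)
param <- MulticoreParam(workers = 4, progressbar = TRUE)

## whichever backend is chosen is then passed to mmDS() via its BPPARAM argument
```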