Closed jmzvillarreal closed 5 years ago

Hi Weiyan, I am trying to run a dataset of 13,000 cells on a cluster, assigning 200 cores and 1 TB of memory, and it is still not enough... Is this normal? Could you specify the characteristics of the machine on which you ran those 100k cells? Thanks in advance. Best, Jaime.

Hi Jaime, I ran 100k cells with 50 cores and 1 TB of memory on a cluster with Intel Xeon CPUs. If your run ended with an "out of memory" error, then something must be wrong, perhaps in the way you assign the cores or in the command you use. Could you show me your code for running LandSCENT? Then I may be able to spot the problem. Best, Weiyan
Hi Weiyan, Thanks for your quick answer. This is my R script:

library(dplyr)
library(ggplot2)
library(Seurat)
library(LandSCENT)
library(STRINGdb)
library(igraph)
library(biomaRt)
load('ppi.rda')
load('Total.rda')
exp_total <- Total[["RNA"]]@counts
total_meta <- read.table('Total_status_metadata.csv', header = TRUE, row.names = 1, sep = '\t')
library(scater)
exp_total <- SingleCellExperiment(assays = list(counts = exp_total))
metadata(exp_total) <- total_meta
counts(exp_total) <- as(counts(exp_total), "dgCMatrix")
sizeFactors(exp_total) <- librarySizeFactors(exp_total)
exp_total <- normalize(exp_total, log_exprs_offset = 1.1)
Integration.total <- DoIntegPPI(exp.m = exp_total, ppiA.m = ppi)
str(Integration.total)
SR.total <- CompSRana(Integration.total, local = TRUE, mc.cores = 200)
pdf('boxplotSR_total.pdf')
boxplot(SR.total$SR ~ total_meta$Status, main = "SR values against status",
        xlab = "Status", ylab = "SR values", outline = FALSE)
stripchart(SR.total$SR ~ total_meta$Status, vertical = TRUE, method = "jitter",
           add = TRUE, pch = 21, col = 'grey28')
dev.off()
InferPotency.total <- InferPotency(SR.total, pheno.v = total_meta$Status)
InferPotency.total$distPSPH
InferLandmark.total <- InferLandmark(InferPotency.total, pheno.v = total_meta$Status, reduceMethod = "tSNE", clusterMethod = "PAM", k_pam = 3)
pdf('PS_total.pdf')
LandSR.total <- Plot_LandSR(InferLandmark.total, coordinates = Total@reductions$umap@cell.embeddings,
                            colpersp = NULL, colimage = NULL, bty = "f", PDF = FALSE)
dev.off()
And this is the command I use to run it on the cluster:

sbatch -o test.log -e test.err -J LandSCENT -c 200 -t 1440 --mem=1000G --wrap "Rscript LandSCENT_total_leia.R"
Thanks very much! Best, Jaime.
Hi Jaime, From the script you provided, I think the error is probably caused by the number of cores. LandSCENT parallelises with "mclapply", which forks one child process per core, and every child process occupies roughly the same amount of memory. Since you are dealing with 13k cells, in my experience each child process will take around 20 GB, so 200 cores could add up to roughly 4,000 GB. That will lead to an out-of-memory error if you only request 1 TB. I am not sure this is the actual problem, but you can try reducing the number of cores, say to 80; the calculation will not take much longer. Of course, you can also check the memory usage of the R process and request more memory instead. Best, Weiyan
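(Editor's note: if it helps with tuning, here is a minimal sketch of that budget arithmetic in R, assuming the rough ~20 GB per child process figure from the comment above; because mclapply forks the R process and relies on copy-on-write, the real footprint is often somewhat smaller, so treat the result as a conservative starting point rather than a hard rule.)

library(parallel)
mem_budget_gb    <- 1000  # memory requested from the scheduler (--mem=1000G)
mem_per_child_gb <- 20    # rough per-child footprint for ~13k cells (assumption, see above)
safe_cores <- floor(mem_budget_gb / mem_per_child_gb)  # ~50 with these numbers
n_cores <- min(safe_cores, detectCores())               # never ask for more cores than exist
SR.total <- CompSRana(Integration.total, local = TRUE, mc.cores = n_cores)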
Hi Weiyan, You were right! With 80 cores and 1 TB it worked! Many thanks! Jaime.