ChenWeiyan / LandSCENT

Landscape Single Cell Entropy

Out of memory #5

Closed: jmzvillarreal closed this issue 5 years ago

jmzvillarreal commented 5 years ago

Hi Weiyan, I am trying to run a dataset of 13,000 cells on a cluster, assigning 200 cores and 1 TB of memory, and it is still not enough. Is this normal? Could you specify the characteristics of the machine on which you ran those 100k cells? Thanks in advance. Best, Jaime.

ChenWeiyan commented 5 years ago

Hi Jaime, I ran 100k cells with 50 cores and 1 TB of memory on a cluster with Intel Xeon CPUs. If you hit an "Out of memory" error, then something must be wrong. It could be the way you assign the cores, or the command you use. Could you show me the code you use to run LandSCENT? Maybe I can then spot where the problem is. Best, Weiyan
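One way to rule out a mismatch between the cores requested from the scheduler and the cores R actually uses (a minimal sketch, assuming a SLURM cluster, where `SLURM_CPUS_PER_TASK` reflects the `-c` allocation; `CompSRana` and its `local`/`mc.cores` arguments are LandSCENT's own):

```r
# Sketch: derive the core count from the SLURM allocation rather than
# hard-coding it, so mclapply never forks more workers than were granted.
slurm.cores <- Sys.getenv("SLURM_CPUS_PER_TASK")
n.cores <- if (nzchar(slurm.cores)) as.integer(slurm.cores) else parallel::detectCores()
message("Running CompSRana with ", n.cores, " cores")

SR.total <- CompSRana(Integration.total, local = TRUE, mc.cores = n.cores)
```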

jmzvillarreal commented 5 years ago

Hi Weiyan, Thanks for your quick answer. This is my R script:

```r
library(dplyr)
library(ggplot2)
library(Seurat)
library(LandSCENT)
library(STRINGdb)
library(igraph)
library(biomaRt)

load('ppi.rda')
load('Total.rda')

exp_total <- Total[["RNA"]]@counts
total_meta <- read.table('Total_status_metadata.csv', header = TRUE,
                         row.names = 1, sep = '\t')

library(scater)

exp_total <- SingleCellExperiment(assays = list(counts = exp_total))
metadata(exp_total) <- total_meta
counts(exp_total) <- as(counts(exp_total), "dgCMatrix")
sizeFactors(exp_total) <- librarySizeFactors(exp_total)
exp_total <- normalize(exp_total, log_exprs_offset = 1.1)

## Differentiation potency estimation
Integration.total <- DoIntegPPI(exp.m = exp_total, ppiA.m = ppi)
str(Integration.total)

SR.total <- CompSRana(Integration.total, local = TRUE, mc.cores = 200)

pdf('boxplotSR_total.pdf')
boxplot(SR.total$SR ~ total_meta$Status, main = "SR values against status",
        xlab = "Status", ylab = "SR values", outline = FALSE)
stripchart(SR.total$SR ~ total_meta$Status, vertical = TRUE, method = "jitter",
           add = TRUE, pch = 21, col = 'grey28')
dev.off()

## Infer the potency states in a cell population
InferPotency.total <- InferPotency(SR.total, pheno.v = total_meta$Status)
InferPotency.total$distPSPH

## Infer potency-coexpression clusters (landmarks)
InferLandmark.total <- InferLandmark(InferPotency.total, pheno.v = total_meta$Status,
                                     reduceMethod = "tSNE", clusterMethod = "PAM",
                                     k_pam = 3)

## Density-based visualization: generates figures that compare cell density
## across the distinct potency states
pdf('PS_total.pdf')
LandSR.total <- Plot_LandSR(InferLandmark.total,
                            coordinates = Total@reductions$umap@cell.embeddings,
                            colpersp = NULL, colimage = NULL, bty = "f", PDF = FALSE)
dev.off()
```

And this is the command I use to run it on the cluster:

```
sbatch -o test.log -e test.err -J LandSCENT -c 200 -t 1440 --mem=1000G --wrap "Rscript LandSCENT_total_leia.R"
```

Thanks very much! Best, Jaime.

ChenWeiyan commented 5 years ago

Hi Jaime, From the script you provided, I think the error is probably caused by the number of cores. LandSCENT parallelizes with "mclapply", which spawns one child process per core, and every child process occupies about the same amount of memory. Since you are dealing with 13k cells, in my experience each child process will occupy roughly 20 GB of memory, so 200 cores could add up to ~4000 GB. That leads to an out-of-memory problem when you only request 1 TB. I am not sure this is the actual problem, but try reducing the number of cores, say to 80; the calculation still won't take you too long. Of course, you could instead check the memory usage of the R processes and request more memory. Best, Weiyan
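To make the arithmetic concrete (a rough sketch using the ~20 GB-per-child estimate above; because mclapply forks the R process and Linux shares unmodified pages copy-on-write, this is an upper bound rather than the exact footprint):

```r
# Back-of-envelope memory check for mclapply-based parallelism
# (assumes ~20 GB peak per forked child, per the estimate above).
mem.per.child.gb <- 20
mem.requested.gb <- 1000  # --mem=1000G in the sbatch call

200 * mem.per.child.gb  # 4000 GB worst case: far beyond the 1 TB allocation
80  * mem.per.child.gb  # 1600 GB worst case; copy-on-write sharing kept the
                        # actual usage within 1 TB in this thread's run

# Reduced-core run; lower the sbatch allocation to match:
SR.total <- CompSRana(Integration.total, local = TRUE, mc.cores = 80)
# sbatch -o test.log -e test.err -J LandSCENT -c 80 -t 1440 --mem=1000G \
#   --wrap "Rscript LandSCENT_total_leia.R"
```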

jmzvillarreal commented 5 years ago

Hi Weiyan, You were right! With 80 cores and 1 TB it worked! Many thanks! Jaime.