SydneyBioX / scMerge

Statistical approach for removing unwanted variation from multiple single-cell datasets
https://sydneybiox.github.io/scMerge/

How to accelerate the scMerge2 process #39

Open xflicsu opened 10 months ago

xflicsu commented 10 months ago

I am using scMerge2 to integrate about 150K cells. It has taken about 5 hours so far and is still at the "Running RUV" step with 120 CPUs. How can I accelerate the process, as mentioned in your paper? Thanks!

##################
scMerge2_res <- scMerge2(exprsMat = logcounts(sce), batch = sce$orig.ident, condition = sce$type, chosen.hvg = hgvs, return_matrix = FALSE, verbose = TRUE, use_bpparam = BiocParallel::SerialParam())
[1] "Cluster within batch"
[1] "Normalising data"
[1] "Constructing pseudo-bulk"
Dimension of pseudo-bulk expression: [1] 33341 16083
[1] "Identifying MNC using pseudo-bulk:"
[1] "condition_mode"
[1] "Running RUV"

YingxinLin commented 10 months ago

Hi, thank you for your interest in scMerge2.

How many batches and conditions does your dataset have? (I am assuming the condition of each sample is stored in sce$type here.) If you want to run scMerge2 in parallel, you can set use_bpparam = BiocParallel::MulticoreParam(workers = ncores).
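As a rough sketch of that suggestion, the call posted earlier in the thread could be rerun with only the use_bpparam argument changed. The objects sce and hgvs are assumed to be the ones from the original call, and the value of ncores is a hypothetical placeholder to adjust for the machine in use. Whether this shortens the "Running RUV" step for a given dataset is not something the thread confirms.

```r
library(SingleCellExperiment)  # logcounts()
library(scMerge)               # scMerge2()
library(BiocParallel)          # MulticoreParam()

# Assumption: `sce` and `hgvs` already exist, exactly as in the original call.
# Hypothetical worker count; pick a value that fits the available cores and memory.
ncores <- 8

scMerge2_res <- scMerge2(
  exprsMat      = logcounts(sce),
  batch         = sce$orig.ident,
  condition     = sce$type,
  chosen.hvg    = hgvs,
  return_matrix = FALSE,
  verbose       = TRUE,
  # The only change from the original call: a parallel backend instead of SerialParam()
  use_bpparam   = MulticoreParam(workers = ncores)
)
```

Note that MulticoreParam relies on process forking, which is not available on Windows; BiocParallel::SnowParam is the usual alternative there.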

Best wishes, Yingxin

xflicsu commented 10 months ago

Thanks for your quick response!

I have 150k cells with 40 samples (sce$orig.ident) and 3 conditions (sce$type). The process works in parallel.

DarioS commented 7 months ago

Did you replace use_bpparam = SerialParam() with use_bpparam = MulticoreParam(workers = ncores) in the end?