Closed · hr1912 closed this issue 5 years ago
Hello,
Apologies for the late reply. The closest case in my experience is a UMI dataset with ~20,000 genes and 4,500 cells, which finished within 1 hour on 30 cores.
Did you manage to obtain the results, and if yes, how long did it take?
Hi Vivian,
Thanks for your reply. I have not gotten results yet because it is still running 😜. I think it is stuck on calculating distances between cells.
I am pasting the log and R session info below for your information:
[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 6104"
[1] "number of cells in raw count matrix 40534"
[1] "reading finished!"
[1] "imputation starts ..."
[1] "searching candidate neighbors ... "
[1] "inferring cell similarities ..."
[1] "dimension reduction ..."
[1] "calculating cell distances ..."
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
Matrix products: default
BLAS: /extraspace/hruan/softs/R-3.5.1/lib64/R/lib/libRblas.so
LAPACK: /extraspace/hruan/softs/R-3.5.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] scImpute_0.0.8 doParallel_1.0.11 iterators_1.0.10 foreach_1.4.4
[5] penalized_0.9-51 survival_2.42-3
loaded via a namespace (and not attached):
[1] compiler_3.5.1 Matrix_1.2-14 rsvd_0.9 Rcpp_0.12.18
[5] codetools_0.2-15 splines_3.5.1 grid_3.5.1 kernlab_0.9-26
[9] lattice_0.20-35
Given that you have over 40,000 cells, a longer runtime is expected, but it is surprising that it is still calculating cell distances. I have updated the package to make this step faster, but in your case, can you let me know whether you are using an independent server or computing nodes from a cluster?
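(Assuming the updated version is published to the Vivianstats/scImpute repository on GitHub, reinstalling from there is one way to pick it up; the devtools call below is a common route, not a step prescribed in the thread.)

# Reinstall scImpute from its GitHub repository to pick up the
# updated, faster version. Assumes devtools is already installed.
devtools::install_github("Vivianstats/scImpute")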
Hi Vivian,
We are using an independent server (RHEL 7) with 48 CPU cores and 512 GB of RAM.
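(A quick sanity check, not from the thread: parallel::detectCores() reports how many cores R can see, and the value passed to scimpute()'s ncores argument should not exceed it.)

# Confirm how many cores R detects on this server; scimpute()'s
# ncores argument should stay at or below this number.
parallel::detectCores()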
Thanks for the information. I would expect it to run faster on your platform, but let me do some experiments on my side to check.
Hi Vivian,
I am trying to impute dropouts from a CSV of UMI counts (around 40,000 genes and 6,000 cells).
The code is listed below.
scImpute::scimpute(
  count_path = "all_umi_raw.csv",
  infile = "csv",
  outfile = "csv",
  out_dir = "test_scimpute",
  labeled = FALSE,
  drop_thre = 0.5,
  Kcluster = 5,
  ncores = 10
)
It takes over 60 gigabytes of RAM and is running slowly. Is that normal? Can I make this faster?
Thanks!
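(A minimal sketch of one way to speed this up, under the assumption that shrinking the matrix is acceptable: filter out genes detected in very few cells before running scimpute(). The filtered file name and the 10-cell threshold are illustrative placeholders, not part of scImpute.)

# Illustrative pre-filtering step (not part of scImpute itself).
# Genes detected in almost no cells add little to imputation but
# still inflate the distance calculations and memory use.
# scImpute expects a genes-by-cells matrix, so rows are genes here.
raw <- read.csv("all_umi_raw.csv", row.names = 1, check.names = FALSE)

# Keep genes with a nonzero count in at least 10 cells; tune the
# threshold for your data.
keep <- rowSums(raw > 0) >= 10
write.csv(raw[keep, ], "all_umi_filtered.csv")

# Re-run scimpute() on the smaller matrix with the same settings.
scImpute::scimpute(
  count_path = "all_umi_filtered.csv",
  infile = "csv",
  outfile = "csv",
  out_dir = "test_scimpute",
  labeled = FALSE,
  drop_thre = 0.5,
  Kcluster = 5,
  ncores = 10
)

Raising ncores toward the 48 available cores is another lever worth trying, since scimpute() exposes its parallelism through that argument and the earlier benchmark in this thread used 30 cores.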