broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

infercnv::run seems to run only on single core independent of num_threads #524

Open tplehto opened 1 year ago

tplehto commented 1 year ago

Hello,

Thank you for this very useful package. I have been running infercnv on several data sets (e.g. dims: 19k x 14k), both dense and sparse matrices. The runs do complete, but they are extremely slow, independent of the hardware used. I have tried the package on a 4-core/8-thread laptop with 16 GB of RAM as well as on a desktop with 16 cores/32 threads and 64 GB of RAM. The run times for my dataset are almost identical on both machines (around 5-6 hours with HMM = FALSE; I also tried HMM = TRUE, but have only finished such a run once, with the tutorial example data).

I have been playing around with num_threads values ranging from the default 4 all the way to 20, but it seems to have no effect on run times or on CPU utilization, which increases by around 25% on the 4C/8T machine and by 3.5-5% on the 16C/32T machine (independent of the num_threads value) during infercnv::run. I am attaching the sessionInfo from the laptop for you to see; here I am running R 4.2.1 and infercnv version 1.12.0. On the other computer I also tried updating all packages, including infercnv (v1.14), but this did not help.

Of note, I am able to run other parallelized workloads with both doParallel and future, and they spawn new R processes, which increase CPU utilization and decrease run times as expected. With infercnv::run, I do not see any spawned child processes in Task Manager. Also, when running infercnv::run with HMM = TRUE, the console output states "Loading BUGS Model. Running Sampling Using Parallel with 4 Cores" (or whatever num_threads is set to), but in reality only one core appears to be loaded according to Task Manager.

A colleague of mine was also able to reproduce this issue on their machine, on a recent infercnv version, but I do not have more details on their setup. The issue is reproducible with the infercnv tutorial code alone, with only library(infercnv) loaded.

My questions are: which steps are parallelized via num_threads? Is there usually a tangible decrease in run times with increasing num_threads? Am I doing something wrong, or is there a problem with the parallelization? We are considering moving these analyses to a server cluster, which is why I am asking. Thank you very much for your help on this!
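A quick base-R check (a standalone sketch, independent of infercnv) confirms that process-based workers do spawn and run on this kind of machine:

```r
library(parallel)

cl <- makeCluster(4)  # PSOCK workers: separate R processes, works on Windows too
pids <- unlist(parLapply(cl, 1:8, function(i) Sys.getpid()))
stopCluster(cl)

length(unique(pids)) > 1  # TRUE when the work really ran in multiple processes
```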

infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff = 0.1,  # cutoff=1 works well for Smart-seq2, cutoff=0.1 for 10x Genomics
                             out_dir = dir.results,
                             cluster_by_groups = TRUE,
                             denoise = TRUE,
                             HMM = TRUE,
                             num_threads = 4,
                             mask_nonDE_genes = TRUE)

sessionInfo() R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale: [1] LC_COLLATE=Finnish_Finland.utf8 LC_CTYPE=Finnish_Finland.utf8 LC_MONETARY=Finnish_Finland.utf8 LC_NUMERIC=C LC_TIME=Finnish_Finland.utf8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] infercnv_1.12.0

loaded via a namespace (and not attached): [1] TH.data_1.1-1 colorspace_2.1-0 class_7.3-20 modeltools_0.2-23 futile.logger_1.4.3 XVector_0.36.0
[7] GenomicRanges_1.48.0 rstudioapi_0.14 leiden_0.4.3 listenv_0.9.0 prodlim_2019.11.13 fansi_1.0.4
[13] mvtnorm_1.1-3 lubridate_1.9.2 coin_1.4-2 codetools_0.2-18 splines_4.2.1 doParallel_1.0.17
[19] libcoin_1.0-9 knitr_1.42 jsonlite_1.8.4 pROC_1.18.0 caret_6.0-94 argparse_2.2.2
[25] rjags_4-13 png_0.1-8 compiler_4.2.1 Matrix_1.5-3 fastmap_1.1.1 limma_3.52.4
[31] cli_3.6.0 formatR_1.14 htmltools_0.5.4 tools_4.2.1 igraph_1.4.1 coda_0.19-4
[37] gtable_0.3.3 glue_1.6.2 GenomeInfoDbData_1.2.8 RANN_2.6.1 reshape2_1.4.4 dplyr_1.1.1
[43] Rcpp_1.0.10 Biobase_2.56.0 vctrs_0.6.1 ape_5.7-1 nlme_3.1-157 iterators_1.0.14
[49] timeDate_4022.108 gower_1.0.1 xfun_0.37 fastcluster_1.2.3 stringr_1.5.0 globals_0.16.2
[55] Rmisc_1.5.1 timechange_0.2.0 lifecycle_1.0.3 gtools_3.9.4 future_1.32.0 edgeR_3.38.4
[61] zlibbioc_1.42.0 MASS_7.3-57 zoo_1.8-11 scales_1.2.1 ipred_0.9-14 MatrixGenerics_1.8.1
[67] parallel_4.2.1 SummarizedExperiment_1.26.1 sandwich_3.0-2 lambda.r_1.2.4 RColorBrewer_1.1-3 SingleCellExperiment_1.18.1
[73] yaml_2.3.7 reticulate_1.28 gridExtra_2.3 ggplot2_3.4.1 rpart_4.1.16 reshape_0.8.9
[79] stringi_1.7.12 HiddenMarkov_1.8-13 S4Vectors_0.34.0 foreach_1.5.2 caTools_1.18.2 BiocGenerics_0.42.0
[85] hardhat_1.2.0 lava_1.7.2.1 GenomeInfoDb_1.32.4 rlang_1.1.0 pkgconfig_2.0.3 matrixStats_0.63.0
[91] bitops_1.0-7 parallelDist_0.2.6 evaluate_0.20 lattice_0.20-45 purrr_1.0.1 recipes_1.0.5
[97] tidyselect_1.2.0 parallelly_1.35.0 plyr_1.8.8 magrittr_2.0.3 R6_2.5.1 IRanges_2.30.1
[103] gplots_3.1.3 generics_0.1.3 multcomp_1.4-23 DelayedArray_0.22.0 pillar_1.9.0 withr_2.5.0
[109] fitdistrplus_1.1-8 survival_3.3-1 RCurl_1.98-1.10 nnet_7.3-17 tibble_3.2.1 future.apply_1.10.0
[115] futile.options_1.0.1 phyclust_0.1-33 KernSmooth_2.23-20 utf8_1.2.3 rmarkdown_2.20 locfit_1.5-9.7
[121] grid_4.2.1 data.table_1.14.8 ModelMetrics_1.2.2.2 digest_0.6.31 tidyr_1.3.0 RcppParallel_5.1.7
[127] stats4_4.2.1 munsell_0.5.0

nbahti commented 1 year ago

I am also interested in this.

GeorgescuC commented 1 year ago

Hi @tplehto ,

The steps that should take advantage of multiple cores are all instances where an hclust is calculated (parallelDist() is used to compute the distance matrix), the random trees permutations, and the Bayesian model.
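For the distance-matrix part, the pattern looks roughly like this (a minimal standalone sketch with toy data, not infercnv's internal code). Note that parallelDist is built on RcppParallel, which uses native threads rather than forked processes, so this particular step should scale even on Windows:

```r
library(parallelDist)  # multithreaded drop-in for stats::dist

m <- matrix(rnorm(200 * 50), nrow = 200)  # 200 cells x 50 features, toy data
d <- parDist(m, method = "euclidean", threads = 4)  # thread count is explicit
hc <- hclust(as.dist(d), method = "ward.D2")
```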

Regarding the Bayesian model specifically, there is actually a check for whether the OS R is running on is Windows or unix-based, because at least at the time that piece of code was written, the following was true for doParallel/foreach: "The multicore functionality supports multiple workers only on those operating systems that support the fork system call; this excludes Windows." (source: https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf). This is something I may need to revisit if things have changed or if there is an alternative. This specificity, however, means that if you run infercnv on a server running some variant of Linux, the parallelization should work.
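The kind of OS guard described above might look like the following (a hedged sketch of the pattern, not infercnv's actual code; registerDoParallel and registerDoSEQ are real doParallel/foreach functions):

```r
library(doParallel)  # also attaches foreach

num_threads <- 4
if (.Platform$OS.type == "windows") {
  # fork-based multicore workers are unavailable on Windows,
  # so fall back to sequential execution
  foreach::registerDoSEQ()
} else {
  registerDoParallel(cores = num_threads)  # forked workers on unix-alikes
}
foreach::getDoParWorkers()  # 1 on Windows, num_threads elsewhere
```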

There is more work that can be done to improve speed, but other improvements have taken priority so far. For sparse matrices specifically, there are some improvements I might be able to port from other code.

Regards, Christophe.

Dacy34 commented 1 year ago

Hi @GeorgescuC

Thank you very much for making such a practical R package. I have the same questions and trouble. I also run infercnv on Windows, with both large and small datasets. Without the subclusters method, a run took 3 days. The day before yesterday I tried the subclusters method, and it has now spent 48 hours on the seventh step and still has not finished. I also set future::plan("multiprocess", workers = 16) and num_threads = 16, which does not seem to help. I do not know whether there are other optimization options on Windows besides using a server or a Linux system.

Best, Mengtian

Dacy34 commented 1 year ago

Hi @GeorgescuC

I used the 'leiden' method, and it took only a few minutes before proceeding to STEP 17. This experience is great. Maybe the 'leiden' method is the best choice on Windows for the time being.

Best, Mengtian

GeorgescuC commented 1 year ago

Hi @Dacy34 ,

Step 7 is a step only run if the subclustering option is "random_trees", and it is indeed slow even with parallelization because a lot of permutations are generated to check whether the tree should be split or not. This was one of the incentives to add the Leiden subclustering option and set it as default. The "random_trees" method still exists mostly for compatibility and because some downstream analysis uses its hierarchy, while Leiden subclusters are independent.
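In practice, selecting the subclustering method looks like this (a hedged sketch using documented infercnv::run arguments; Leiden is already the default in recent versions, shown explicitly here, and out_dir/cutoff values are illustrative):

```r
# Leiden subclustering avoids the slow random_trees permutations in step 7
infercnv_obj <- infercnv::run(
  infercnv_obj,
  cutoff = 0.1,                                  # 0.1 for 10x Genomics, 1 for Smart-seq2
  out_dir = "infercnv_output",
  cluster_by_groups = TRUE,
  analysis_mode = "subclusters",
  tumor_subcluster_partition_method = "leiden",  # instead of "random_trees"
  denoise = TRUE
)
```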

Regards, Christophe.

Dacy34 commented 1 year ago

Hi @GeorgescuC

Thank you for your reply and explanation. It seems I still need to learn more about the mechanisms and principles you mentioned. Using only a single core on Windows is too time-consuming; I hope infercnv will gain a faster option on Windows in the future. Thanks again!

Best, Mengtian.

GeorgescuC commented 1 year ago

Hi @Dacy34 ,

It is something I am hoping to get to, but for now other general fixes/improvements have taken priority. One thing you can try on Windows is to use WSL (Windows Subsystem for Linux) to run infercnv in an Ubuntu variant, or to use Docker Desktop. Using the Docker image we provide, you can either write an Rscript to run inside it, or use the script/infercnv.R CLI to call infercnv.

Best, Christophe.