cole-trapnell-lab / monocle3


Learn_graph does not complete - indefinitely hangs/consumes CPU. #367

Closed: TJCooperIL closed this issue 4 years ago

TJCooperIL commented 4 years ago

Describe the bug

When running learn_graph (from the master and devel branches) on the vignette sample data or on my own data (both small and large subsets), the process does not complete. It is definitely running, consuming 12% of CPU (1 core). On occasion, running learn_graph causes a fatal error in RStudio.

The process stops here:

Processing louvain component 1
Current partition is 1
Using 797 nodes for principal graph
Finding kNN using RANN with 25 neighbors
Calculating the local density for each sample based on kNNs ...
iter = 1 obj = 352.75751181621

To Reproduce

expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))
cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)
cds <- preprocess_cds(cds, num_dim = 10)
cds <- reduce_dimension(cds)
cds <- cluster_cells(cds)
cds <- learn_graph(cds, verbose=TRUE)
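
Because the hang (and the occasional fatal error) takes the interactive session down with it, one way to probe it is to run the last step in a separate R process with a time limit. The following is a minimal sketch, not part of monocle3: `run_script_with_timeout` is a hypothetical helper built on base R's `system2()` and its `timeout` argument. It assumes a Unix-like system, and the file name `cds_clustered.rds` in the usage comment is likewise an assumption.

```r
# Hypothetical helper (not part of monocle3): run R code in a separate
# Rscript process and stop waiting after `timeout_sec` seconds.
run_script_with_timeout <- function(code, timeout_sec = 600) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script), add = TRUE)
  writeLines(code, script)

  rscript <- file.path(R.home("bin"), "Rscript")
  # system2() returns the exit status; R documents 124 for a timed-out
  # command and also raises a warning, which is suppressed here.
  status <- suppressWarnings(
    system2(rscript, shQuote(script), timeout = timeout_sec)
  )

  if (status == 0L) "ok" else if (status == 124L) "timeout" else "failed"
}

# Usage sketch (assumes the clustered object was saved beforehand with
# saveRDS(cds, "cds_clustered.rds")):
# run_script_with_timeout(c(
#   "library(monocle3)",
#   "cds <- readRDS('cds_clustered.rds')",
#   "cds <- learn_graph(cds, verbose = TRUE)"
# ), timeout_sec = 600)
```

A "timeout" result means the child process was still running when the limit expired, while "failed" points to an error or crash in the child; either way the calling session survives.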

sessionInfo():

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_IL.UTF-8       LC_NUMERIC=C               LC_TIME=en_IL.UTF-8        LC_COLLATE=en_IL.UTF-8    
 [5] LC_MONETARY=en_IL.UTF-8    LC_MESSAGES=en_IL.UTF-8    LC_PAPER=en_IL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_IL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] monocle3_0.2.1              SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1 DelayedArray_0.14.0        
 [5] matrixStats_0.56.0          GenomicRanges_1.40.0        GenomeInfoDb_1.24.0         IRanges_2.22.2             
 [9] S4Vectors_0.26.1            Biobase_2.48.0              BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6           plyr_1.8.6             compiler_4.0.0         pillar_1.4.4           XVector_0.28.0        
 [6] viridis_0.5.1          bitops_1.0-6           tools_4.0.0            zlibbioc_1.34.0        viridisLite_0.3.0     
[11] lifecycle_0.2.0        tibble_3.0.1           gtable_0.3.0           lattice_0.20-41        pkgconfig_2.0.3       
[16] rlang_0.4.6            Matrix_1.2-18          rstudioapi_0.11        gridExtra_2.3          GenomeInfoDbData_1.2.3
[21] stringr_1.4.0          dplyr_1.0.0            generics_0.0.2         vctrs_0.3.1            grid_4.0.0            
[26] tidyselect_1.1.0       glue_1.4.1             R6_2.4.1               reshape2_1.4.4         purrr_0.3.4           
[31] ggplot2_3.3.1          magrittr_1.5           scales_1.1.1           ellipsis_0.3.1         colorspace_1.4-1      
[36] stringi_1.4.6          RCurl_1.98-1.2         munsell_0.5.0          crayon_1.3.4    


brgew commented 4 years ago

Hi, I am unable to reproduce this problem running Monocle3 develop branch on R 4.0.1 on Debian 10 and R 4.0.1 on an Ubuntu 20.04 virtual machine. I am concerned about it and I am interested in knowing if other people see this problem and/or have additional information or insight into it. Thank you.

anthonyxie commented 4 years ago

I'm also having a similar issue where learn_graph fails to complete.

The process stops here: [screenshot "issue5", image not preserved]

To reproduce:

library(monocle3)
library(dplyr)
expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/packer_embryo_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/packer_embryo_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/packer_embryo_rowData.rds"))

cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)

cds <- preprocess_cds(cds, num_dim = 20)
cds <- align_cds(cds, alignment_group = "batch", residual_model_formula_str = "~ bg.300.loading + bg.400.loading + bg.500.1.loading + bg.500.2.loading + bg.r17.loading + bg.b01.loading + bg.b02.loading")

cds <- reduce_dimension(cds)

ciliated_genes <- c("che-1", "hlh-17", "nhr-6", "dmd-6", "ceh-36", "ham-1")

plot_cells(cds, genes=ciliated_genes, label_cell_groups=FALSE, show_trajectory_graph=FALSE)

cds <- cluster_cells(cds)

cds <- learn_graph(cds, verbose=TRUE)

sessionInfo():

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /share/software/user/open/openblas/0.2.19/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] dplyr_1.0.0 monocle3_0.2.2.0 SingleCellExperiment_1.8.0
 [4] SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1
 [7] matrixStats_0.56.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
[10] IRanges_2.20.2 S4Vectors_0.24.4 Biobase_2.46.0
[13] BiocGenerics_0.32.0

loaded via a namespace (and not attached):
 [1] ggrepel_0.8.2 Rcpp_1.0.4.6 rsvd_1.0.3
 [4] lattice_0.20-38 digest_0.6.25 assertthat_0.2.1
 [7] RSpectra_0.16-0 R6_2.4.1 plyr_1.8.6
[10] ggplot2_3.3.2 pillar_1.4.4 zlibbioc_1.32.0
[13] rlang_0.4.6 rstudioapi_0.11 irlba_2.3.3
[16] Matrix_1.2-17 BiocNeighbors_1.4.2 labeling_0.3
[19] stringr_1.4.0 RCurl_1.98-1.2 munsell_0.5.0
[22] uwot_0.1.8 compiler_3.6.1 vipor_0.4.5
[25] BiocSingular_1.2.2 pkgconfig_2.0.3 ggbeeswarm_0.6.0
[28] tidyselect_1.1.0 tibble_3.0.1 gridExtra_2.3
[31] GenomeInfoDbData_1.2.2 batchelor_1.2.4 codetools_0.2-16
[34] viridisLite_0.3.0 crayon_1.3.4 bitops_1.0-6
[37] grid_3.6.1 gtable_0.3.0 lifecycle_0.2.0
[40] magrittr_1.5 scales_1.1.1 stringi_1.4.6
[43] farver_2.0.3 XVector_0.26.0 reshape2_1.4.4
[46] viridis_0.5.1 limma_3.42.2 scater_1.14.6
[49] DelayedMatrixStats_1.8.0 ellipsis_0.3.1 generics_0.0.2
[52] vctrs_0.3.1 RcppAnnoy_0.0.16 tools_3.6.1
[55] glue_1.4.1 beeswarm_0.2.3 purrr_0.3.4
[58] colorspace_1.4-1

brgew commented 4 years ago

Hi @anthonyxie, Thank you for the example. Unfortunately, when I use those commands, Monocle3 runs without an error so I cannot debug this problem. For comparison, my sessionInfo() is

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/local/R/R402_sse2/lib/R/lib/libRblas.so
LAPACK: /usr/local/R/R402_sse2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dplyr_1.0.0                 monocle3_0.2.2.0            testthat_2.3.2              SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1 DelayedArray_0.14.0        
 [7] matrixStats_0.56.0          GenomicRanges_1.40.0        GenomeInfoDb_1.24.0         IRanges_2.22.1              S4Vectors_0.26.0            Biobase_2.48.0             
[13] BiocGenerics_0.34.0         devtools_2.3.0              usethis_1.6.1              

loaded via a namespace (and not attached):
  [1] backports_1.1.7           plyr_1.8.6                igraph_1.2.5              lazyeval_0.2.2            splines_4.0.2             sp_1.4-2                 
  [7] BiocParallel_1.22.0       listenv_0.8.0             ggplot2_3.3.1             scater_1.16.0             digest_0.6.25             htmltools_0.4.0          
 [13] viridis_0.5.1             gdata_2.18.0              fansi_0.4.1               magrittr_1.5              memoise_1.1.0             limma_3.44.1             
 [19] remotes_2.1.1             globals_0.12.5            gmodels_2.18.1            rsample_0.0.7             prettyunits_1.1.1         colorspace_1.4-1         
 [25] ggrepel_0.8.2             callr_3.4.3               crayon_1.3.4              RCurl_1.98-1.2            jsonlite_1.6.1            zoo_1.8-8                
 [31] glue_1.4.1                gtable_0.3.0              zlibbioc_1.34.0           XVector_0.28.0            pkgbuild_1.0.8            BiocSingular_1.4.0       
 [37] scales_1.1.1              pscl_1.5.5                pheatmap_1.0.12           DBI_1.1.0                 Rcpp_1.0.4.6              viridisLite_0.3.0        
 [43] xtable_1.8-4              spData_0.3.5              units_0.6-7               spdep_1.1-3               rsvd_1.0.3                proxy_0.4-24             
 [49] htmlwidgets_1.5.1         httr_1.4.1                speedglm_0.3-2            RColorBrewer_1.1-2        ellipsis_0.3.1            farver_2.0.3             
 [55] pkgconfig_2.0.3           uwot_0.1.8                deldir_0.1-25             labeling_0.3              tidyselect_1.1.0          rlang_0.4.6              
 [61] reshape2_1.4.4            later_1.1.0.1             munsell_0.5.0             pbmcapply_1.5.0           tools_4.0.2               cli_2.0.2                
 [67] generics_0.0.2            batchelor_1.4.0           stringr_1.4.0             fastmap_1.0.1             processx_3.4.2            RhpcBLASctl_0.20-137     
 [73] fs_1.4.1                  purrr_0.3.4               RANN_2.6.1                nlme_3.1-148              pbapply_1.4-2             future_1.17.0            
 [79] mime_0.9                  slam_0.1-47               grr_0.9.5                 compiler_4.0.2            rstudioapi_0.11           beeswarm_0.2.3           
 [85] plotly_4.9.2.1            e1071_1.7-3               tibble_3.0.1              stringi_1.4.6             ps_1.3.3                  RSpectra_0.16-0          
 [91] desc_1.2.0                lattice_0.20-41           Matrix_1.2-18             classInt_0.4-3            vctrs_0.3.1               LearnBayes_2.15.1        
 [97] pillar_1.4.4              lifecycle_0.2.0           furrr_0.1.0               lmtest_0.9-37             RcppAnnoy_0.0.16          BiocNeighbors_1.6.0      
[103] data.table_1.12.8         bitops_1.0-6              irlba_2.3.3               Matrix.utils_0.9.8        raster_3.1-5              httpuv_1.5.4             
[109] R6_2.4.1                  promises_1.1.1            KernSmooth_2.23-17        gridExtra_2.3             vipor_0.4.5               sessioninfo_1.1.1        
[115] codetools_0.2-16          gtools_3.8.2              boot_1.3-25               MASS_7.3-51.6             assertthat_0.2.1          pkgload_1.1.0            
[121] leidenbase_0.1.0          rprojroot_1.3-2           withr_2.2.0               GenomeInfoDbData_1.2.3    expm_0.999-4              grid_4.0.2               
[127] coda_0.19-3               tidyr_1.1.0               class_7.3-17              DelayedMatrixStats_1.10.0 Rtsne_0.15                sf_0.9-4                 
[133] shiny_1.4.0.2             ggbeeswarm_0.6.0         
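
Comparing two long sessionInfo() listings by eye is error-prone. As a rough sketch (both helpers are hypothetical and not part of monocle3), each machine could save a table of its loaded package versions, and the two tables could then be merged to show only the packages that differ. The file names in the usage comment are assumptions.

```r
# Hypothetical helper: versions of all currently loaded namespaces.
snapshot_versions <- function() {
  pkgs <- sort(loadedNamespaces())
  vers <- vapply(pkgs, function(p) as.character(getNamespaceVersion(p)),
                 character(1))
  data.frame(Package = pkgs, Version = unname(vers),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: rows where the version differs or where the package
# is present in only one of the two snapshots.
diff_versions <- function(a, b) {
  a$Version <- as.character(a$Version)
  b$Version <- as.character(b$Version)
  merged <- merge(a, b, by = "Package", all = TRUE, suffixes = c(".a", ".b"))
  differs <- is.na(merged$Version.a) | is.na(merged$Version.b) |
    merged$Version.a != merged$Version.b
  merged[differs, , drop = FALSE]
}

# Usage sketch: on each machine, after running the failing commands,
#   saveRDS(snapshot_versions(), "versions_machine_a.rds")
# then, on either machine,
#   diff_versions(readRDS("versions_machine_a.rds"),
#                 readRDS("versions_machine_b.rds"))
```

A difference found this way is only a lead to investigate and does not by itself identify the cause of the hang.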

I remain interested in understanding this problem...

I am grateful for your patience and assistance. Thank you.

TJCooperIL commented 4 years ago

After noticing additional instabilities and fatal errors in multiple R packages (in addition to Monocle3), and after every attempt to purge R from my system and reinstall it cleanly failed to resolve them, I resorted to a clean install of Ubuntu 20.04. Now everything is working. I can only suggest that there is an issue with the Ubuntu 19.10 -> Ubuntu 20.04 upgrade procedure, but this is outside the scope of Monocle3.

brgew commented 4 years ago

I appreciate the feedback. Thank you.