error in find_gene_modules - Error in RANN

lb15 commented 4 years ago

If this is a question and not a bug report or enhancement request, please post to our google group at https://groups.google.com/forum/#!forum/monocle-3-users

Describe the bug I'm running find_gene_modules to group genes changing over pseudotime into modules. I get an error in RANN::nn2(). I've tried with a full cds and a subsetted cds (subsetted on both cells and genes) but both produce the error. I've also tried specifying monocle3::find_gene_modules() but that is also producing the same error.

To Reproduce

head(sig_genes)
modules=find_gene_modules(cds_unbias[as.character(sig_genes),])
[1] "Gm37381" "Rp1"     "Mrpl15"  "Tcea1"   "Rgs20"   "Atp6v1h"
Error in RANN::nn2(data, data, k + 1, searchtype = "standard") : 
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation
Timing stopped at: 0.002 0.001 0.002

traceback() After the error, run traceback() in R and post the output:

4: RANN::nn2(data, data, k + 1, searchtype = "standard")
3: system.time(tmp <- RANN::nn2(data, data, k + 1, searchtype = "standard"))
2: leiden_clustering(data = reduced_dim_res, pd = rowData(cds)[row.names(reduced_dim_res), 
       , drop = FALSE], k = k, weight = weight, num_iter = leiden_iter, 
       resolution_parameter = resolution, random_seed = random_seed, 
       verbose = verbose, ...)
1: find_gene_modules(cds_unbias[as.character(sig_genes), ])

Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem.

sessionInfo():

R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] RANN_2.6.1                  batchelor_1.4.0            
 [3] reshape2_1.4.4              viridis_0.5.1              
 [5] viridisLite_0.3.0           dplyr_0.8.5                
 [7] Seurat_3.1.5                monocle3_0.2.1.8           
 [9] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1
[11] DelayedArray_0.14.0         matrixStats_0.56.0         
[13] GenomicRanges_1.40.0        GenomeInfoDb_1.24.0        
[15] IRanges_2.22.1              S4Vectors_0.26.0           
[17] Biobase_2.48.0              BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
  [1] ggbeeswarm_0.6.0          Rtsne_0.15                colorspace_1.4-1         
  [4] ellipsis_0.3.1            ggridges_0.5.2            rprojroot_1.3-2          
  [7] XVector_0.28.0            BiocNeighbors_1.6.0       fs_1.4.1                 
 [10] leiden_0.3.3              listenv_0.8.0             npsurv_0.4-0.1           
 [13] remotes_2.1.1             ggrepel_0.8.2             fansi_0.4.1              
 [16] codetools_0.2-16          splines_4.0.0             lsei_1.2-0.1             
 [19] scater_1.16.0             pkgload_1.0.2             speedglm_0.3-2           
 [22] jsonlite_1.6.1            ica_1.0-2                 cluster_2.1.0            
 [25] png_0.1-7                 uwot_0.1.8                sctransform_0.2.1        
 [28] compiler_4.0.0            httr_1.4.1                backports_1.1.7          
 [31] assertthat_0.2.1          Matrix_1.2-18             lazyeval_0.2.2           
 [34] cli_2.0.2                 BiocSingular_1.4.0        htmltools_0.4.0          
 [37] prettyunits_1.1.1         tools_4.0.0               rsvd_1.0.3               
 [40] igraph_1.2.5              gtable_0.3.0              glue_1.4.1               
 [43] GenomeInfoDbData_1.2.3    Rcpp_1.0.4.6              vctrs_0.3.0              
 [46] ape_5.3                   nlme_3.1-147              DelayedMatrixStats_1.10.0
 [49] lmtest_0.9-37             stringr_1.4.0             globals_0.12.5           
 [52] ps_1.3.3                  testthat_2.3.2            lifecycle_0.2.0          
 [55] irlba_2.3.3               devtools_2.3.0            future_1.17.0            
 [58] zlibbioc_1.34.0           MASS_7.3-51.6             zoo_1.8-8                
 [61] scales_1.1.1              RColorBrewer_1.1-2        curl_4.3                 
 [64] memoise_1.1.0             reticulate_1.15           pbapply_1.4-2            
 [67] gridExtra_2.3             ggplot2_3.3.0             stringi_1.4.6            
 [70] desc_1.2.0                BiocParallel_1.22.0       pkgbuild_1.0.8           
 [73] rlang_0.4.6               pkgconfig_2.0.3           bitops_1.0-6             
 [76] lattice_0.20-41           ROCR_1.0-11               purrr_0.3.4              
 [79] patchwork_1.0.0           htmlwidgets_1.5.1         cowplot_1.0.0            
 [82] processx_3.4.2            tidyselect_1.1.0          RcppAnnoy_0.0.16         
 [85] plyr_1.8.6                magrittr_1.5              R6_2.4.1                 
 [88] pillar_1.4.4              withr_2.2.0               fitdistrplus_1.0-14      
 [91] survival_3.1-12           RCurl_1.98-1.2            tibble_3.0.1             
 [94] future.apply_1.5.0        tsne_0.1-3                crayon_1.3.4             
 [97] utf8_1.1.4                KernSmooth_2.23-17        plotly_4.9.2.1           
[100] usethis_1.6.1             grid_4.0.0                data.table_1.12.8        
[103] callr_3.4.3               digest_0.6.25             tidyr_1.0.3              
[106] munsell_0.5.0             beeswarm_0.2.3            vipor_0.4.5              
[109] sessioninfo_1.1.1

Additional context I'm using the development branch of monocle3. I recently upgraded to R 4.0.0 and the development version of monocle3. Prior to this I did not get the error in find_gene_modules. Thanks!!

lb15 commented 4 years ago

I tried to run find_gene_modules() with a cds and gene list created before I upgraded R/Monocle3, and it worked. So it may be something with how i have created this cds object. One difference is that the cds_unbias contains cells across partitions that I want to analyze. I upgraded to the development monocle3 because of the fix in #275 . In the older version of monocle3, I was manually replacing the partition identifiers with 1 for all cells, to create one partition.

Here is how I built my cds:

exprs<- GetAssayData(seur_ob, slot="counts",assay = "RNA")

## phenodata
pheno.data = seur_ob@meta.data

## feature data
genes <- data.frame(gene_short_name = rownames(exprs))
rownames(genes) <- rownames(exprs)

cds <- new_cell_data_set(exprs,
                         cell_metadata = pheno.data,
                         gene_metadata = genes)

################### PROCESS DATA #########################
set.seed(123)
cds=preprocess_cds(cds, num_dim=20)
cds=align_cds(cds, preprocess_method="PCA",alignment_group="batch")
cds <- reduce_dimension(cds, umap.fast_sgd = FALSE,preprocess_method = "Aligned")
cds <- cluster_cells(cds,cluster_method="leiden",random_seed=123)
cds <- learn_graph(cds,use_partition = F ,learn_graph_control = list(minimal_branch_len=25))
cds=order_cells(cds)
cds_sub = cds[,colData(cds)$Merged_MCC_clusters == "MCC"]
##code to produce vector of genes expressed in at least 10% of cells in any cluster
cds_unbias=cds_sub[pct.above10$genes,]

### Regression analysis
gene_fits <- fit_models(cds_unbias, model_formula_str = "~pseudotime")
fit_coefs <- coefficient_table(gene_fits)
pseudotime_terms <- fit_coefs %>% filter(term == "pseudotime")
pseudotime_sig <- pseudotime_terms %>% filter (q_value < 0.05) %>%
        select(gene_short_name, term, q_value, estimate)
sig_genes = pseudotime_sig$gene_short_name
modules=find_gene_modules(cds_unbias[as.character(sig_genes),])

afaissa commented 4 years ago

Same problem here:

traceback() 4: RANN::nn2(data, data, k + 1, searchtype = "standard") 3: system.time(tmp <- RANN::nn2(data, data, k + 1, searchtype = "standard")) 2: leiden_clustering(data = reduced_dim_res, pd = rowData(cds)[row.names(reduced_dim_res), , drop = FALSE], k = k, weight = weight, num_iter = leiden_iter, resolution_parameter = resolution, random_seed = random_seed, verbose = verbose, ...) 1: find_gene_modules(cds[pr_deg_ids, ], resolution = 0.01)

sessionInfo() R version 4.0.0 (2020-04-24) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18362)

Matrix products: default

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base

other attached packages: [1] data.table_1.12.8 forcats_0.5.0 stringr_1.4.0
[4] purrr_0.3.4 readr_1.3.1 tidyr_1.0.3
[7] tibble_3.0.1 tidyverse_1.3.0 monocle3_0.2.1.5
[10] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1 DelayedArray_0.14.0
[13] matrixStats_0.56.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.0
[16] IRanges_2.22.1 S4Vectors_0.26.0 Biobase_2.48.0
[19] BiocGenerics_0.34.0 ggplot2_3.3.0 Seurat_3.1.5
[22] dplyr_0.8.5

loaded via a namespace (and not attached): [1] readxl_1.3.1 backports_1.1.6 plyr_1.8.6
[4] igraph_1.2.5 lazyeval_0.2.2 sp_1.4-1
[7] splines_4.0.0 listenv_0.8.0 digest_0.6.25
[10] htmltools_0.4.0 viridis_0.5.1 gdata_2.18.0
[13] fansi_0.4.1 magrittr_1.5 cluster_2.1.0
[16] ROCR_1.0-11 limma_3.44.1 globals_0.12.5
[19] modelr_0.1.7 gmodels_2.18.1 colorspace_1.4-1
[22] rvest_0.3.5 rappdirs_0.3.1 ggrepel_0.8.2
[25] haven_2.2.0 crayon_1.3.4 RCurl_1.98-1.2
[28] jsonlite_1.6.1 survival_3.1-12 zoo_1.8-8
[31] ape_5.3 glue_1.4.0 gtable_0.3.0
[34] zlibbioc_1.34.0 XVector_0.28.0 leiden_0.3.3
[37] future.apply_1.5.0 scales_1.1.1 pheatmap_1.0.12
[40] DBI_1.1.0 Rcpp_1.0.4.6 viridisLite_0.3.0
[43] spData_0.3.5 units_0.6-6 reticulate_1.15
[46] spdep_1.1-3 rsvd_1.0.3 proxy_0.4-24
[49] tsne_0.1-3 htmlwidgets_1.5.1 httr_1.4.1
[52] RColorBrewer_1.1-2 speedglm_0.3-2 ellipsis_0.3.0
[55] ica_1.0-2 pkgconfig_2.0.3 farver_2.0.3
[58] deldir_0.1-25 uwot_0.1.8 dbplyr_1.4.3
[61] utf8_1.1.4 tidyselect_1.1.0 labeling_0.3
[64] rlang_0.4.6 reshape2_1.4.4 pbmcapply_1.5.0
[67] munsell_0.5.0 cellranger_1.1.0 tools_4.0.0
[70] cli_2.0.2 generics_0.0.2 broom_0.5.6
[73] ggridges_0.5.2 npsurv_0.4-0.1 RhpcBLASctl_0.20-17
[76] fs_1.4.1 fitdistrplus_1.0-14 RANN_2.6.1
[79] pbapply_1.4-2 future_1.17.0 nlme_3.1-147
[82] xml2_1.3.2 compiler_4.0.0 rstudioapi_0.11
[85] plotly_4.9.2.1 png_0.1-7 e1071_1.7-3
[88] lsei_1.2-0.1 reprex_0.3.0 stringi_1.4.6
[91] RSpectra_0.16-0 lattice_0.20-41 Matrix_1.2-18
[94] classInt_0.4-3 vctrs_0.3.0 LearnBayes_2.15.1
[97] pillar_1.4.4 lifecycle_0.2.0 lmtest_0.9-37
[100] RcppAnnoy_0.0.16 cowplot_1.0.0 bitops_1.0-6
[103] irlba_2.3.3 raster_3.1-5 patchwork_1.0.0
[106] R6_2.4.1 KernSmooth_2.23-16 gridExtra_2.3
[109] codetools_0.2-16 gtools_3.8.2 boot_1.3-24
[112] MASS_7.3-51.5 assertthat_0.2.1 leidenbase_0.1.0
[115] withr_2.2.0 sctransform_0.2.1 GenomeInfoDbData_1.2.3
[118] expm_0.999-4 hms_0.5.3 grid_4.0.0
[121] coda_0.19-3 class_7.3-16 DelayedMatrixStats_1.10.0 [124] Rtsne_0.15 sf_0.9-3 lubridate_1.7.8

brgew commented 4 years ago

Hi lb15,

I tried to reproduce the issue using the example on the Monocle3 documentation web site but it worked. Looking at the find_genes_modules() function call in your second message

sig_genes = pseudotime_sig$gene_short_name
modules=find_gene_modules(cds_unbias[as.character(sig_genes),])

it appears to me that you are subsetting the cds_unbias rows using the gene short names; however, I think that cds_unbias[as.character(sig_genes),] requires the gene ids rather than short names. I think that the call

modules<-find_gene_modules(cds[rowData(cds)$gene_short_name %in% sig_genes,])

may work for you. Can you let me know if the problem persists?

afaissa commented 4 years ago

Hi brgew,

I tried to apply the suggestion you gave to lb15 in my data, but still having the same issue. Is there anything else I can try?

Thank you!

gene_fits <- fit_models(cds, model_formula_str = "~pseudotime") fit_coefs <- coefficient_table(gene_fits) pseudotime_terms <- fit_coefs %>% filter(term == "pseudotime") pseudotime_sig <- pseudotime_terms %>% filter (q_value < 0.05) %>% select(gene_short_name, term, q_value, estimate) sig_genes = pseudotime_sig$gene_short_name modules<-find_gene_modules(cds[rowData(cds)$gene_short_name %in% sig_genes,])

Error in RANN::nn2(data, data, k + 1, searchtype = "standard") : NA/NaN/Inf in foreign function call (arg 1) Timing stopped at: 0.001 0 0.001

traceback() 4: RANN::nn2(data, data, k + 1, searchtype = "standard") 3: system.time(tmp <- RANN::nn2(data, data, k + 1, searchtype = "standard")) 2: leiden_clustering(data = reduced_dim_res, pd = rowData(cds)[row.names(reduced_dim_res), , drop = FALSE], k = k, weight = weight, louvain_iter = louvain_iter, resolution_parameter = resolution, random_seed = random_seed, verbose = verbose, ...) 1: find_gene_modules(cds[rowData(cds)$gene_short_name %in% sig_genes, ])

sessionInfo() R version 3.6.3 (2020-02-29) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Fedora 30 (Workstation Edition)

Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] monocle3_0.2.1 SingleCellExperiment_1.8.0 [3] SummarizedExperiment_1.16.1 DelayedArray_0.12.3
[5] BiocParallel_1.20.1 matrixStats_0.56.0
[7] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
[9] IRanges_2.20.2 S4Vectors_0.24.4
[11] Biobase_2.46.0 BiocGenerics_0.32.0
[13] ggplot2_3.3.0 Seurat_3.1.2
[15] dplyr_0.8.5

loaded via a namespace (and not attached): [1] TH.data_1.0-10 Rtsne_0.15 colorspace_1.4-1
[4] ellipsis_0.3.0 ggridges_0.5.2 XVector_0.26.0
[7] leiden_0.3.1 listenv_0.8.0 npsurv_0.4-0
[10] ggrepel_0.8.1 mvtnorm_1.0-12 codetools_0.2-16
[13] splines_3.6.3 R.methodsS3_1.7.1 mnormt_1.5-5
[16] lsei_1.2-0 TFisher_0.2.0 speedglm_0.3-2
[19] jsonlite_1.6.1 ica_1.0-2 cluster_2.1.0
[22] png_0.1-7 R.oo_1.23.0 uwot_0.1.5
[25] sctransform_0.2.1 compiler_3.6.3 httr_1.4.1
[28] assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2
[31] htmltools_0.4.0 tools_3.6.3 rsvd_1.0.2
[34] igraph_1.2.4.2 gtable_0.3.0 glue_1.4.0
[37] GenomeInfoDbData_1.2.2 RANN_2.6.1 reshape2_1.4.3
[40] rappdirs_0.3.1 Rcpp_1.0.4.6 vctrs_0.3.0
[43] multtest_2.42.0 gdata_2.18.0 ape_5.3
[46] nlme_3.1-144 gbRd_0.4-11 lmtest_0.9-37
[49] stringr_1.4.0 globals_0.12.5 lifecycle_0.2.0
[52] irlba_2.3.3 gtools_3.8.1 future_1.16.0
[55] zlibbioc_1.32.0 MASS_7.3-51.5 zoo_1.8-7
[58] scales_1.1.1 sandwich_2.5-1 RColorBrewer_1.1-2
[61] reticulate_1.14 pbapply_1.4-2 gridExtra_2.3
[64] stringi_1.4.6 mutoss_0.1-12 plotrix_3.7-7
[67] caTools_1.18.0 bibtex_0.4.2.2 Rdpack_0.11-1
[70] SDMTools_1.1-221.2 rlang_0.4.6 pkgconfig_2.0.3
[73] bitops_1.0-6 lattice_0.20-38 ROCR_1.0-7
[76] purrr_0.3.4 htmlwidgets_1.5.1 cowplot_1.0.0
[79] tidyselect_1.1.0 RcppAnnoy_0.0.15 plyr_1.8.5
[82] magrittr_1.5 R6_2.4.1 gplots_3.0.1.2
[85] multcomp_1.4-12 withr_2.2.0 pillar_1.4.4
[88] sn_1.5-4 fitdistrplus_1.0-14 survival_3.1-8
[91] RCurl_1.98-1.2 tsne_0.1-3 tibble_3.0.1
[94] future.apply_1.4.0 crayon_1.3.4 KernSmooth_2.23-16
[97] plotly_4.9.2 viridis_0.5.1 grid_3.6.3
[100] data.table_1.12.8 metap_1.2 digest_0.6.25
[103] tidyr_1.0.3 numDeriv_2016.8-1.1 R.utils_2.9.2
[106] RcppParallel_4.4.4 munsell_0.5.0 viridisLite_0.3.0

lb15 commented 4 years ago

Hi brgrew,

Thanks for your help. I tried your recommendation but I'm getting the same error.

I also checked the gene_short_name and think that the subsetting will give me an identical object.

> test=cds_unbias[sig_genes,]
> identical(test, cds_unbias[rowData(cds_unbias)$gene_short_name %in% sig_genes,])
[1] TRUE
> sum(!sig_genes %in% rowData(cds_unbias)$gene_short_name)
[1] 0

I have been able to solve this by replacing the partitions (I have 3 partitions) with 1.

## make everything one partition
cds=cds_unbias
cds@clusters$UMAP$partitions[cds@clusters$UMAP$partitions == "2"] <- "1" 
cds@clusters$UMAP$partitions[cds@clusters$UMAP$partitions == "3"] <- "1"

I then run learn_graph, order_cells, DE analysis, and then find_gene_modules worked.

afaissa commented 4 years ago

Hi brgrew,

I was not able to fix the issue with lb15' suggestion replacing the partition.

The error occurs when it is used the residual_model_formula_str with align_cds.

I was able to reproduce the issue using the example on the Monocle3 documentation web site.

library(monocle3) library(dplyr)

expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds")) cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds")) gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))

cds <- new_cell_data_set(expression_matrix, cell_metadata = cell_metadata, gene_metadata = gene_annotation)

cds <- preprocess_cds(cds, num_dim = 45) cds <- align_cds(cds, residual_model_formula_str = "~Size_Factor") cds <- reduce_dimension(cds)

cds = cluster_cells(cds, resolution=1e-5) levels(cds@clusters$UMAP$partitions) levels(cds@clusters$UMAP$clusters)

pr_graph_test_res <- graph_test(cds, neighbor_graph="knn", cores=18) pr_deg_ids <- row.names(subset(pr_graph_test_res, morans_I > 0.01 & q_value < 0.05)) gene_module_df <- find_gene_modules(cds[pr_deg_ids,], cores=18)

Error in RANN::nn2(data, data, k + 1, searchtype = "standard") : NA/NaN/Inf in foreign function call (arg 1) Timing stopped at: 0.001 0 0.001

brgew commented 4 years ago

Hi @afaissa, Thank you for posting the example. It is very helpful!

afaissa commented 4 years ago

Thank you for all the amazing work on the package. Please, let me know if I can try again. I am waiting for this to apply for 3 of my data sets. Please, let me know if I can help.

brgew commented 4 years ago

Hi @afaissa, I believe that this is fixed in the develop branch. Please let me know if you find otherwise. Thank you!

afaissa commented 4 years ago

Solved! Thank you very much!

brgew commented 4 years ago

I appreciate the feedback. Thank you!

cole-trapnell-lab / monocle3

error in find_gene_modules - Error in RANN #346