Closed: lb15 closed this issue 4 years ago
I tried running find_gene_modules() with a cds and gene list created before I upgraded R/Monocle3, and it worked, so the problem may lie in how I created this cds object. One difference is that cds_unbias contains cells from several partitions that I want to analyze together. I upgraded to the development version of monocle3 because of the fix in #275. In the older version of monocle3, I manually replaced the partition identifiers with 1 for all cells to create a single partition.
Here is how I built my cds:
library(Seurat)
library(monocle3)
library(dplyr)
## expression data
exprs <- GetAssayData(seur_ob, slot = "counts", assay = "RNA")
## phenodata
pheno.data = seur_ob@meta.data
## feature data
genes <- data.frame(gene_short_name = rownames(exprs))
rownames(genes) <- rownames(exprs)
cds <- new_cell_data_set(exprs,
                         cell_metadata = pheno.data,
                         gene_metadata = genes)
################### PROCESS DATA #########################
set.seed(123)
cds <- preprocess_cds(cds, num_dim = 20)
cds <- align_cds(cds, preprocess_method = "PCA", alignment_group = "batch")
cds <- reduce_dimension(cds, umap.fast_sgd = FALSE, preprocess_method = "Aligned")
cds <- cluster_cells(cds, cluster_method = "leiden", random_seed = 123)
cds <- learn_graph(cds, use_partition = FALSE, learn_graph_control = list(minimal_branch_len = 25))
cds <- order_cells(cds)
cds_sub <- cds[, colData(cds)$Merged_MCC_clusters == "MCC"]
## code that produces pct.above10, a vector of genes expressed in at least 10% of cells in any cluster (not shown)
cds_unbias <- cds_sub[pct.above10$genes, ]
### Regression analysis
gene_fits <- fit_models(cds_unbias, model_formula_str = "~pseudotime")
fit_coefs <- coefficient_table(gene_fits)
pseudotime_terms <- fit_coefs %>% filter(term == "pseudotime")
pseudotime_sig <- pseudotime_terms %>%
  filter(q_value < 0.05) %>%
  select(gene_short_name, term, q_value, estimate)
sig_genes <- pseudotime_sig$gene_short_name
modules <- find_gene_modules(cds_unbias[as.character(sig_genes), ])
Same problem here:
traceback()
4: RANN::nn2(data, data, k + 1, searchtype = "standard")
3: system.time(tmp <- RANN::nn2(data, data, k + 1, searchtype = "standard"))
2: leiden_clustering(data = reduced_dim_res, pd = rowData(cds)[row.names(reduced_dim_res), , drop = FALSE], k = k, weight = weight, num_iter = leiden_iter, resolution_parameter = resolution, random_seed = random_seed, verbose = verbose, ...)
1: find_gene_modules(cds[pr_deg_ids, ], resolution = 0.01)
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] data.table_1.12.8 forcats_0.5.0 stringr_1.4.0
[4] purrr_0.3.4 readr_1.3.1 tidyr_1.0.3
[7] tibble_3.0.1 tidyverse_1.3.0 monocle3_0.2.1.5
[10] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1 DelayedArray_0.14.0
[13] matrixStats_0.56.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.0
[16] IRanges_2.22.1 S4Vectors_0.26.0 Biobase_2.48.0
[19] BiocGenerics_0.34.0 ggplot2_3.3.0 Seurat_3.1.5
[22] dplyr_0.8.5
loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.1.6 plyr_1.8.6
[4] igraph_1.2.5 lazyeval_0.2.2 sp_1.4-1
[7] splines_4.0.0 listenv_0.8.0 digest_0.6.25
[10] htmltools_0.4.0 viridis_0.5.1 gdata_2.18.0
[13] fansi_0.4.1 magrittr_1.5 cluster_2.1.0
[16] ROCR_1.0-11 limma_3.44.1 globals_0.12.5
[19] modelr_0.1.7 gmodels_2.18.1 colorspace_1.4-1
[22] rvest_0.3.5 rappdirs_0.3.1 ggrepel_0.8.2
[25] haven_2.2.0 crayon_1.3.4 RCurl_1.98-1.2
[28] jsonlite_1.6.1 survival_3.1-12 zoo_1.8-8
[31] ape_5.3 glue_1.4.0 gtable_0.3.0
[34] zlibbioc_1.34.0 XVector_0.28.0 leiden_0.3.3
[37] future.apply_1.5.0 scales_1.1.1 pheatmap_1.0.12
[40] DBI_1.1.0 Rcpp_1.0.4.6 viridisLite_0.3.0
[43] spData_0.3.5 units_0.6-6 reticulate_1.15
[46] spdep_1.1-3 rsvd_1.0.3 proxy_0.4-24
[49] tsne_0.1-3 htmlwidgets_1.5.1 httr_1.4.1
[52] RColorBrewer_1.1-2 speedglm_0.3-2 ellipsis_0.3.0
[55] ica_1.0-2 pkgconfig_2.0.3 farver_2.0.3
[58] deldir_0.1-25 uwot_0.1.8 dbplyr_1.4.3
[61] utf8_1.1.4 tidyselect_1.1.0 labeling_0.3
[64] rlang_0.4.6 reshape2_1.4.4 pbmcapply_1.5.0
[67] munsell_0.5.0 cellranger_1.1.0 tools_4.0.0
[70] cli_2.0.2 generics_0.0.2 broom_0.5.6
[73] ggridges_0.5.2 npsurv_0.4-0.1 RhpcBLASctl_0.20-17
[76] fs_1.4.1 fitdistrplus_1.0-14 RANN_2.6.1
[79] pbapply_1.4-2 future_1.17.0 nlme_3.1-147
[82] xml2_1.3.2 compiler_4.0.0 rstudioapi_0.11
[85] plotly_4.9.2.1 png_0.1-7 e1071_1.7-3
[88] lsei_1.2-0.1 reprex_0.3.0 stringi_1.4.6
[91] RSpectra_0.16-0 lattice_0.20-41 Matrix_1.2-18
[94] classInt_0.4-3 vctrs_0.3.0 LearnBayes_2.15.1
[97] pillar_1.4.4 lifecycle_0.2.0 lmtest_0.9-37
[100] RcppAnnoy_0.0.16 cowplot_1.0.0 bitops_1.0-6
[103] irlba_2.3.3 raster_3.1-5 patchwork_1.0.0
[106] R6_2.4.1 KernSmooth_2.23-16 gridExtra_2.3
[109] codetools_0.2-16 gtools_3.8.2 boot_1.3-24
[112] MASS_7.3-51.5 assertthat_0.2.1 leidenbase_0.1.0
[115] withr_2.2.0 sctransform_0.2.1 GenomeInfoDbData_1.2.3
[118] expm_0.999-4 hms_0.5.3 grid_4.0.0
[121] coda_0.19-3 class_7.3-16 DelayedMatrixStats_1.10.0
[124] Rtsne_0.15 sf_0.9-3 lubridate_1.7.8
Hi lb15,
I tried to reproduce the issue using the example on the Monocle3 documentation website, but it ran without error. Looking at the find_gene_modules() call in your second message,
sig_genes = pseudotime_sig$gene_short_name
modules=find_gene_modules(cds_unbias[as.character(sig_genes),])
it appears that you are subsetting the rows of cds_unbias by gene short name; however, I think that cds_unbias[as.character(sig_genes),] requires gene IDs (the row names) rather than short names. I think that the call
modules<-find_gene_modules(cds[rowData(cds)$gene_short_name %in% sig_genes,])
may work for you. Can you let me know if the problem persists?
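To make the distinction between the two subsetting styles concrete, here is a minimal base-R sketch. It uses a toy data frame as a stand-in for rowData(cds); the gene IDs and short names are made up for illustration, and nothing here calls monocle3 itself.

```r
# Toy stand-in for rowData(cds): row names are gene IDs, short names live in a column.
# All identifiers below are hypothetical.
gene_meta <- data.frame(
  gene_short_name = c("geneA", "geneB", "geneC"),
  row.names = c("ID0001", "ID0002", "ID0003"),
  stringsAsFactors = FALSE
)
sig_genes <- c("geneA", "geneC")

# Character row indexing matches against row names, so first check whether
# the short names actually appear there.
ids_match_rownames <- sig_genes %in% rownames(gene_meta)  # FALSE FALSE

# A logical mask on the short-name column selects rows regardless of row names.
mask <- gene_meta$gene_short_name %in% sig_genes          # TRUE FALSE TRUE
selected_ids <- rownames(gene_meta)[mask]                 # "ID0001" "ID0003"
```

When the row names are themselves the short names, as in a cds whose gene metadata was built with rownames(genes) <- rownames(exprs), the two approaches should select the same rows.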
Hi brgew,
I applied the suggestion you gave lb15 to my data, but I still get the same error. Is there anything else I can try?
Thank you!
gene_fits <- fit_models(cds, model_formula_str = "~pseudotime")
fit_coefs <- coefficient_table(gene_fits)
pseudotime_terms <- fit_coefs %>% filter(term == "pseudotime")
pseudotime_sig <- pseudotime_terms %>% filter (q_value < 0.05) %>% select(gene_short_name, term, q_value, estimate)
sig_genes = pseudotime_sig$gene_short_name
modules<-find_gene_modules(cds[rowData(cds)$gene_short_name %in% sig_genes,])
Error in RANN::nn2(data, data, k + 1, searchtype = "standard") : NA/NaN/Inf in foreign function call (arg 1)
Timing stopped at: 0.001 0 0.001
traceback()
4: RANN::nn2(data, data, k + 1, searchtype = "standard")
3: system.time(tmp <- RANN::nn2(data, data, k + 1, searchtype = "standard"))
2: leiden_clustering(data = reduced_dim_res, pd = rowData(cds)[row.names(reduced_dim_res), , drop = FALSE], k = k, weight = weight, louvain_iter = louvain_iter, resolution_parameter = resolution, random_seed = random_seed, verbose = verbose, ...)
1: find_gene_modules(cds[rowData(cds)$gene_short_name %in% sig_genes, ])
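The "NA/NaN/Inf in foreign function call (arg 1)" message means that the first argument passed to the compiled nearest-neighbour routine contains at least one non-finite value. A minimal base-R sketch for locating such values in any numeric matrix is shown below; the helper name summarize_nonfinite and the toy matrix are hypothetical, and which matrix find_gene_modules() builds internally is not confirmed here.

```r
# Hypothetical helper: report where a numeric matrix holds non-finite values.
summarize_nonfinite <- function(mat) {
  mat <- as.matrix(mat)
  bad <- !is.finite(mat)  # TRUE for NA, NaN, Inf and -Inf
  list(
    n_bad    = sum(bad),                 # total number of non-finite entries
    bad_rows = which(rowSums(bad) > 0),  # row indices containing any of them
    bad_cols = which(colSums(bad) > 0)   # column indices containing any of them
  )
}

# Toy matrix standing in for a reduced-dimension matrix (filled column by column).
toy <- matrix(c(1, 2, NaN, 4, 5, 6), nrow = 3)
res <- summarize_nonfinite(toy)
```

On a real object, one could try the same helper on a matrix taken from reducedDims(cds), on the assumption that the non-finite values originate in an earlier preprocessing or alignment step.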
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 30 (Workstation Edition)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] monocle3_0.2.1 SingleCellExperiment_1.8.0
[3] SummarizedExperiment_1.16.1 DelayedArray_0.12.3
[5] BiocParallel_1.20.1 matrixStats_0.56.0
[7] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
[9] IRanges_2.20.2 S4Vectors_0.24.4
[11] Biobase_2.46.0 BiocGenerics_0.32.0
[13] ggplot2_3.3.0 Seurat_3.1.2
[15] dplyr_0.8.5
loaded via a namespace (and not attached):
[1] TH.data_1.0-10 Rtsne_0.15 colorspace_1.4-1
[4] ellipsis_0.3.0 ggridges_0.5.2 XVector_0.26.0
[7] leiden_0.3.1 listenv_0.8.0 npsurv_0.4-0
[10] ggrepel_0.8.1 mvtnorm_1.0-12 codetools_0.2-16
[13] splines_3.6.3 R.methodsS3_1.7.1 mnormt_1.5-5
[16] lsei_1.2-0 TFisher_0.2.0 speedglm_0.3-2
[19] jsonlite_1.6.1 ica_1.0-2 cluster_2.1.0
[22] png_0.1-7 R.oo_1.23.0 uwot_0.1.5
[25] sctransform_0.2.1 compiler_3.6.3 httr_1.4.1
[28] assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2
[31] htmltools_0.4.0 tools_3.6.3 rsvd_1.0.2
[34] igraph_1.2.4.2 gtable_0.3.0 glue_1.4.0
[37] GenomeInfoDbData_1.2.2 RANN_2.6.1 reshape2_1.4.3
[40] rappdirs_0.3.1 Rcpp_1.0.4.6 vctrs_0.3.0
[43] multtest_2.42.0 gdata_2.18.0 ape_5.3
[46] nlme_3.1-144 gbRd_0.4-11 lmtest_0.9-37
[49] stringr_1.4.0 globals_0.12.5 lifecycle_0.2.0
[52] irlba_2.3.3 gtools_3.8.1 future_1.16.0
[55] zlibbioc_1.32.0 MASS_7.3-51.5 zoo_1.8-7
[58] scales_1.1.1 sandwich_2.5-1 RColorBrewer_1.1-2
[61] reticulate_1.14 pbapply_1.4-2 gridExtra_2.3
[64] stringi_1.4.6 mutoss_0.1-12 plotrix_3.7-7
[67] caTools_1.18.0 bibtex_0.4.2.2 Rdpack_0.11-1
[70] SDMTools_1.1-221.2 rlang_0.4.6 pkgconfig_2.0.3
[73] bitops_1.0-6 lattice_0.20-38 ROCR_1.0-7
[76] purrr_0.3.4 htmlwidgets_1.5.1 cowplot_1.0.0
[79] tidyselect_1.1.0 RcppAnnoy_0.0.15 plyr_1.8.5
[82] magrittr_1.5 R6_2.4.1 gplots_3.0.1.2
[85] multcomp_1.4-12 withr_2.2.0 pillar_1.4.4
[88] sn_1.5-4 fitdistrplus_1.0-14 survival_3.1-8
[91] RCurl_1.98-1.2 tsne_0.1-3 tibble_3.0.1
[94] future.apply_1.4.0 crayon_1.3.4 KernSmooth_2.23-16
[97] plotly_4.9.2 viridis_0.5.1 grid_3.6.3
[100] data.table_1.12.8 metap_1.2 digest_0.6.25
[103] tidyr_1.0.3 numDeriv_2016.8-1.1 R.utils_2.9.2
[106] RcppParallel_4.4.4 munsell_0.5.0 viridisLite_0.3.0
Hi brgew,
Thanks for your help. I tried your recommendation, but I get the same error.
I also checked gene_short_name and confirmed that both ways of subsetting give me an identical object.
> test=cds_unbias[sig_genes,]
> identical(test, cds_unbias[rowData(cds_unbias)$gene_short_name %in% sig_genes,])
[1] TRUE
> sum(!sig_genes %in% rowData(cds_unbias)$gene_short_name)
[1] 0
I have been able to solve this by replacing the partitions (I have 3 partitions) with 1.
## make everything one partition
cds=cds_unbias
cds@clusters$UMAP$partitions[cds@clusters$UMAP$partitions == "2"] <- "1"
cds@clusters$UMAP$partitions[cds@clusters$UMAP$partitions == "3"] <- "1"
I then reran learn_graph(), order_cells(), and the DE analysis, after which find_gene_modules() worked.
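The partition merge above can also be written without naming each partition. The following base-R sketch works on a toy named factor standing in for cds@clusters$UMAP$partitions; the cell names are hypothetical, and the assumption that partitions are stored as a factor named by cell is not verified against monocle3 here. Unlike the two assignments above, this variant also drops the unused levels "2" and "3".

```r
# Toy stand-in for cds@clusters$UMAP$partitions: a factor named by cell.
# Cell names are hypothetical.
partitions <- factor(c("1", "2", "3", "1"))
names(partitions) <- c("cell_a", "cell_b", "cell_c", "cell_d")

# Build a single-level factor of the same length and carry over the cell names.
merged <- factor(rep("1", length(partitions)), levels = "1")
names(merged) <- names(partitions)

levels(merged)  # "1"
```

Whether monocle3 needs the unused levels removed is not established in this thread, so treat the level handling as a design choice rather than a requirement.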
Hi brgew,
I was not able to fix the issue with lb15's suggestion of replacing the partitions.
The error occurs when residual_model_formula_str is used with align_cds().
I was able to reproduce the issue using the example on the Monocle3 documentation website.
library(monocle3)
library(dplyr)
expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))
cds <- new_cell_data_set(expression_matrix, cell_metadata = cell_metadata, gene_metadata = gene_annotation)
cds <- preprocess_cds(cds, num_dim = 45)
cds <- align_cds(cds, residual_model_formula_str = "~Size_Factor")
cds <- reduce_dimension(cds)
cds = cluster_cells(cds, resolution=1e-5)
levels(cds@clusters$UMAP$partitions)
levels(cds@clusters$UMAP$clusters)
pr_graph_test_res <- graph_test(cds, neighbor_graph="knn", cores=18)
pr_deg_ids <- row.names(subset(pr_graph_test_res, morans_I > 0.01 & q_value < 0.05))
gene_module_df <- find_gene_modules(cds[pr_deg_ids,], cores=18)
Error in RANN::nn2(data, data, k + 1, searchtype = "standard") : NA/NaN/Inf in foreign function call (arg 1)
Timing stopped at: 0.001 0 0.001
Hi @afaissa, Thank you for posting the example. It is very helpful!
Thank you for all the amazing work on the package. Please let me know when I can try again. I am waiting on this fix to analyze 3 of my data sets, so please also let me know if I can help.
Hi @afaissa, I believe that this is fixed in the develop branch. Please let me know if you find otherwise. Thank you!
Solved! Thank you very much!
I appreciate the feedback. Thank you!
Describe the bug
I'm running find_gene_modules() to group genes that change over pseudotime into modules, and I get an error in RANN::nn2(). I've tried both a full cds and a cds subset on both cells and genes, and both produce the error. Calling monocle3::find_gene_modules() explicitly produces the same error.
To Reproduce
traceback() After the error, run traceback() in R and post the output:
sessionInfo():
Additional context
I recently upgraded to R 4.0.0 and the development branch of monocle3. Before the upgrade, find_gene_modules() did not produce this error. Thanks!!