NathanSkene / EWCE

Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.
https://nathanskene.github.io/EWCE/index.html
53 stars 25 forks source link

Enrichment test in version 1.7.1 #78

Open forrestzhaosen opened 1 year ago

forrestzhaosen commented 1 year ago

1. Bug description

Thank you for updating to provide the bootstrap probability for each gene. However, I met an error that was not seen in 1.6.0 when I performed EWCE::bootstrap_enrichment_test()

Console output

# Paste console output here (e.g. from R/python/command line)

 > full_results <- EWCE::bootstrap_enrichment_test(sct_data = our_ctd_data_human,
+                                                 sctSpecies = "human",
+                                                 genelistSpecies = "human",
+                                                 output_species = "human",
+                                                 method = "homologene",
+                                                 hits = hits,
+                                                 reps = 100,
+                                                 annotLevel = 1)
1 core(s) assigned as workers (7 reserved).
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Gene table with 19,129 rows retrieved.
Returning all 19,129 genes from human.
Returning 19,129 unique genes from entire human genome.
Standardising CellTypeDataset
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 20 )
Checking gene list inputs.
Returning 19,129 unique genes from the user-supplied bg.
Standardising sct_data.
Converting gene list input to standardised human genes.
Running without gene size control.
733 hit genes remain after filtering.
Computing gene scores.
Using previously sampled genes.
Error in exp_mats[[cc]][s, ] <- sort(expD[, cc]) : 
  replacement has length zero

### Expected behaviour   

(A clear and concise description of what you expected to happen.)
There was no error in 1.6.0 version

## 2. Reproducible example   

### Code 
our_ctd = EWCE::generate_celltype_data(exp=human_spine_rds@assays$RNA@counts,annotLevels=list(cellType=human_spine_rds$cell_type2),groupName="human_RDS")
our_ctd_data_human = load_rdata(our_ctd)
risk_genes = read.table(genetic_path)
hits = as.vector(unlist(risk_genes))
print(hits)

reps <- 1500
annotLevel <- 1

full_results <- EWCE::bootstrap_enrichment_test(sct_data = our_ctd_data_human,
                                                sctSpecies = "human",
                                                genelistSpecies = "human",
                                                output_species = "human",
                                                method = "homologene",
                                                hits = hits,
                                                reps = 100,
                                                annotLevel = 1)

```R
# Paste example here

Data

This is how my ctd data looks like:

image

3. Session info

(Add output of the R function utils::sessionInfo() below. This helps us assess version/OS conflicts which could be causing bugs.)

``` # Paste utils::sessionInfo() output R version 4.2.0 (2022-04-22) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS 13.1 Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ewceData_1.7.1 ExperimentHub_2.6.0 AnnotationHub_3.6.0 BiocFileCache_2.6.1 dbplyr_2.3.1 [6] BiocGenerics_0.44.0 readr_2.1.4 stringr_1.5.0 magrittr_2.0.3 EWCE_1.7.2 [11] RNOmni_1.0.1 Signac_1.9.0 ggplot2_3.4.1 SeuratObject_4.1.3 Seurat_4.3.0 loaded via a namespace (and not attached): [1] utf8_1.2.3 spatstat.explore_3.0-6 reticulate_1.28 [4] tidyselect_1.2.0 RSQLite_2.3.0 AnnotationDbi_1.60.1 [7] htmlwidgets_1.6.1 BiocParallel_1.32.5 grid_4.2.0 [10] Rtsne_0.16 munsell_0.5.0 codetools_0.2-19 [13] ica_1.0-3 future_1.32.0 miniUI_0.1.1.1 [16] withr_2.5.0 spatstat.random_3.1-3 colorspace_2.1-0 [19] progressr_0.13.0 Biobase_2.58.0 filelock_1.0.2 [22] rstudioapi_0.14 orthogene_1.4.1 stats4_4.2.0 [25] SingleCellExperiment_1.20.0 ROCR_1.0-11 DescTools_0.99.48 [28] ggsignif_0.6.4 tensor_1.5 listenv_0.9.0 [31] labeling_0.4.2 MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9 [34] polyclip_1.10-4 farver_2.1.1 bit64_4.0.5 [37] rprojroot_2.0.3 parallelly_1.34.0 vctrs_0.5.2 [40] treeio_1.22.0 generics_0.1.3 scKirby_0.1.1 [43] R6_2.5.1 GenomeInfoDb_1.34.9 bitops_1.0-7 [46] spatstat.utils_3.0-1 cachem_1.0.7 gridGraphics_0.5-1 [49] DelayedArray_0.24.0 promises_1.2.0.1 scales_1.2.1 [52] rootSolve_1.8.2.3 gtable_0.3.1 globals_0.16.2 [55] processx_3.8.0 goftest_1.2-3 lmom_2.9 [58] rlang_1.0.6 RcppRoll_0.3.0 splines_4.2.0 [61] rstatix_0.7.2 lazyeval_0.2.2 spatstat.geom_3.0-6 [64] broom_1.0.3 BiocManager_1.30.20 yaml_2.3.7 [67] reshape2_1.4.4 abind_1.4-5 backports_1.4.1 [70] httpuv_1.6.9 tools_4.2.0 ggplotify_0.1.0 [73] ellipsis_0.3.2 RColorBrewer_1.1-3 proxy_0.4-27 [76] ggdendro_0.1.23 ggridges_0.5.4 Rcpp_1.0.10 [79] plyr_1.8.8 zlibbioc_1.44.0 purrr_1.0.1 [82] RCurl_1.98-1.10 prettyunits_1.1.1 ps_1.7.2 [85] ggpubr_0.6.0 deldir_1.0-6 pbapply_1.7-0 [88] cowplot_1.1.1 S4Vectors_0.36.2 zoo_1.8-11 [91] SummarizedExperiment_1.28.0 grr_0.9.5 ggrepel_0.9.3 [94] cluster_2.1.4 data.table_1.14.8 scattermore_0.8 [97] lmtest_0.9-40 RANN_2.6.1 mvtnorm_1.1-3 [100] fitdistrplus_1.1-8 matrixStats_0.63.0 hms_1.1.2 [103] patchwork_1.1.2 mime_0.12 xtable_1.8-4 [106] readxl_1.4.2 IRanges_2.32.0 gridExtra_2.3 [109] compiler_4.2.0 tibble_3.2.0 KernSmooth_2.23-20 [112] crayon_1.5.2 htmltools_0.5.4 tzdb_0.3.0 [115] ggfun_0.0.9 later_1.3.0 tidyr_1.3.0 [118] aplot_0.1.10 expm_0.999-7 Exact_3.2 [121] DBI_1.1.3 gprofiler2_0.2.1 MASS_7.3-58.3 [124] rappdirs_0.3.3 boot_1.3-28.1 babelgene_22.9 [127] Matrix_1.5-3 car_3.1-1 cli_3.6.0 [130] parallel_4.2.0 igraph_1.4.1 GenomicRanges_1.50.2 [133] pkgconfig_2.0.3 sp_1.6-0 plotly_4.10.1 [136] spatstat.sparse_3.0-0 ggtree_3.6.2 XVector_0.38.0 [139] yulab.utils_0.0.6 callr_3.7.3 digest_0.6.31 [142] sctransform_0.3.5 RcppAnnoy_0.0.20 spatstat.data_3.0-0 [145] Biostrings_2.66.0 cellranger_1.1.0 fastmatch_1.1-3 [148] HGNChelper_0.8.1 leiden_0.4.3 tidytree_0.4.2 [151] gld_2.6.6 uwot_0.1.14 curl_5.0.0 [154] Rsamtools_2.14.0 shiny_1.7.4 lifecycle_1.0.3 [157] nlme_3.1-162 jsonlite_1.8.4 carData_3.0-5 [160] desc_1.4.2 viridisLite_0.4.1 limma_3.54.2 [163] fansi_1.0.4 pillar_1.8.1 lattice_0.20-45 [166] homologene_1.4.68.19.3.27 pkgbuild_1.4.0 KEGGREST_1.38.0 [169] fastmap_1.1.1 httr_1.4.5 survival_3.5-3 [172] remotes_2.4.2 interactiveDisplayBase_1.36.0 glue_1.6.2 [175] png_0.1-8 BiocVersion_3.16.0 bit_4.0.5 [178] class_7.3-21 stringi_1.7.12 blob_1.2.3 [181] memoise_2.0.1 dplyr_1.1.0 e1071_1.7-13 [184] irlba_2.3.5.1 future.apply_1.10.0 ape_5.7
bschilder commented 1 year ago

Hi @forrestzhaosen what format are the genes in your CTD?

Can you show me:

head(our_ctd_data_human[[1]]$mean_exp)
forrestzhaosen commented 1 year ago

Sure. Here it is: image