NathanSkene / EWCE

Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.
https://nathanskene.github.io/EWCE/index.html

Error with generate_bootstrap_plots #66

Closed. h1hui closed this issue 2 years ago.

h1hui commented 2 years ago

1. Bug description

When I ran generate_bootstrap_plots on my dataset, it raised an error.

Console output

Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Gene table with 19,129 rows retrieved.
Returning all 19,129 genes from human.
Standardising sct_data.
Converting to sparse matrix.
Converting to sparse matrix.
Aligning celltype names with standardise_ctd format.
Error in exp_mats[[cc]][s, ] <- sort(expD[, cc]) : 
  number of items to replace is not a multiple of replacement length
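The failing line assigns a sorted vector into one row of a matrix. As a minimal standalone sketch in plain base R (not EWCE's actual code; the object names only mirror the error message), the snippet below shows that R raises exactly this error when the number of slots in the matrix row is not a multiple of the length of the vector being assigned. One plausible but unconfirmed reading is that, inside EWCE, `expD[, cc]` and the row `exp_mats[[cc]][s, ]` ended up with different gene counts.

```r
# Standalone illustration of the base R error; not EWCE internals.
exp_mat  <- matrix(0, nrow = 2, ncol = 3)  # each row has 3 slots
expD_col <- c(5, 1, 4, 2)                  # 4 values: 3 is not a multiple of 4

res <- tryCatch({
  exp_mat[1, ] <- sort(expD_col)
  "assigned"
}, error = function(e) conditionMessage(e))
print(res)

# When the lengths match (3 values into 3 slots), the assignment succeeds
exp_mat[1, ] <- sort(c(5, 1, 4))
print(exp_mat[1, ])
```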

Expected behaviour

All the previous steps ran without a problem, so I expected this step to produce a plot.

2. Reproducible example

Code

plot_file_path <- EWCE::generate_bootstrap_plots(
    sct_data = ctd,
    hits = hits,
    genelistSpecies = "human",
    sctSpecies = "human",
    output_species = "human",
    method = "homologene",
    reps = 1000,
    annotLevel = 1,
    full_results = full_results,
    listFileName = "",
    savePath = tempdir(),
    verbose = TRUE
)
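Because the error points at a length mismatch, a quick check on `hits` before the call may help narrow things down. The sketch below defines a hypothetical helper, `check_hits()`, which is not part of EWCE. It assumes the usual ctd layout (a list of annotation levels, each holding a `mean_exp` matrix with genes as row names) and reports hits missing from that level, duplicated hits, and duplicated gene names in the ctd. Whether any of these is the actual cause here is unconfirmed.

```r
# Hypothetical helper (not part of EWCE): flag hits that are absent from the
# chosen ctd level, duplicated hits, and duplicated gene names in the ctd.
check_hits <- function(ctd, hits, annotLevel = 1) {
  genes <- rownames(ctd[[annotLevel]]$mean_exp)
  list(
    missing    = setdiff(hits, genes),
    duplicated = unique(hits[duplicated(hits)]),
    dup_in_ctd = unique(genes[duplicated(genes)])
  )
}

# Toy ctd with a single annotation level, mimicking the list-of-levels layout
toy_ctd <- list(list(mean_exp = matrix(runif(6), nrow = 3,
                                       dimnames = list(c("A", "B", "C"),
                                                       c("ct1", "ct2")))))
report <- check_hits(toy_ctd, hits = c("A", "B", "B", "Z"))
str(report)
```

On the objects from this issue, the equivalent call would be `check_hits(ctd, hits, annotLevel = 1)`.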

Data

I've included a screenshot of what my dataset looks like, since I don't know how to subset my dataset and share it here. Hope this helps.

[Screenshot: Screen Shot 2022-05-13 at 12.17.01 PM]

3. Session info

```
> utils::sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /share/pkg.7/r/4.1.2/install/lib64/R/lib/libRblas.so
LAPACK: /share/pkg.7/r/4.1.2/install/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] patchwork_1.1.1             SeuratObject_4.0.4          Seurat_4.1.0
 [4] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0 Biobase_2.52.0
 [7] GenomicRanges_1.44.0        GenomeInfoDb_1.28.4         IRanges_2.26.0
[10] S4Vectors_0.30.2            MatrixGenerics_1.4.3        matrixStats_0.61.0
[13] ggplot2_3.3.5               ewceData_1.2.0              ExperimentHub_2.2.1
[16] AnnotationHub_3.2.1         BiocFileCache_2.0.0         dbplyr_2.1.1
[19] BiocGenerics_0.38.0         EWCE_1.4.0                  RNOmni_1.0.0

loaded via a namespace (and not attached):
  [1] backports_1.4.1               plyr_1.8.6                    igraph_1.2.11
  [4] lazyeval_0.2.2                splines_4.1.2                 orthogene_1.0.2
  [7] BiocParallel_1.26.2           listenv_0.8.0                 scattermore_0.7
 [10] digest_0.6.29                 htmltools_0.5.2               fansi_1.0.2
 [13] magrittr_2.0.2                memoise_2.0.1                 tensor_1.5
 [16] cluster_2.1.2                 ROCR_1.0-11                   limma_3.48.3
 [19] globals_0.14.0                Biostrings_2.60.2             spatstat.sparse_2.1-0
 [22] colorspace_2.0-3              blob_1.2.2                    rappdirs_0.3.3
 [25] ggrepel_0.9.1                 xfun_0.29                     dplyr_1.0.7
 [28] crayon_1.4.2                  RCurl_1.98-1.5                jsonlite_1.7.3
 [31] spatstat.data_2.1-2           survival_3.2-13               zoo_1.8-10
 [34] glue_1.6.1                    polyclip_1.10-0               gtable_0.3.0
 [37] zlibbioc_1.38.0               XVector_0.32.0                leiden_0.3.9
 [40] HGNChelper_0.8.1              DelayedArray_0.18.0           car_3.0-12
 [43] future.apply_1.8.1            abind_1.4-7                   scales_1.1.1
 [46] DBI_1.1.2                     rstatix_0.7.0                 miniUI_0.1.1.1
 [49] Rcpp_1.0.8                    viridisLite_0.4.0             xtable_1.8-6
 [52] spatstat.core_2.3-2           reticulate_1.24               bit_4.0.4
 [55] htmlwidgets_1.5.4             httr_1.4.2                    RColorBrewer_1.1-2
 [58] ellipsis_0.3.2                ica_1.0-2                     farver_2.1.0
 [61] pkgconfig_2.0.3               uwot_0.1.11                   deldir_1.0-6
 [64] utf8_1.2.2                    labeling_0.4.2                tidyselect_1.1.1
 [67] rlang_1.0.0                   reshape2_1.4.4                later_1.3.0
 [70] AnnotationDbi_1.56.2          munsell_0.5.0                 BiocVersion_3.14.0
 [73] tools_4.1.2                   cachem_1.0.6                  cli_3.1.1
 [76] generics_0.1.2                RSQLite_2.2.9                 broom_0.7.12
 [79] ggridges_0.5.3                ggdendro_0.1.22               stringr_1.4.0
 [82] fastmap_1.1.0                 goftest_1.2-3                 yaml_2.2.2
 [85] knitr_1.37                    babelgene_21.4                bit64_4.0.5
 [88] fitdistrplus_1.1-6            purrr_0.3.4                   RANN_2.6.1
 [91] KEGGREST_1.32.0               gprofiler2_0.2.1              nlme_3.1-155
 [94] pbapply_1.5-0                 future_1.23.0                 mime_0.12
 [97] compiler_4.1.2                rstudioapi_0.13               plotly_4.10.0
[100] filelock_1.0.2                curl_4.3.2                    png_0.1-7
[103] interactiveDisplayBase_1.32.0 ggsignif_0.6.3                spatstat.utils_2.3-0
[106] tibble_3.1.6                  homologene_1.4.68.19.3.27     stringi_1.7.6
[109] highr_0.9                     lattice_0.20-45               Matrix_1.4-1
[112] vctrs_0.3.8                   pillar_1.7.0                  lifecycle_1.0.1
[115] BiocManager_1.30.16           spatstat.geom_2.4-0           lmtest_0.9-39
[118] RcppAnnoy_0.0.19              data.table_1.14.2             cowplot_1.1.1
[121] bitops_1.0-7                  irlba_2.3.5                   httpuv_1.6.5
[124] R6_2.5.1                      promises_1.2.0.1              KernSmooth_2.23-20
[127] gridExtra_2.3                 parallelly_1.30.0             codetools_0.2-18
[130] MASS_7.3-55                   assertthat_0.2.1              withr_2.4.3
[133] sctransform_0.3.3             GenomeInfoDbData_1.2.6        mgcv_1.8-38
[136] rpart_4.1.16                  grid_4.1.2                    tidyr_1.2.0
[139] carData_3.0-5                 Rtsne_0.15                    ggpubr_0.4.0
[142] shiny_1.7.1
```
Al-Murphy commented 2 years ago

Hey, we really need a copy of the dataset that gives the error before we can figure out what has gone wrong. How did you generate your ctd? Did you use the current version of EWCE (1.5.1+)? Perhaps you should rerun the EWCE function that generates the ctd on your raw data and check whether that works. Try comparing it to the example ctd in EWCE:

ctd_ <- ewceData::ctd()
ctd_

At a glance, yours appears different: for example, it is missing the median_exp and median_specificity matrices, and your cell types are factors, not characters:

[image]
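To make the comparison with the example ctd concrete, here is a small hypothetical helper, `missing_ctd_parts()` (not an EWCE function), that lists which of the four matrices mentioned above are absent from each annotation level. It assumes each level is a named list, as in `ewceData::ctd()`; the toy input below is made up purely to show the output shape.

```r
# Hypothetical helper (not part of EWCE): for each annotation level of a ctd,
# return the names of expected matrices that are not present.
missing_ctd_parts <- function(ctd,
                              required = c("mean_exp", "specificity",
                                           "median_exp", "median_specificity")) {
  lapply(ctd, function(level) setdiff(required, names(level)))
}

# Toy ctd: level 1 lacks the median matrices, level 2 is complete
toy_ctd <- list(
  list(mean_exp = matrix(1), specificity = matrix(1)),
  list(mean_exp = matrix(1), specificity = matrix(1),
       median_exp = matrix(1), median_specificity = matrix(1))
)
gaps <- missing_ctd_parts(toy_ctd)
print(gaps)
```

With the objects in this thread, one could compare `missing_ctd_parts(ctd)` against `missing_ctd_parts(ctd_)`.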

h1hui commented 2 years ago

I converted my Seurat object into an SCE and then generated the cell type data. Here is the code:

sce <- as.SingleCellExperiment(Nuclei)

# drop genes
nKeep = 8000
must_keep = genes
keep_genes = c(must_keep,sample(names(rowRanges(sce)),7955))
sce = sce[keep_genes,]

# calculate specificity matrices
exp_DROPPED <- drop_uninformative_genes(exp=sce,
  drop_nonhuman_genes = TRUE,
  input_species = "human",
  level2annot=sce$celltype)

# generate CellTypeDataset
annotLevels = list(level1class=sce$celltype,
  level2class=sce$type)
fNames <- generate_celltype_data(exp=exp_DROPPED,
  annotLevels=annotLevels,
  groupName="Nuclei",
  savePath=tempdir())
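Two details in the code above may be worth checking, although neither is confirmed as the cause of the error. First, `nKeep` is defined but the sample size is hard-coded to 7955, and because `sample()` draws from all gene names, it can pick genes that are already in `must_keep`, which leaves duplicated rows in `sce`. Second, the cell type labels are passed as factors. The sketch below uses made-up stand-in data (`all_genes`, `genes`, `celltype` are hypothetical) to show a duplicate-free subsetting step and a factor-to-character conversion.

```r
set.seed(42)
# Hypothetical stand-ins for the objects in the snippet above
all_genes <- paste0("gene", 1:500)         # plays the role of rownames(sce)
genes     <- c("gene1", "gene2", "gene3")  # plays the role of `genes`
celltype  <- factor(c("Astro", "Neuron", "Astro", "Micro"))  # like sce$celltype

nKeep <- 50
must_keep <- intersect(genes, all_genes)   # drop any genes not in the data
# Sample only from genes not already kept, so the result has exactly nKeep
# unique names; sampling from all genes could repeat must_keep entries
keep_genes <- c(must_keep,
                sample(setdiff(all_genes, must_keep), nKeep - length(must_keep)))

# Plain character labels instead of factor levels
level1class <- as.character(celltype)
```

Applied to the real objects, the same pattern would read `keep_genes <- c(must_keep, sample(setdiff(rownames(sce), must_keep), nKeep - length(must_keep)))` and `annotLevels <- list(level1class = as.character(sce$celltype), level2class = as.character(sce$type))`. Whether this avoids the original error is untested.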

I skipped the fix-bad-genes and normalization steps, since they also produced errors, but I did run the gene-dropping step (drop_uninformative_genes). If there's anything wrong with what I did, please let me know.

Al-Murphy commented 2 years ago

Hey!

Again, we can't really help unless we have a copy of your 'Nuclei' dataset. It clearly differs from what EWCE expects, since you got errors after converting it to SCE. One suggestion is to look at this example SCE dataset and compare it to your 'Nuclei' dataset after you convert it to SCE, using this code:

# Make SCE object
cortex_mrna <- ewceData::cortex_mrna()
cortex_mrna_sce <- scKirby::ingest_data(cortex_mrna, save_output = FALSE)

Also try using scKirby to convert to SCE; it can handle many dataset types, so it is quite robust and may work for you.

Also see the website section on creating a ctd for information on this.

I'm going to close this issue for now, but if these suggestions don't work and you can provide the full dataset or a subset that gives the same error, feel free to reopen it.

Thanks