edward130603 / BayesSpace

Bayesian model for clustering and enhancing the resolution of spatial gene expression experiments.
http://edward130603.github.io/BayesSpace

Missing clusters when clustering at k = 28 #80

Open abspangler13 opened 2 years ago

abspangler13 commented 2 years ago

Hello,

I ran BayesSpace with k = 28 for ~113,000 spots. I've noticed that I only get 26 clusters because clusters 18 and 21 are missing. Is this to be expected? Is it possible that BayesSpace is merging small clusters?

> table(csv$cluster,useNA = "ifany")

    1     2     3     4     5     6     7     8     9    10    11    12    13 
 3100  4455  7686 10108  7161  3486  5752  3063  9308  6014  2473  2340  6212 
   14    15    16    17    19    20    22    23    24    25    26    27    28 
 9573  3480  5694  3417  2745  2498  1799  1609  3165  2487  2145   877  3280 
> nrow(csv)
[1] 113927

Thanks,

Abby

cc @lcolladotor

R Session Information

```R
22425.557 46.134 22547.932
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R Under development (unstable) (2021-11-06 r81149)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       US/Eastern
 date     2022-03-15
 pandoc   2.11.0.4 @ /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-devel/bin/pandoc

─ Packages ───────────────────────────────────────────────────────────────────
 package * version date (UTC) lib source
 AnnotationDbi 1.57.1 2021-10-29 [2] Bioconductor
 AnnotationHub 3.3.9 2022-02-28 [2] Bioconductor
 assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.1.0)
 attempt 0.3.1 2020-05-03 [1] CRAN (R 4.2.0)
 BayesSpace * 1.5.1 2021-11-05 [1] Bioconductor
 beachmat 2.11.0 2021-10-26 [2] Bioconductor
 beeswarm 0.4.0 2021-06-01 [2] CRAN (R 4.2.0)
 benchmarkme 1.0.7 2021-03-21 [1] CRAN (R 4.2.0)
 benchmarkmeData 1.0.4 2020-04-23 [1] CRAN (R 4.2.0)
 Biobase * 2.55.0 2021-10-26 [2] Bioconductor
 BiocFileCache 2.3.4 2022-01-20 [2] Bioconductor
 BiocGenerics * 0.41.2 2021-11-15 [2] Bioconductor
 BiocIO 1.5.0 2021-10-26 [2] Bioconductor
 BiocManager 1.30.16 2021-06-15 [2] CRAN (R 4.2.0)
 BiocNeighbors 1.13.0 2021-10-26 [2] Bioconductor
 BiocParallel 1.29.17 2022-03-13 [2] Bioconductor
 BiocSingular 1.11.0 2021-10-26 [2] Bioconductor
 BiocVersion 3.15.0 2021-10-26 [2] Bioconductor
 Biostrings 2.63.1 2022-01-05 [2] Bioconductor
 bit 4.0.4 2020-08-04 [2] CRAN (R 4.1.0)
 bit64 4.0.5 2020-08-30 [2] CRAN (R 4.1.0)
 bitops 1.0-7 2021-04-24 [2] CRAN (R 4.2.0)
 blob 1.2.2 2021-07-23 [2] CRAN (R 4.2.0)
 bluster 1.5.0 2021-10-26 [2] Bioconductor
 brio 1.1.3 2021-11-30 [2] CRAN (R 4.2.0)
 bslib 0.3.1 2021-10-06 [2] CRAN (R 4.2.0)
 cachem 1.0.6 2021-08-19 [2] CRAN (R 4.2.0)
 callr 3.7.0 2021-04-20 [2] CRAN (R 4.2.0)
 cli 3.2.0 2022-02-14 [2] CRAN (R 4.2.0)
 cluster 2.1.2 2021-04-17 [3] CRAN (R 4.2.0)
 coda 0.19-4 2020-09-30 [2] CRAN (R 4.1.0)
 codetools 0.2-18 2020-11-04 [3] CRAN (R 4.2.0)
 colorspace 2.0-3 2022-02-21 [2] CRAN (R 4.2.0)
 config 0.3.1 2020-12-17 [1] CRAN (R 4.2.0)
 cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.2.0)
 crayon 1.5.0 2022-02-14 [2] CRAN (R 4.2.0)
 curl 4.3.2 2021-06-23 [2] CRAN (R 4.2.0)
 data.table 1.14.2 2021-09-27 [2] CRAN (R 4.2.0)
 DBI 1.1.2 2021-12-20 [2] CRAN (R 4.2.0)
 dbplyr 2.1.1 2021-04-06 [2] CRAN (R 4.1.0)
 DelayedArray 0.21.2 2021-11-16 [2] Bioconductor
 DelayedMatrixStats 1.17.0 2021-10-26 [2] Bioconductor
 desc 1.4.1 2022-03-06 [2] CRAN (R 4.2.0)
 digest 0.6.29 2021-12-01 [2] CRAN (R 4.2.0)
 DirichletReg 0.7-1 2021-05-18 [1] CRAN (R 4.2.0)
 dockerfiler 0.1.4 2021-09-03 [1] CRAN (R 4.2.0)
 doParallel 1.0.17 2022-02-07 [2] CRAN (R 4.2.0)
 dotCall64 1.0-1 2021-02-11 [2] CRAN (R 4.1.0)
 dplyr 1.0.8 2022-02-08 [2] CRAN (R 4.2.0)
 dqrng 0.3.0 2021-05-01 [2] CRAN (R 4.2.0)
 DropletUtils 1.15.2 2021-11-08 [2] Bioconductor
 DT 0.21 2022-02-26 [2] CRAN (R 4.2.0)
 edgeR 3.37.0 2021-10-26 [2] Bioconductor
 ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.2.0)
 ExperimentHub 2.3.5 2022-01-20 [2] Bioconductor
 fansi 1.0.2 2022-01-14 [2] CRAN (R 4.2.0)
 farver 2.1.0 2021-02-28 [2] CRAN (R 4.1.0)
 fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.1.0)
 fields 13.3 2021-10-30 [2] CRAN (R 4.2.0)
 filelock 1.0.2 2018-10-05 [2] CRAN (R 4.1.0)
 foreach 1.5.2 2022-02-02 [2] CRAN (R 4.2.0)
 Formula 1.2-4 2020-10-16 [2] CRAN (R 4.1.0)
 fs 1.5.2 2021-12-08 [2] CRAN (R 4.2.0)
 generics 0.1.2 2022-01-31 [2] CRAN (R 4.2.0)
 GenomeInfoDb * 1.31.4 2022-01-30 [2] Bioconductor
 GenomeInfoDbData 1.2.7 2021-11-02 [2] Bioconductor
 GenomicAlignments 1.31.2 2021-11-05 [2] Bioconductor
 GenomicRanges * 1.47.6 2022-01-12 [2] Bioconductor
 ggbeeswarm 0.6.0 2017-08-07 [2] CRAN (R 4.2.0)
 ggplot2 * 3.3.5 2021-06-25 [2] CRAN (R 4.2.0)
 ggrepel 0.9.1 2021-01-15 [2] CRAN (R 4.1.0)
 glue 1.6.2 2022-02-24 [2] CRAN (R 4.2.0)
 golem 0.3.1 2021-04-17 [1] CRAN (R 4.2.0)
 gridExtra 2.3 2017-09-09 [2] CRAN (R 4.1.0)
 gtable 0.3.0 2019-03-25 [2] CRAN (R 4.1.0)
 HDF5Array 1.23.2 2021-11-15 [2] Bioconductor
 here * 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)
 htmltools 0.5.2 2021-08-25 [2] CRAN (R 4.2.0)
 htmlwidgets 1.5.4 2021-09-08 [2] CRAN (R 4.2.0)
 httpuv 1.6.5 2022-01-05 [2] CRAN (R 4.2.0)
 httr 1.4.2 2020-07-20 [2] CRAN (R 4.1.0)
 igraph 1.2.11 2022-01-04 [2] CRAN (R 4.2.0)
 interactiveDisplayBase 1.33.0 2021-10-26 [2] Bioconductor
 IRanges * 2.29.1 2021-11-16 [2] Bioconductor
 irlba 2.3.5 2021-12-06 [2] CRAN (R 4.2.0)
 iterators 1.0.14 2022-02-05 [2] CRAN (R 4.2.0)
 jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.2.0)
 jsonlite 1.8.0 2022-02-22 [2] CRAN (R 4.2.0)
 KEGGREST 1.35.0 2021-10-26 [2] Bioconductor
 knitr 1.37 2021-12-16 [2] CRAN (R 4.2.0)
 labeling 0.4.2 2020-10-20 [2] CRAN (R 4.1.0)
 later 1.3.0 2021-08-18 [2] CRAN (R 4.2.0)
 lattice 0.20-45 2021-09-22 [3] CRAN (R 4.2.0)
 lazyeval 0.2.2 2019-03-15 [2] CRAN (R 4.1.0)
 lifecycle 1.0.1 2021-09-24 [2] CRAN (R 4.2.0)
 limma 3.51.5 2022-02-17 [2] Bioconductor
 locfit 1.5-9.5 2022-03-03 [2] CRAN (R 4.2.0)
 magick 2.7.3 2021-08-18 [2] CRAN (R 4.2.0)
 magrittr 2.0.2 2022-01-26 [2] CRAN (R 4.2.0)
 maps 3.4.0 2021-09-25 [2] CRAN (R 4.2.0)
 Matrix 1.4-0 2021-12-08 [3] CRAN (R 4.2.0)
 MatrixGenerics * 1.7.0 2021-10-26 [2] Bioconductor
 matrixStats * 0.61.0 2021-09-17 [2] CRAN (R 4.2.0)
 maxLik 1.5-2 2021-07-26 [1] CRAN (R 4.2.0)
 mclust 5.4.9 2021-12-17 [2] CRAN (R 4.2.0)
 memoise 2.0.1 2021-11-26 [2] CRAN (R 4.2.0)
 metapod 1.3.0 2021-10-26 [2] Bioconductor
 mime 0.12 2021-09-28 [2] CRAN (R 4.2.0)
 miscTools 0.6-26 2019-12-08 [1] CRAN (R 4.2.0)
 munsell 0.5.0 2018-06-12 [2] CRAN (R 4.1.0)
 pillar 1.7.0 2022-02-01 [2] CRAN (R 4.2.0)
 pkgbuild 1.3.1 2021-12-20 [2] CRAN (R 4.2.0)
 pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.1.0)
 pkgload 1.2.4 2021-11-30 [2] CRAN (R 4.2.0)
 plotly 4.10.0 2021-10-09 [2] CRAN (R 4.2.0)
 png 0.1-7 2013-12-03 [2] CRAN (R 4.1.0)
 Polychrome 1.3.1 2021-07-16 [1] CRAN (R 4.2.0)
 prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.1.0)
 processx 3.5.2 2021-04-30 [2] CRAN (R 4.2.0)
 promises 1.2.0.1 2021-02-11 [2] CRAN (R 4.1.0)
 ps 1.6.0 2021-02-28 [2] CRAN (R 4.1.0)
 purrr 0.3.4 2020-04-17 [2] CRAN (R 4.1.0)
 R.methodsS3 1.8.1 2020-08-26 [2] CRAN (R 4.1.0)
 R.oo 1.24.0 2020-08-26 [2] CRAN (R 4.1.0)
 R.utils 2.11.0 2021-09-26 [2] CRAN (R 4.2.0)
 R6 2.5.1 2021-08-19 [2] CRAN (R 4.2.0)
 rappdirs 0.3.3 2021-01-31 [2] CRAN (R 4.1.0)
 RColorBrewer * 1.1-2 2014-12-07 [2] CRAN (R 4.1.0)
 Rcpp 1.0.8.2 2022-03-11 [2] CRAN (R 4.2.0)
 RCurl 1.98-1.6 2022-02-08 [2] CRAN (R 4.2.0)
 remotes 2.4.2 2021-11-30 [2] CRAN (R 4.2.0)
 restfulr 0.0.13 2017-08-06 [2] CRAN (R 4.1.0)
 rhdf5 2.39.6 2022-03-09 [2] Bioconductor
 rhdf5filters 1.7.0 2021-10-26 [2] Bioconductor
 Rhdf5lib 1.17.3 2022-01-31 [2] Bioconductor
 rjson 0.2.21 2022-01-09 [2] CRAN (R 4.2.0)
 rlang 1.0.2 2022-03-04 [2] CRAN (R 4.2.0)
 roxygen2 7.1.2 2021-09-08 [2] CRAN (R 4.2.0)
 rprojroot 2.0.2 2020-11-15 [2] CRAN (R 4.1.0)
 Rsamtools 2.11.0 2021-10-27 [2] Bioconductor
 RSQLite 2.2.10 2022-02-17 [2] CRAN (R 4.2.0)
 rstudioapi 0.13 2020-11-12 [2] CRAN (R 4.1.0)
 rsvd 1.0.5 2021-04-16 [2] CRAN (R 4.2.0)
 rtracklayer 1.55.3 2021-12-08 [2] Bioconductor
 S4Vectors * 0.33.10 2022-01-12 [2] Bioconductor
 sandwich 3.0-1 2021-05-18 [2] CRAN (R 4.2.0)
 sass 0.4.0 2021-05-12 [2] CRAN (R 4.2.0)
 ScaledMatrix 1.3.0 2021-10-26 [2] Bioconductor
 scales 1.1.1 2020-05-11 [2] CRAN (R 4.1.0)
 scater 1.23.2 2021-12-14 [2] Bioconductor
 scatterplot3d 0.3-41 2018-03-14 [1] CRAN (R 4.2.0)
 scran 1.23.1 2021-11-12 [2] Bioconductor
 scuttle 1.5.0 2021-10-27 [2] Bioconductor
 sessioninfo * 1.2.2 2021-12-06 [2] CRAN (R 4.2.0)
 shiny 1.7.1 2021-10-02 [2] CRAN (R 4.2.0)
 shinyWidgets 0.6.4 2022-02-06 [2] CRAN (R 4.2.0)
 SingleCellExperiment * 1.17.2 2021-11-18 [2] Bioconductor
 spam 2.8-0 2022-01-06 [2] CRAN (R 4.2.0)
 sparseMatrixStats 1.7.0 2021-10-26 [2] Bioconductor
 SpatialExperiment * 1.5.4 2022-03-11 [2] Bioconductor
 spatialLIBD * 1.7.12 2022-03-03 [1] Github (LieberInstitute/spatialLIBD@a416438)
 statmod 1.4.36 2021-05-10 [2] CRAN (R 4.2.0)
 stringi 1.7.6 2021-11-29 [2] CRAN (R 4.2.0)
 stringr 1.4.0 2019-02-10 [2] CRAN (R 4.1.0)
 SummarizedExperiment * 1.25.3 2021-12-08 [2] Bioconductor
 testthat 3.1.2 2022-01-20 [2] CRAN (R 4.2.0)
 tibble 3.1.6 2021-11-07 [2] CRAN (R 4.2.0)
 tidyr 1.2.0 2022-02-01 [2] CRAN (R 4.2.0)
 tidyselect 1.1.2 2022-02-21 [2] CRAN (R 4.2.0)
 usethis 2.1.5 2021-12-09 [2] CRAN (R 4.2.0)
 utf8 1.2.2 2021-07-24 [2] CRAN (R 4.2.0)
 vctrs 0.3.8 2021-04-29 [2] CRAN (R 4.2.0)
 vipor 0.4.5 2017-03-22 [2] CRAN (R 4.2.0)
 viridis 0.6.2 2021-10-13 [2] CRAN (R 4.2.0)
 viridisLite 0.4.0 2021-04-13 [2] CRAN (R 4.2.0)
 withr 2.5.0 2022-03-03 [2] CRAN (R 4.2.0)
 xfun 0.30 2022-03-02 [2] CRAN (R 4.2.0)
 xgboost 1.5.2.1 2022-02-21 [1] CRAN (R 4.2.0)
 XML 3.99-0.9 2022-02-24 [2] CRAN (R 4.2.0)
 xml2 1.3.3 2021-11-30 [2] CRAN (R 4.2.0)
 xtable 1.8-4 2019-04-21 [2] CRAN (R 4.1.0)
 XVector 0.35.0 2021-10-26 [2] Bioconductor
 yaml 2.3.5 2022-02-21 [2] CRAN (R 4.2.0)
 zlibbioc 1.41.0 2021-10-26 [2] Bioconductor
 zoo 1.8-9 2021-03-09 [2] CRAN (R 4.1.0)

 [1] /users/aspangle/R/devel
 [2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-devel/R/devel/lib64/R/site-library
 [3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-devel/R/devel/lib64/R/library

──────────────────────────────────────────────────────────────────────────────
```

edward130603 commented 2 years ago

Hi Abby, yes I have seen this happen before in other data as well. In such cases, BayesSpace's model finds better partitions of the data using fewer clusters. The spatial smoothing BayesSpace applies penalizes neighboring spots with different cluster labels, so if you have two clusters (in your initialization) that are fairly similar, BayesSpace may combine them.

One potential way around this is to reduce the value of the smoothing parameter gamma. Trying a different initialization may also help, since you have a pretty large dataset.
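
For readers who want to try both suggestions, here is a minimal sketch in R. It assumes `sce` is a SingleCellExperiment that has already been preprocessed for BayesSpace. The wrapper name `recluster_low_gamma` is hypothetical and not part of the package, and the values `gamma = 2` and `init.method = "kmeans"` are arbitrary illustrations, not recommendations. Note that `spatialCluster()` takes the number of clusters as `q`; compare the chosen `gamma` against the default documented for your platform and installed version.

```r
# Hypothetical wrapper (not part of BayesSpace): rerun clustering with a
# weaker smoothing prior and a different initialization, then report how
# many clusters are actually occupied.
recluster_low_gamma <- function(sce, q = 28, gamma = 2,
                                init.method = "kmeans", ...) {
  if (!requireNamespace("BayesSpace", quietly = TRUE)) {
    stop("The BayesSpace package is required for this sketch.")
  }

  # Lower gamma means a weaker penalty on neighboring spots that carry
  # different labels. Extra arguments (e.g. nrep, platform) pass through.
  sce <- BayesSpace::spatialCluster(
    sce,
    q = q,
    gamma = gamma,
    init.method = init.method,
    ...
  )

  # spatialCluster() stores the labels in the `spatial.cluster` column
  # of the column metadata.
  labels <- SummarizedExperiment::colData(sce)$spatial.cluster
  message("Requested q = ", q, "; occupied clusters = ",
          length(unique(labels)))

  sce
}
```

A call would look like `sce <- recluster_low_gamma(sce, q = 28, gamma = 2)`. Because the sampler is stochastic, calling `set.seed()` beforehand makes runs comparable when you change only `gamma` or only the initialization.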

lcolladotor commented 2 years ago

Thanks for the info, Edward! Thanks for confirming that you've seen this before and for the pointer on gamma.

I think that we'll be ok with k_obs <= k_input. Abby, I think that you can close this issue.

Best, Leo
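
For anyone who wants to check k_obs against k_input in their own results, a small base R sketch follows. It uses the labels present in the table from the original report as a stand-in for `csv$cluster`. The variable names are illustrative only.

```r
# Number of clusters requested, and the labels that appear in the reported table
k_input <- 28
cluster <- c(1:17, 19, 20, 22:28)  # stand-in for csv$cluster

# Observed number of clusters
k_obs <- length(unique(cluster))
k_obs
# 26

# Labels that were requested but never assigned
missing_labels <- setdiff(seq_len(k_input), unique(cluster))
missing_labels
# 18 21

# Tabulating against the full set of levels shows empty clusters as zeros
counts <- table(factor(cluster, levels = seq_len(k_input)))
empty <- names(counts)[counts == 0]

# Optional: relabel to consecutive integers 1..k_obs for downstream plotting
relabeled <- as.integer(factor(cluster))
```

Relabeling is a convenience only. Keep the original labels if you need to compare against other runs or against the initialization.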