spatialEnhance() requires lots of memory: likely due to find_neighbors() vs .find_neighbors()

lcolladotor commented 2 years ago

Hi,

While using BayesSpace version 1.4.1, I'm running spatialEnhance() on a dataset with the following dimensions:

> dim(spe)
[1] 27853 38115

With this object, I get the following error:

spe$imagerow <- spatialData(spe)$array_row
spe$imagecol <- spatialData(spe)$array_col
imgData(spe) <- NULL ## To reduce mem, though I see that it doesn't matter really since we are using the reducedDim(spe, "HARMONY") data
spe <- spatialEnhance(spe, use.dimred = "HARMONY", q = 14) ## Same issue with q = 4
# Error: cannot allocate vector of size 194.8 Gb

and the following traceback:

> traceback()
5: stats::dist(positions, method = method)
4: as.matrix(stats::dist(positions, method = method))
3: find_neighbors(positions2, dist, "manhattan")
2: deconvolve(inputs$PCs, inputs$positions, nrep = nrep, gamma = gamma,
       xdist = inputs$xdist, ydist = inputs$ydist, q = q, init = init,
       model = model, platform = platform, verbose = verbose, jitter_scale = jitter_scale,
       jitter_prior = jitter_prior, mu0 = mu0, lambda0 = lambda0, 
       alpha = alpha, beta = beta)
1: spatialEnhance(spe, use.dimred = "HARMONY", q = k, nrep = 10000,
       burn.in = 2000)
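
For what it's worth, the size in the error message can be reproduced with some quick arithmetic. This is only a sketch under one assumption that is not stated above: that the data is Visium and spatialEnhance() splits each spot into 6 subspots. Under that assumption, the lower-triangle object that stats::dist() allocates for all subspots comes out at the reported size, and the as.matrix() call one frame up would need about twice that again.

```R
# Assumption (not stated in the thread): Visium data, 6 subspots per spot.
n_spots <- 38115            # ncol(spe) from dim(spe) above
subspots_per_spot <- 6
n <- n_spots * subspots_per_spot

# stats::dist() stores the n * (n - 1) / 2 pairwise distances as doubles (8 bytes each).
dist_gib <- n * (n - 1) / 2 * 8 / 1024^3
round(dist_gib, 1)
#> [1] 194.8

# as.matrix() on that object would then need a full n x n matrix of doubles.
full_gib <- n^2 * 8 / 1024^3
round(full_gib, 1)
```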

Diving into the details, I see that spatialCluster() uses .find_neighbors() as you can see at https://github.com/edward130603/BayesSpace/blob/9254efaa69958bf676e924e0369beca6e4aca993/R/spatialCluster.R#L144 whereas spatialEnhance() uses find_neighbors() https://github.com/edward130603/BayesSpace/blob/cd2c155f457b3bf05986b2961fa8f4d2937769fe/R/spatialEnhance.R#L129. This then means that a large matrix is created at https://github.com/edward130603/BayesSpace/blob/cd2c155f457b3bf05986b2961fa8f4d2937769fe/R/utils.R#L18.

Could this issue maybe be resolved by switching from find_neighbors() to .find_neighbors() in spatialEnhance()?
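
To illustrate the kind of change I have in mind, here is a minimal base R sketch that finds neighbors within a Manhattan radius without ever building the full pairwise matrix. The function name find_neighbors_chunked() and its arguments are hypothetical and made up for this example; this is not how .find_neighbors() or find_neighbors() are implemented in BayesSpace. It handles the points in blocks, so peak memory is roughly chunk_size x n doubles instead of n x n, and it works with non-integer coordinates.

```R
# Hypothetical helper, not part of BayesSpace: neighbors within a Manhattan
# radius, computed one block of points at a time.
find_neighbors_chunked <- function(positions, radius, chunk_size = 256L) {
    positions <- as.matrix(positions)
    n <- nrow(positions)
    neighbors <- vector("list", n)
    for (start in seq(1L, n, by = chunk_size)) {
        idx <- start:min(start + chunk_size - 1L, n)
        # Manhattan distance from each point in the block to all n points.
        d <- abs(outer(positions[idx, 1], positions[, 1], "-")) +
            abs(outer(positions[idx, 2], positions[, 2], "-"))
        for (k in seq_along(idx)) {
            # Keep everything within the radius, dropping the point itself.
            neighbors[[idx[k]]] <- setdiff(which(d[k, ] <= radius), idx[k])
        }
    }
    neighbors
}

# Toy check on a 3 x 3 grid: corners have 2 neighbors, edges 3, the center 4.
grid <- as.matrix(expand.grid(row = 1:3, col = 1:3))
nb <- find_neighbors_chunked(grid, radius = 1, chunk_size = 4L)
lengths(nb)
#> [1] 2 3 2 3 4 3 2 3 2
```

The run time is still quadratic in the number of points, so this only addresses the memory side of the problem.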

R session info

```R ─ Session info ─────────────────────────────────────────────────── setting value version R version 4.1.2 Patched (2021-11-04 r81138) os CentOS Linux 7 (Core) system x86_64, linux-gnu ui X11 language (EN) collate en_US.UTF-8 ctype en_US.UTF-8 tz US/Eastern date 2022-02-03 pandoc 2.13 @ /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.1.x/bin/pandoc ─ Packages ─────────────────────────────────────────────────────── package * version date (UTC) lib source AnnotationDbi 1.56.2 2021-11-09 [2] Bioconductor AnnotationHub 3.2.1 2022-01-23 [2] Bioconductor assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.1.0) attempt 0.3.1 2020-05-03 [1] CRAN (R 4.1.2) BayesSpace * 1.4.1 2021-11-11 [1] Bioconductor beachmat 2.10.0 2021-10-26 [2] Bioconductor beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.1.2) benchmarkme 1.0.7 2021-03-21 [1] CRAN (R 4.1.2) benchmarkmeData 1.0.4 2020-04-23 [1] CRAN (R 4.1.2) Biobase * 2.54.0 2021-10-26 [2] Bioconductor BiocFileCache 2.2.1 2022-01-23 [2] Bioconductor BiocGenerics * 0.40.0 2021-10-26 [2] Bioconductor BiocIO 1.4.0 2021-10-26 [2] Bioconductor BiocManager 1.30.16 2021-06-15 [2] CRAN (R 4.1.2) BiocNeighbors 1.12.0 2021-10-26 [1] Bioconductor BiocParallel 1.28.3 2021-12-09 [2] Bioconductor BiocSingular 1.10.0 2021-10-26 [1] Bioconductor BiocVersion 3.14.0 2021-05-19 [2] Bioconductor Biostrings 2.62.0 2021-10-26 [2] Bioconductor bit 4.0.4 2020-08-04 [2] CRAN (R 4.1.0) bit64 4.0.5 2020-08-30 [2] CRAN (R 4.1.0) bitops 1.0-7 2021-04-24 [2] CRAN (R 4.1.0) blob 1.2.2 2021-07-23 [2] CRAN (R 4.1.0) bluster 1.4.0 2021-10-26 [1] Bioconductor brio 1.1.3 2021-11-30 [2] CRAN (R 4.1.2) bslib 0.3.1 2021-10-06 [2] CRAN (R 4.1.2) cachem 1.0.6 2021-08-19 [2] CRAN (R 4.1.2) callr 3.7.0 2021-04-20 [2] CRAN (R 4.1.0) cli 3.1.1 2022-01-20 [2] CRAN (R 4.1.2) cluster 2.1.2 2021-04-17 [3] CRAN (R 4.1.2) coda 0.19-4 2020-09-30 [2] CRAN (R 4.1.0) codetools 0.2-18 2020-11-04 [3] CRAN (R 4.1.2) colorout 1.2-2 2021-11-02 [1] Github (jalvesaq/colorout@79931fd) colorspace 
2.0-2 2021-06-24 [2] CRAN (R 4.1.0) config 0.3.1 2020-12-17 [1] CRAN (R 4.1.2) cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.1.2) crayon 1.4.2 2021-10-29 [2] CRAN (R 4.1.2) curl 4.3.2 2021-06-23 [2] CRAN (R 4.1.0) data.table 1.14.2 2021-09-27 [2] CRAN (R 4.1.2) DBI 1.1.2 2021-12-20 [2] CRAN (R 4.1.2) dbplyr 2.1.1 2021-04-06 [2] CRAN (R 4.1.0) DelayedArray 0.20.0 2021-10-26 [2] Bioconductor DelayedMatrixStats 1.16.0 2021-10-26 [2] Bioconductor desc 1.4.0 2021-09-28 [2] CRAN (R 4.1.2) digest 0.6.29 2021-12-01 [2] CRAN (R 4.1.2) DirichletReg 0.7-1 2021-05-18 [1] CRAN (R 4.1.2) dockerfiler 0.1.4 2021-09-03 [1] CRAN (R 4.1.2) doParallel 1.0.16 2020-10-16 [2] CRAN (R 4.1.0) dotCall64 1.0-1 2021-02-11 [2] CRAN (R 4.1.0) dplyr 1.0.7 2021-06-18 [2] CRAN (R 4.1.0) dqrng 0.3.0 2021-05-01 [1] CRAN (R 4.1.2) DropletUtils 1.14.2 2022-01-09 [1] Bioconductor DT 0.20 2021-11-15 [2] CRAN (R 4.1.2) edgeR 3.36.0 2021-10-26 [2] Bioconductor ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.1.0) ExperimentHub 2.2.1 2022-01-23 [2] Bioconductor fansi 1.0.2 2022-01-14 [2] CRAN (R 4.1.2) farver 2.1.0 2021-02-28 [2] CRAN (R 4.1.0) fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.1.0) fields 13.3 2021-10-30 [2] CRAN (R 4.1.2) filelock 1.0.2 2018-10-05 [2] CRAN (R 4.1.0) foreach 1.5.2 2022-02-02 [2] CRAN (R 4.1.2) Formula 1.2-4 2020-10-16 [2] CRAN (R 4.1.0) fs 1.5.2 2021-12-08 [2] CRAN (R 4.1.2) generics 0.1.2 2022-01-31 [2] CRAN (R 4.1.2) GenomeInfoDb * 1.30.1 2022-01-30 [2] Bioconductor GenomeInfoDbData 1.2.7 2021-11-01 [2] Bioconductor GenomicAlignments 1.30.0 2021-10-26 [2] Bioconductor GenomicRanges * 1.46.1 2021-11-18 [2] Bioconductor getopt * 1.20.3 2019-03-22 [2] CRAN (R 4.1.0) ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.1.2) ggplot2 3.3.5 2021-06-25 [2] CRAN (R 4.1.0) ggrepel 0.9.1 2021-01-15 [2] CRAN (R 4.1.0) glue 1.6.1 2022-01-22 [2] CRAN (R 4.1.2) golem 0.3.1 2021-04-17 [1] CRAN (R 4.1.2) gridExtra 2.3 2017-09-09 [2] CRAN (R 4.1.0) gtable 0.3.0 2019-03-25 [2] CRAN (R 4.1.0) HDF5Array 1.22.1 2021-11-14 [2] 
Bioconductor here * 1.0.1 2020-12-13 [1] CRAN (R 4.1.2) htmltools 0.5.2 2021-08-25 [2] CRAN (R 4.1.2) htmlwidgets 1.5.4 2021-09-08 [2] CRAN (R 4.1.2) httpuv 1.6.5 2022-01-05 [2] CRAN (R 4.1.2) httr 1.4.2 2020-07-20 [2] CRAN (R 4.1.0) igraph 1.2.11 2022-01-04 [2] CRAN (R 4.1.2) interactiveDisplayBase 1.32.0 2021-10-26 [2] Bioconductor IRanges * 2.28.0 2021-10-26 [2] Bioconductor irlba 2.3.5 2021-12-06 [2] CRAN (R 4.1.2) iterators 1.0.13 2020-10-15 [2] CRAN (R 4.1.0) jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.1.0) jsonlite 1.7.3 2022-01-17 [2] CRAN (R 4.1.2) KEGGREST 1.34.0 2021-10-26 [2] Bioconductor knitr 1.37 2021-12-16 [2] CRAN (R 4.1.2) labeling 0.4.2 2020-10-20 [2] CRAN (R 4.1.0) later 1.3.0 2021-08-18 [2] CRAN (R 4.1.2) lattice 0.20-45 2021-09-22 [3] CRAN (R 4.1.2) lazyeval 0.2.2 2019-03-15 [2] CRAN (R 4.1.0) lifecycle 1.0.1 2021-09-24 [2] CRAN (R 4.1.2) limma 3.50.0 2021-10-26 [2] Bioconductor locfit 1.5-9.4 2020-03-25 [2] CRAN (R 4.1.0) magick 2.7.3 2021-08-18 [2] CRAN (R 4.1.2) magrittr 2.0.2 2022-01-26 [2] CRAN (R 4.1.2) maps 3.4.0 2021-09-25 [2] CRAN (R 4.1.2) Matrix 1.4-0 2021-12-08 [3] CRAN (R 4.1.2) MatrixGenerics * 1.6.0 2021-10-26 [2] Bioconductor matrixStats * 0.61.0 2021-09-17 [2] CRAN (R 4.1.2) maxLik 1.5-2 2021-07-26 [1] CRAN (R 4.1.2) mclust 5.4.9 2021-12-17 [2] CRAN (R 4.1.2) memoise 2.0.1 2021-11-26 [2] CRAN (R 4.1.2) metapod 1.2.0 2021-10-26 [1] Bioconductor mime 0.12 2021-09-28 [2] CRAN (R 4.1.2) miscTools 0.6-26 2019-12-08 [1] CRAN (R 4.1.2) munsell 0.5.0 2018-06-12 [2] CRAN (R 4.1.0) pillar 1.7.0 2022-02-01 [2] CRAN (R 4.1.2) pkgbuild 1.3.1 2021-12-20 [2] CRAN (R 4.1.2) pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.1.0) pkgload 1.2.4 2021-11-30 [2] CRAN (R 4.1.2) plotly 4.10.0 2021-10-09 [2] CRAN (R 4.1.2) png 0.1-7 2013-12-03 [2] CRAN (R 4.1.0) Polychrome * 1.3.1 2021-07-16 [1] CRAN (R 4.1.2) prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.1.0) processx 3.5.2 2021-04-30 [2] CRAN (R 4.1.0) promises 1.2.0.1 2021-02-11 [2] CRAN (R 4.1.0) ps 1.6.0 
2021-02-28 [2] CRAN (R 4.1.0) purrr 0.3.4 2020-04-17 [2] CRAN (R 4.1.0) R.methodsS3 1.8.1 2020-08-26 [2] CRAN (R 4.1.0) R.oo 1.24.0 2020-08-26 [2] CRAN (R 4.1.0) R.utils 2.11.0 2021-09-26 [2] CRAN (R 4.1.2) R6 2.5.1 2021-08-19 [2] CRAN (R 4.1.2) rappdirs 0.3.3 2021-01-31 [2] CRAN (R 4.1.0) RColorBrewer 1.1-2 2014-12-07 [2] CRAN (R 4.1.0) Rcpp 1.0.8 2022-01-13 [2] CRAN (R 4.1.2) RCurl 1.98-1.5 2021-09-17 [2] CRAN (R 4.1.2) remotes 2.4.2 2021-11-30 [2] CRAN (R 4.1.2) restfulr 0.0.13 2017-08-06 [2] CRAN (R 4.1.0) rhdf5 2.38.0 2021-10-26 [2] Bioconductor rhdf5filters 1.6.0 2021-10-26 [2] Bioconductor Rhdf5lib 1.16.0 2021-10-26 [2] Bioconductor rjson 0.2.21 2022-01-09 [2] CRAN (R 4.1.2) rlang 1.0.1 2022-02-03 [2] CRAN (R 4.1.2) rmote 0.3.4 2021-11-02 [1] Github (cloudyr/rmote@fbce611) roxygen2 7.1.2 2021-09-08 [2] CRAN (R 4.1.2) rprojroot 2.0.2 2020-11-15 [2] CRAN (R 4.1.0) Rsamtools 2.10.0 2021-10-26 [2] Bioconductor RSQLite 2.2.9 2021-12-06 [2] CRAN (R 4.1.2) rstudioapi 0.13 2020-11-12 [2] CRAN (R 4.1.0) rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.1.2) rtracklayer 1.54.0 2021-10-26 [2] Bioconductor S4Vectors * 0.32.3 2021-11-21 [2] Bioconductor sandwich 3.0-1 2021-05-18 [2] CRAN (R 4.1.0) sass 0.4.0 2021-05-12 [2] CRAN (R 4.1.0) ScaledMatrix 1.2.0 2021-10-26 [1] Bioconductor scales 1.1.1 2020-05-11 [2] CRAN (R 4.1.0) scater 1.22.0 2021-10-26 [1] Bioconductor scatterplot3d 0.3-41 2018-03-14 [1] CRAN (R 4.1.2) scran 1.22.1 2021-11-14 [1] Bioconductor scuttle 1.4.0 2021-10-26 [1] Bioconductor servr 0.24 2021-11-16 [1] CRAN (R 4.1.2) sessioninfo * 1.2.2 2021-12-06 [2] CRAN (R 4.1.2) shiny 1.7.1 2021-10-02 [2] CRAN (R 4.1.2) shinyWidgets 0.6.3 2022-01-10 [1] CRAN (R 4.1.2) SingleCellExperiment * 1.16.0 2021-10-26 [2] Bioconductor spam 2.8-0 2022-01-06 [2] CRAN (R 4.1.2) sparseMatrixStats 1.6.0 2021-10-26 [2] Bioconductor SpatialExperiment * 1.4.0 2021-10-26 [1] Bioconductor spatialLIBD * 1.6.5 2022-01-12 [1] Bioconductor statmod 1.4.36 2021-05-10 [2] CRAN (R 4.1.0) stringi 1.7.6 
2021-11-29 [2] CRAN (R 4.1.2) stringr 1.4.0 2019-02-10 [2] CRAN (R 4.1.0) SummarizedExperiment * 1.24.0 2021-10-26 [2] Bioconductor testthat 3.1.2 2022-01-20 [2] CRAN (R 4.1.2) tibble 3.1.6 2021-11-07 [2] CRAN (R 4.1.2) tidyr 1.2.0 2022-02-01 [2] CRAN (R 4.1.2) tidyselect 1.1.1 2021-04-30 [2] CRAN (R 4.1.0) usethis 2.1.5 2021-12-09 [2] CRAN (R 4.1.2) utf8 1.2.2 2021-07-24 [2] CRAN (R 4.1.0) vctrs 0.3.8 2021-04-29 [2] CRAN (R 4.1.0) vipor 0.4.5 2017-03-22 [1] CRAN (R 4.1.2) viridis 0.6.2 2021-10-13 [2] CRAN (R 4.1.2) viridisLite 0.4.0 2021-04-13 [2] CRAN (R 4.1.0) withr 2.4.3 2021-11-30 [2] CRAN (R 4.1.2) xfun 0.29 2021-12-14 [2] CRAN (R 4.1.2) xgboost 1.5.0.2 2021-11-21 [1] CRAN (R 4.1.2) XML 3.99-0.8 2021-09-17 [2] CRAN (R 4.1.2) xml2 1.3.3 2021-11-30 [2] CRAN (R 4.1.2) xtable 1.8-4 2019-04-21 [2] CRAN (R 4.1.0) XVector 0.34.0 2021-10-26 [2] Bioconductor yaml 2.2.2 2022-01-25 [2] CRAN (R 4.1.2) zlibbioc 1.40.0 2021-10-26 [2] Bioconductor zoo 1.8-9 2021-03-09 [2] CRAN (R 4.1.0) [1] /users/lcollado/R/4.1.x [2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.1.x/R/4.1.x/lib64/R/site-library [3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.1.x/R/4.1.x/lib64/R/library ────────────────────────────────────────────────────────────────── ```

Best, Leo

edward130603 commented 2 years ago

@msto do you recall why we used different functions for finding neighbors? Was it due to the non-integer coordinates for subspots?

edward130603 commented 2 years ago

@lcolladotor I'll look into this. But I think you might run into more memory issues (not to mention compute time) doing the enhancement on this large dataset. We'll try to optimize memory more, but in the meantime, it might be best to just run joint clustering at the spot level, and then run spatialEnhance() on each sample separately, using the joint cluster labels to initialize.

lcolladotor commented 2 years ago

Thanks Edward! I'll try the strategy you suggest in the meantime =)

Best, Leo

lcolladotor commented 2 years ago

Hi Edward,

I wasn't careful enough and messed up this analysis.

## Run the spot-level clustering
set.seed(20220201)
spe <- spatialCluster(spe, use.dimred = "HARMONY", q = k)

## Run spatialEnhance() one sample at a time
message("Running spatialEnhance() -- currently crashes due to https://github.com/edward130603/BayesSpace/issues/71")
Sys.time()
spe$imagerow <- spe$array_row
spe$imagecol <- spe$array_col

for (sample in unique(spe$sample_id)) {
    message(Sys.time(), " processing sample ", sample)
    set.seed(20220208)
    spe_small <- spatialEnhance(spe[, spe$sample_id == sample], use.dimred = "HARMONY", q = k)
    spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster
    rm(spe_small)
}
Sys.time()

I should have looked more closely at https://edward130603.github.io/BayesSpace/reference/spatialEnhance.html and realized that the output dimensions would be incompatible. Since I didn't, I got lots of warnings like:

Warning messages:
1: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
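
The mismatch is easy to reproduce on toy data. The sketch below uses only base R and made-up values: fake_enhance() is a hypothetical stand-in for spatialEnhance() that returns 6 labels per spot (assuming Visium-style subspots), and the sample IDs are invented. It first triggers the same warning, then shows one way to keep the per-sample subspot results in a named list instead of forcing them back into the spot-level column.

```R
# Toy stand-in for one sample: 3 spots, 6 subspot labels per spot (18 labels).
spot_cluster <- c(1L, 2L, 3L)
subspot_cluster <- rep(c(1L, 2L, 3L), each = 6L)

# Assigning 18 labels into 3 slots raises the warning seen above. Outside
# tryCatch(), R would only use the first 3 values and drop the rest.
msg <- tryCatch(
    {
        spot_cluster[1:3] <- subspot_cluster
        "no warning"
    },
    warning = function(w) conditionMessage(w)
)
msg
#> [1] "number of items to replace is not a multiple of replacement length"

# Alternative: keep one result per sample in a named list.
sample_id <- c("A", "A", "B", "B", "B")  # made-up sample IDs, one per spot
fake_enhance <- function(n_spots, per_spot = 6L) {
    # Hypothetical stand-in for spatialEnhance(): one label per subspot.
    rep(seq_len(n_spots), each = per_spot)
}
enhanced <- lapply(
    split(seq_along(sample_id), sample_id),
    function(i) fake_enhance(length(i))
)
lengths(enhanced)
#>  A  B 
#> 12 18
```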

It also took quite a while to run, almost 25 days in total, as you can see below.

Running spatialEnhance() -- currently crashes due to https://github.com/edward130603/BayesSpace/issues/71
[1] "2022-02-10 02:01:45 EST"
2022-02-10 02:01:46 processing sample V10A27106_A1_Br3874
Calculating labels using iterations 10000 through 2e+05.
2022-02-12 15:41:50 processing sample V10A27106_B1_Br3854
Calculating labels using iterations 10000 through 2e+05.
2022-02-14 07:58:43 processing sample V10A27106_C1_Br3873
Calculating labels using iterations 10000 through 2e+05.
2022-02-16 01:28:00 processing sample V10A27106_D1_Br3880
Calculating labels using iterations 10000 through 2e+05.
2022-02-18 21:54:58 processing sample V10T31036_A1_Br3874
Calculating labels using iterations 10000 through 2e+05.
2022-02-21 14:11:49 processing sample V10T31036_B1_Br3854
Calculating labels using iterations 10000 through 2e+05.
2022-02-23 20:36:02 processing sample V10T31036_C1_Br3873
Calculating labels using iterations 10000 through 2e+05.
2022-02-25 16:38:46 processing sample V10T31036_D1_Br3880
Calculating labels using iterations 10000 through 2e+05.
2022-02-28 15:29:46 processing sample V10A27004_A1_Br3874
Calculating labels using iterations 10000 through 2e+05.
2022-03-03 19:16:11 processing sample V10A27004_D1_Br3880
Calculating labels using iterations 10000 through 2e+05.
Warning messages:
1: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
2: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
3: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
4: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
5: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
6: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
7: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
8: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
9: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
10: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
  number of items to replace is not a multiple of replacement length
[1] "2022-03-06 17:51:54 EST"
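
To put numbers on that log, here is a small base R sketch that turns the timestamps above into per-sample run times. The timestamps are copied from the log. Parsing them as UTC is just a convenience I'm assuming is harmless here, since no clock change falls inside this window and only the differences matter.

```R
# Start time of each sample from the log above, plus the final Sys.time().
stamps <- as.POSIXct(c(
    "2022-02-10 02:01:46", "2022-02-12 15:41:50", "2022-02-14 07:58:43",
    "2022-02-16 01:28:00", "2022-02-18 21:54:58", "2022-02-21 14:11:49",
    "2022-02-23 20:36:02", "2022-02-25 16:38:46", "2022-02-28 15:29:46",
    "2022-03-03 19:16:11", "2022-03-06 17:51:54"
), tz = "UTC")

# Elapsed days for each of the 10 samples.
days_per_sample <- as.numeric(diff(stamps), units = "days")
round(range(days_per_sample), 1)
#> [1] 1.7 3.2
round(sum(days_per_sample), 1)
#> [1] 24.7
```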

I see now that with the example for spatialEnhance() we have the following:

library("BayesSpace")
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
set.seed(149)
sce <- exampleSCE()
sce <- spatialCluster(sce, 7, nrep=100, burn.in=10)
#> Neighbors were identified for 96 out of 96 spots.
#> Fitting model...
#> Calculating labels using iterations 10 through 100.
enhanced <- spatialEnhance(sce, 7, nrep=100, burn.in=10)
#> Calculating labels using iterations 0 through 100.
colData(enhanced)
#> DataFrame with 864 rows and 9 columns
#>               spot.idx subspot.idx  spot.row  spot.col       row       col
#>              <numeric>   <integer> <integer> <integer> <numeric> <numeric>
#> subspot_1.1          1           1         1         1   1.33333   1.33333
#> subspot_2.1          2           1         1         2   1.33333   2.33333
#> subspot_3.1          3           1         1         3   1.33333   3.33333
#> subspot_4.1          4           1         1         4   1.33333   4.33333
#> subspot_5.1          5           1         1         5   1.33333   5.33333
#> ...                ...         ...       ...       ...       ...       ...
#> subspot_92.9        92           9         8         8         8         8
#> subspot_93.9        93           9         8         9         8         9
#> subspot_94.9        94           9         8        10         8        10
#> subspot_95.9        95           9         8        11         8        11
#> subspot_96.9        96           9         8        12         8        12
#>               imagerow  imagecol spatial.cluster
#>              <numeric> <numeric>       <numeric>
#> subspot_1.1    1.33333   1.33333               1
#> subspot_2.1    1.33333   2.33333               1
#> subspot_3.1    1.33333   3.33333               5
#> subspot_4.1    1.33333   4.33333               5
#> subspot_5.1    1.33333   5.33333               4
#> ...                ...       ...             ...
#> subspot_92.9         8         8               5
#> subspot_93.9         8         9               5
#> subspot_94.9         8        10               3
#> subspot_95.9         8        11               6
#> subspot_96.9         8        12               1
x <- colData(enhanced)
y <- split(x, x$spot.idx)
table(lengths(y))
#> 
#>  9 
#> 96
packageVersion("BayesSpace")
#> [1] '1.5.1'

Created on 2022-03-07 by the reprex package (v2.0.1)

I feel like returning something with the same dimensions as the input (one row per spot) might be easier though, like with:

> zz <- do.call(rbind, lapply(y, function(z) {
+     DataFrame(
+         enhanced_row = NumericList(z$row),
+         enhanced_col = NumericList(z$col),
+         enhanced_spatial.cluster = IntegerList(z$spatial.cluster)
+     )
+ }))
> zz
DataFrame with 96 rows and 3 columns
                   enhanced_row                   enhanced_col enhanced_spatial.cluster
                  <NumericList>                  <NumericList>            <IntegerList>
1   1.33333,1.33333,1.33333,... 1.333333,0.666667,1.000000,...                1,1,1,...
2   1.33333,1.33333,1.33333,...    2.33333,1.66667,2.00000,...                1,1,1,...
3   1.33333,1.33333,1.33333,...    3.33333,2.66667,3.00000,...                5,5,5,...
4   1.33333,1.33333,1.33333,...    4.33333,3.66667,4.00000,...                5,5,5,...
5   1.33333,1.33333,1.33333,...    5.33333,4.66667,5.00000,...                4,4,4,...
...                         ...                            ...                      ...
92  8.33333,8.33333,8.33333,...    8.33333,7.66667,8.00000,...                5,5,5,...
93  8.33333,8.33333,8.33333,...    9.33333,8.66667,9.00000,...                5,5,5,...
94  8.33333,8.33333,8.33333,... 10.33333, 9.66667,10.00000,...                3,3,3,...
95  8.33333,8.33333,8.33333,...    11.3333,10.6667,11.0000,...                6,6,6,...
96  8.33333,8.33333,8.33333,...    12.3333,11.6667,12.0000,...                1,1,1,...

Anyways, I'll likely try again but after re-arranging the code a bit to avoid the slow for() loop I had.

Best, Leo

edward130603 commented 2 years ago

Thanks for the feedback Leo. I'll try to implement this as an option in the next version.
