Open lcolladotor opened 2 years ago
@msto do you recall why we used different functions for finding neighbors? Was it due to the non-integer coordinates for subspots?
@lcolladotor I'll look into this. But I think you might run into more memory issues (not to mention compute time) doing the enhancement on this large dataset. We'll try to optimize memory more, but in the meantime, it might be best to just run joint clustering at the spot level, and then single sample spatialEnhance using the joint cluster labels to initialize.
Thanks Edward! I'll try the strategy you suggest in the meantime =)
Best, Leo
Hi Edward,
I wasn't careful enough and messed up this analysis.
## The the spot level clustering
set.seed(20220201)
spe <- spatialCluster(spe, use.dimred = "HARMONY", q = k)
## Run spatialEnhance() one sample at a time
message("Running spatialEnhance() -- currently crashes due to https://github.com/edward130603/BayesSpace/issues/71")
Sys.time()
spe$imagerow <- spe$array_row
spe$imagecol <- spe$array_col
for (sample in unique(spe$sample_id)) {
message(Sys.time(), " processing sample ", sample)
set.seed(20220208)
spe_small <- spatialEnhance(spe[, spe$sample_id == sample], use.dimred = "HARMONY", q = k)
spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster
rm(spe_small)
}
Sys.time()
I should have looked more closely at https://edward130603.github.io/BayesSpace/reference/spatialEnhance.html and realized that the output dimensions would be incompatible. Since I didn't, I got lots of warnings like:
+Warning messages:
+1: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
+ number of items to replace is not a multiple of replacement length
It took quite a while to run too as you can see below.
Running spatialEnhance() -- currently crashes due to https://github.com/edward130603/BayesSpace/issues/71
[1] "2022-02-10 02:01:45 EST"
2022-02-10 02:01:46 processing sample V10A27106_A1_Br3874
Calculating labels using iterations 10000 through 2e+05.
2022-02-12 15:41:50 processing sample V10A27106_B1_Br3854
Calculating labels using iterations 10000 through 2e+05.
2022-02-14 07:58:43 processing sample V10A27106_C1_Br3873
Calculating labels using iterations 10000 through 2e+05.
2022-02-16 01:28:00 processing sample V10A27106_D1_Br3880
Calculating labels using iterations 10000 through 2e+05.
2022-02-18 21:54:58 processing sample V10T31036_A1_Br3874
Calculating labels using iterations 10000 through 2e+05.
2022-02-21 14:11:49 processing sample V10T31036_B1_Br3854
Calculating labels using iterations 10000 through 2e+05.
2022-02-23 20:36:02 processing sample V10T31036_C1_Br3873
Calculating labels using iterations 10000 through 2e+05.
2022-02-25 16:38:46 processing sample V10T31036_D1_Br3880
Calculating labels using iterations 10000 through 2e+05.
2022-02-28 15:29:46 processing sample V10A27004_A1_Br3874
Calculating labels using iterations 10000 through 2e+05.
2022-03-03 19:16:11 processing sample V10A27004_D1_Br3880
Calculating labels using iterations 10000 through 2e+05.
Warning messages:
1: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
2: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
3: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
4: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
5: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
6: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
7: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
8: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
9: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
10: In spe$spatial.cluster[spe$sample_id == sample] <- spe_small$spatial.cluster :
number of items to replace is not a multiple of replacement length
[1] "2022-03-06 17:51:54 EST"
I see now that with the example for spatialEnhance()
we have the following:
library("BayesSpace")
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#> dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#> grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#> union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
set.seed(149)
sce <- exampleSCE()
sce <- spatialCluster(sce, 7, nrep=100, burn.in=10)
#> Neighbors were identified for 96 out of 96 spots.
#> Fitting model...
#> Calculating labels using iterations 10 through 100.
enhanced <- spatialEnhance(sce, 7, nrep=100, burn.in=10)
#> Calculating labels using iterations 0 through 100.
colData(enhanced)
#> DataFrame with 864 rows and 9 columns
#> spot.idx subspot.idx spot.row spot.col row col
#> <numeric> <integer> <integer> <integer> <numeric> <numeric>
#> subspot_1.1 1 1 1 1 1.33333 1.33333
#> subspot_2.1 2 1 1 2 1.33333 2.33333
#> subspot_3.1 3 1 1 3 1.33333 3.33333
#> subspot_4.1 4 1 1 4 1.33333 4.33333
#> subspot_5.1 5 1 1 5 1.33333 5.33333
#> ... ... ... ... ... ... ...
#> subspot_92.9 92 9 8 8 8 8
#> subspot_93.9 93 9 8 9 8 9
#> subspot_94.9 94 9 8 10 8 10
#> subspot_95.9 95 9 8 11 8 11
#> subspot_96.9 96 9 8 12 8 12
#> imagerow imagecol spatial.cluster
#> <numeric> <numeric> <numeric>
#> subspot_1.1 1.33333 1.33333 1
#> subspot_2.1 1.33333 2.33333 1
#> subspot_3.1 1.33333 3.33333 5
#> subspot_4.1 1.33333 4.33333 5
#> subspot_5.1 1.33333 5.33333 4
#> ... ... ... ...
#> subspot_92.9 8 8 5
#> subspot_93.9 8 9 5
#> subspot_94.9 8 10 3
#> subspot_95.9 8 11 6
#> subspot_96.9 8 12 1
x <- colData(enhanced)
y <- split(x, x$spot.idx)
table(lengths(y))
#>
#> 9
#> 96
packageVersion("BayesSpace")
#> [1] '1.5.1'
Created on 2022-03-07 by the reprex package (v2.0.1)
I feel like returning something on the same dimensions might be easier though, like with:
> zz <- do.call(rbind, lapply(y, function(z) { DataFrame(enhanced_row = NumericList(z$row), enhanced_col = NumericList(z$col), enhanced_spatial.cluster = IntegerList(z$spatial.cluster))}))
master > zz
DataFrame with 96 rows and 3 columns
enhanced_row enhanced_col enhanced_spatial.cluster
<NumericList> <NumericList> <IntegerList>
1 1.33333,1.33333,1.33333,... 1.333333,0.666667,1.000000,... 1,1,1,...
2 1.33333,1.33333,1.33333,... 2.33333,1.66667,2.00000,... 1,1,1,...
3 1.33333,1.33333,1.33333,... 3.33333,2.66667,3.00000,... 5,5,5,...
4 1.33333,1.33333,1.33333,... 4.33333,3.66667,4.00000,... 5,5,5,...
5 1.33333,1.33333,1.33333,... 5.33333,4.66667,5.00000,... 4,4,4,...
... ... ... ...
92 8.33333,8.33333,8.33333,... 8.33333,7.66667,8.00000,... 5,5,5,...
93 8.33333,8.33333,8.33333,... 9.33333,8.66667,9.00000,... 5,5,5,...
94 8.33333,8.33333,8.33333,... 10.33333, 9.66667,10.00000,... 3,3,3,...
95 8.33333,8.33333,8.33333,... 11.3333,10.6667,11.0000,... 6,6,6,...
96 8.33333,8.33333,8.33333,... 12.3333,11.6667,12.0000,... 1,1,1,...
Anyways, I'll likely try again but after re-arranging the code a bit to avoid the slow for()
loop I had.
Best, Leo
Thanks for the feedback Leo. I'll try to implement this as an option in the next version.
Hi,
While using
BayesSpace
version 1.4.1, I'm runningspatialEnhance()
on a dataset with the following dimensions:with it, I get the following error:
and the following traceback:
Diving into the details, I see that
spatialCluster()
uses.find_neighbors()
as you can see at https://github.com/edward130603/BayesSpace/blob/9254efaa69958bf676e924e0369beca6e4aca993/R/spatialCluster.R#L144 whereasspatialEnchance()
usesfind_neighbors()
https://github.com/edward130603/BayesSpace/blob/cd2c155f457b3bf05986b2961fa8f4d2937769fe/R/spatialEnhance.R#L129. This then means that a large matrix is created at https://github.com/edward130603/BayesSpace/blob/cd2c155f457b3bf05986b2961fa8f4d2937769fe/R/utils.R#L18.Could this issue maybe be resolved by switching from
find_neighbors()
to.find_neighbors()
inspatialEnhance()
?R session info
Best, Leo