Closed AteeqMKhaliq closed 1 year ago
Hi @AteeqMKhaliq,
Thanks so much for using STdeconvolve
and sorry for the delay in my response!
If I understand you're question correctly, you are asking if it is reasonable to apply STdeconvolve
to all 9 datasets collated as one single dataset, or if it is better to apply STdeconvolve
to each dataset separately?
We actually analyzed datasets both ways in the STdeconvolve
paper and had success in each case. For example, the full MPOA dataset we used was a collection of multiple tissue slices (check out supplementary figure S1), which were combined into a single input dataset. It is important to note that these slices were all from the same animal with the same genes measured and so there was expected to be less technical variation between them. Some of our motivations for combining them were to ensure that the final dataset would be sufficiently large. We also expected some cell types to be present in certain slices but not others and wanted to assess STdeconvolve
's ability to detect them in the slices they were expected to be in. Conversely, we also tested STdeconvolve
independently on several completely separate samples of the mouse olfactory bulb (MOB). Here, we found high concordance between the deconvolved cell types of the different MOB datasets (check out supplementary figure S7), suggesting that STdeconvolve
produces reproducible results across different datasets of the same tissue.
My impression is that if there are multiple datasets from the same source with the same genes measured, then it is probably reasonable to combine them into a single dataset. However, if the samples are from different sources, it is probably better to process each separately. Nonetheless, cell types that are defined by distinct expression profiles should still be detected by STdeconvolve
in either case assuming that the associated genes are included in the input dataset.
I hope this helps and let me know if you have any additional questions, Brendan
Thanks, Brandan for the prompt reply. I really appreciate it. I tried analysing it by the method you pointed out, but I am not getting any output or results. I am sure i am doing something wrong here. Can you please help me, my object is one integrated spatial data set with 12 samples/images(Seurat).
ex.integrated
An object of class Seurat
38342 features across 32295 samples within 3 assays
Active assay: integrated (3000 features, 3000 variable features)
2 other assays present: Spatial, SCT
3 dimensional reductions calculated: pca, umap, tsne
12 images present: human_1, human_2.1, human_3.2, human_4.3, human_5.4, human_6.5, human_1.6, human_2.7, human_3.8, human_4.9, human_5.10, human_6.11
extract the counts matrix
mat <- Matrix((ex.integrated@assays$Spatial@counts), sparse = TRUE)
mat <- cleanCounts(mat, min.lib.size = 100)
filter for features in less than 100% of pixels but more than 5% of pixels
pdach1.mat <- restrictCorpus(mat, removeAbove = 1.0, removeBelow = 0.05)
pdf("human1.perplexity.pdf")
ldas = fitLDA(mat, Ks = seq(2, 9, by=1), plot=T, verbose=TRUE)
dev.off()
optLDA <- optimalModel(models = ldas, opt = "kneed")
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconvolution resultsattach(frame)
deconProp =results$theta
write.table(deconProp, "All_results_decon.txt", quote=F)
deconGexp = results$beta
#human_1 (first sample)
sample1.pos <- GetTissueCoordinates(st@images$human_1)
colnames(sample1.pos) <- c('x','y')
Plotting scatterpies for 0 pixels with 15 cell-types...this could take a while if the dataset is large.
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1): x0, y0, group, colour, fill and amount
Run `rlang::last_error()` to see where the error occurred.
It would be really helpful if you can please let me know how best I can approach this issue, Please note, that I am a cancer biologist. Best, AMK
Hi @AteeqMKhaliq,
Would you be able to provide all of the commands you tried running? What is the plotting command you are using? Also, if you could provide the dimensions of the counts matrix and positions matrix via dim(mat)
and dim(sample1.pos)
as well as the header of each via head(mat)
and head(sample1.pos)
that would also be useful. It seems that whatever you are using as the input theta
matrix is missing the corresponding rows for the spots. Maybe the spots of the mat
and the sample1.pos
matrices are named differently?
Hope this helps and keep me posted. Thanks! Brendan
Hi Brendan, Thank you for the quick response, I really appreciate it.
My Plotting command is as follows.
plt <- vizAllTopics(theta = deconProp,
pos = pos,
r = 3,
lwd = 0,
showLegend = TRUE,
plotTitle = NA) +
ggplot2::guides(fill=ggplot2::guide_legend(ncol=2)) +
## outer border
ggplot2::geom_rect(data = data.frame(pos),
ggplot2::aes(xmin = min(x)-90, xmax = max(x)+90,
ymin = min(y)-90, ymax = max(y)+90),
fill = NA, color = "black", linetype = "solid", size = 0.5) +
ggplot2::theme(
plot.background = ggplot2::element_blank()
) +
ggplot2::guides(colour = "none")
png("st.barcode_proportions.Human1.png", units="in", width=20, height=20, res=300)
plt
dev.off()
dim(mat)
17816 32295
dim(sample1.pos)
2961 2
head(mat)
6 x 32295 sparse Matrix of class "dgCMatrix"
[[ suppressing 2961 column names ‘AAACAAGTATCTCCCA-1’, ‘AAACAGAGCGACTCCT-1’, ‘AAACATTTCCCGGATT-1’ ... ]]
ISG15 3 . 2 . 1 . 2 1 . . . . 3 . . 3 . . 1 2 . 1 2 . 2 . 3 . . 1 1 . . .
ARHGEF16 3 . 1 . . 1 . . 3 . . 1 2 3 . 1 . . 1 2 . . 5 2 3 . . . 3 2 . 1 3 2
CLSTN1 7 2 1 . 1 . 1 . 7 4 . 2 1 6 . 3 3 2 1 1 2 1 5 . 4 1 2 . 5 . 1 1 . 2
MASP2 . . . . . . . . . . . . . . . . 1 . . . 1 . . . . . . . . . . . . .
DHRS3 4 2 4 . . . 1 3 9 . . 9 5 3 1 1 3 . . 8 . 2 6 . 4 . 2 . 4 . 2 6 6 11
EPHA2 12 1 2 . 4 4 1 2 3 4 . 4 3 5 1 1 1 1 1 3 . 2 11 1 2 2 5 . 2 4 1 3 3 4
head(sample1.pos)
x y
AAACAAGTATCTCCCA-1 424.3592 233.6445
AAACAGAGCGACTCCT-1 396.2485 449.5162
AAACATTTCCCGGATT-1 407.2483 167.5946
AAACCCGAACGAAATC-1 469.0715 263.7413
AAACCGGGTAGGTACC-1 169.1224 281.0558
AAACCGTTCGTCCAGG-1 217.5013 221.1679
Thanks a ton for your help, Best, AMK
Hi @AteeqMKhaliq,
Thanks for supplying your code and outputs. Looking at your code for plotting, I think you want to use the object called sample1.pos
as your pos
position matrix, but this is not what you are specifying in the function vizAllTopics()
. The input object pos
doesn't seem to have been defined yet. In vizAllTopics()
, try setting pos = sample1.pos
and see what happens.
Note that vizAllTopics()
only plots the spots (rows) that are in both input matrices so in both cases spots need to be the rows and share the same names between each matrix. (in your case, the deconProp
and pdach1.mat
objects).
If this doesn't work, then maybe either deconProp
is empty or perhaps the row.names
of deconProp
and sample1.pos
are different from some reason. If so, could you also provide the dim()
and head()
for pdach1.mat
and deconProp
? Is the mat
output you provided the mat
output from after running cleanCounts()
or before?
Hope this helps, Brendan
Thanks a lot for your prompt reply, and I am very sorry for bothering you. I found out where I was going wrong, I was using Ks(15) in the Fit LDA model and selected Opt LDA as 15.
ldas <- fitLDA(t(as.matrix(corpus)), Ks = c(15))
optLDA <- optimalModel(models = ldas, opt = 15)
Later on, I have run in a sequence from Ks 2-15 and kept opt=Min
ldas <- fitLDA(t(mat), Ks = seq(2, 15, by=1))
optLDA <- optimalModel(models = ldas, opt = "min")
This solved my problem, and in fact, now I can check Scatter Pies for individual images. Now I have deconvoluted the cell types (15 topics), I wanted to add this information to my surat object (metadata), I wanted to add which cell belongs to which topic, and how best I can achieve this. Can I use the top n genes for each deconvolved cell type based on their predicted gene expression distributions as a marker gene for cell type annotation? Sorry for this lengthy post, Thank you for all your help. Best, AMK
Hi @AteeqMKhaliq,
Glad you have it working!
Would you be able to clarify what you changed in your original code specifically to get it to work for you so that other users may be able to refer to it in the future?
My impression is that you wanted an LDA model that fit 15 cell types, but looking at your previous posts, it seems you only fitted models up to 9 cell types and picked the optimal model using the built-in "kneed" method:
ldas = fitLDA(mat, Ks = seq(2, 9, by=1), plot=T, verbose=TRUE)
optLDA <- optimalModel(models = ldas, opt = "kneed")
From here, you were able to obtain the spot x cell type proportions for plotting via:
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconProp =results$theta
however, I don't see how the optLDA
could have been fitted to 15 cell types and thus unsure where the statement:
Plotting scatterpies for 0 pixels with 15 cell-types...this could take a while if the dataset is large.
would have come from (ie, how would the deconProp
matrix contain spot proportions for 15 cell types).
Would you be able to clarify this?
Thank you, Brendan
Hi Brandan, My observation , initially i used LDA model K=15 which ended up showing an error . Please have a look at the perplexity plot stNew.perplexityK15.pdf
Plotting scatterpies for 0 pixels with 15 cell-types...this could take a while if the dataset is large.
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1): x0, y0, group, colour, fill and amount
Run `rlang::last_error()` to see where the error occurred.
later, itried to select the optimal K value using fitted using the built-in "kneed" method( K 2-15) which somehow resolved this issue, i am not sure what went into this, Please take a look into the plot.
stNew_pdach1.perplexity1.1.pdf
I am trying all possible ways to find celltypes from my ST data so that i can make some sense out of it. Please let me know if this is fine.
Regards, AMK
Hi Brandan,
Using STdeconvolve, I have estimated cell-type proportions of each spatial transcriptomics spot. Now, I am interested in similar clustering spots that are similar in cell composition. How best can I use this information to cluster to identify groups of spots in the different samples that shared similar cell-type compositions? Thanks AMK
Hi @AteeqMKhaliq,
Was each sample dataset analyzed by STdeconvolve
separately? If so, and if you are looking for spots with similar cell type compositions between them, then you will likely need to first assess how similar the deconvolved cell types are for each dataset. This could be done by performing a pair-wise correlation between the transcriptional profiles of the deconvolved cell types in dataset 1 vs dataset 2 for example. We provide functions to do this (check out an example here). With the correlation matrix, you could pair cell types from each dataset that are most similar (see the function lsatPairs()
, for example). Assuming that each cell type in a given pair represents the same cell type, you could then identify groups of spots in each dataset that have similar cell type proportions. If the total number of deconvolved cell types differs for each dataset, you could use the correlation matrix to assign more than one cell type from a dataset to a single cell type of another dataset if the the cell types in the former dataset both correlate strongly with the cell type in the latter. Once spots across datasets share the same features (i.e., cell types), you could then use a variety of different methods to identify spots with similar cell type proportions.
Hope this helps, Brendan
Going back to your previous post, I am still confused about how initially setting Ks=15
resulted in a deconProp
that had no spots visualized when combined with your position matrix. Would you be able to share all of your code in the order in which you ran it that led to the result with the error, and then the code that worked? Further, for each analysis, it would be useful to see the head()
and dim()
for the resulting deconProp
and whatever your position matrix is to ensure that the LDA model did return spot cell type proportions, and that the spot IDs are consistent between the proportion matrix and the position matrix.
Another comment about the perplexity plots: is there a reason why you chose to deconvolve 15 cell types? For the stNew_pdach1.perplexity1.1.pdf
plot, K=12 looks reasonable because it has the lowest perplexity before the number of low proportion cell types begins to increase. The increase in the number of low proportion cell types can serve as an upper bound on K because an increasing number of low proportion cell types could be an indicator that the model is oversplitting cell types in the dataset when a larger K is used.
Please reach out if you have more questions!
Thanks a lot for quick response, My codes are as follows, I have combined all samples and integrated it as one object in Seurat. My codes are as follows,
library(STdeconvolve)
library(Matrix)
extract the counts matrix
mat <- Matrix(GetAssayData(object = pdac.integrated, slot = "counts", assay="Spatial"), sparse = TRUE)
mat <- cleanCounts(mat, min.lib.size = 100)
mat <- restrictCorpus(mat, removeAbove = 1.0, removeBelow = 0.05)
ldas <- fitLDA(t(mat), Ks = seq(2, 15, by=1))
optLDA <- optimalModel(models = ldas, opt = "min")
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconProp <- results$theta
deconGexp <- results$beta
pos <- GetTissueCoordinates(Seu.obj, image="image_4.3")
colnames(pos) <- c('x','y')
plt <- vizAllTopics(theta = deconProp,
pos = pos,
r = 3,
lwd = 0,
showLegend = TRUE,
plotTitle = NA) +
ggplot2::guides(fill=ggplot2::guide_legend(ncol=2)) +
## outer border
ggplot2::geom_rect(data = data.frame(pos),
ggplot2::aes(xmin = min(x)-90, xmax = max(x)+90,
ymin = min(y)-90, ymax = max(y)+90),
fill = NA, color = "black", linetype = "solid", size = 0.5) +
ggplot2::theme(
plot.background = ggplot2::element_blank()
) +
ggplot2::guides(colour = "none")
png("st.barcode_proportions.p4.png", units="in", width=10, height=10, res=300)
plt
dev.off()
Hi, Thanks for developing reference free Deconvolve method, I thoroughly enjoy running it on the example data. Going forward, I have an integrated spatial Seurat object, we have 9 spatial samples(images ) as one object. My question is how best I can use this method on our dataset. Should I run STdeconvolve on individual samples or can I run this together? My codes are as follows.
Please let me know how best I can approach this issue.
Thank you for your help. Regards, AK