JEFworks-Lab / STdeconvolve

Reference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
http://jef.works/STdeconvolve/
98 stars 12 forks source link

integrated Object #21

Closed AteeqMKhaliq closed 1 year ago

AteeqMKhaliq commented 1 year ago

Hi, Thanks for developing reference free Deconvolve method, I thoroughly enjoy running it on the example data. Going forward, I have an integrated spatial Seurat object, we have 9 spatial samples(images ) as one object. My question is how best I can use this method on our dataset. Should I run STdeconvolve on individual samples or can I run this together? My codes are as follows.


` extract the counts matrix

mat <- Matrix((st@assays$Spatial@counts), sparse = TRUE)

remove poor genes and pixels
mat <- cleanCounts(mat, min.lib.size = 100)
filter for features in less than 100% of pixels but more than 5% of pixels
pdach1.mat <- restrictCorpus(mat, removeAbove = 1.0, removeBelow = 0.05)

pdf("pdach1.perplexity.pdf")
ldas = fitLDA(mat, Ks = seq(2, 9, by=1), plot=T, verbose=TRUE)
dev.off()

optLDA <- optimalModel(models = ldas, opt = "kneed")

results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconvolution resultsattach(frame)
deconProp =results$theta
write.table(deconProp, "All_results_decon.txt", quote=F)

 predicted transcriptional profiles of the deconvolved cell-types as the beta matrix
deconGexp = results$beta

#SAMPLE 1
sample1.pos <- GetTissueCoordinates(st@images$sample1)

 change column names to x and y
colnames(sample1.pos) <- c('x','y')
`

Please let me know how best I can approach this issue.

Thank you for your help. Regards, AK

bmill3r commented 1 year ago

Hi @AteeqMKhaliq,

Thanks so much for using STdeconvolve and sorry for the delay in my response!

If I understand you're question correctly, you are asking if it is reasonable to apply STdeconvolve to all 9 datasets collated as one single dataset, or if it is better to apply STdeconvolve to each dataset separately?

We actually analyzed datasets both ways in the STdeconvolve paper and had success in each case. For example, the full MPOA dataset we used was a collection of multiple tissue slices (check out supplementary figure S1), which were combined into a single input dataset. It is important to note that these slices were all from the same animal with the same genes measured and so there was expected to be less technical variation between them. Some of our motivations for combining them were to ensure that the final dataset would be sufficiently large. We also expected some cell types to be present in certain slices but not others and wanted to assess STdeconvolve's ability to detect them in the slices they were expected to be in. Conversely, we also tested STdeconvolve independently on several completely separate samples of the mouse olfactory bulb (MOB). Here, we found high concordance between the deconvolved cell types of the different MOB datasets (check out supplementary figure S7), suggesting that STdeconvolve produces reproducible results across different datasets of the same tissue.

My impression is that if there are multiple datasets from the same source with the same genes measured, then it is probably reasonable to combine them into a single dataset. However, if the samples are from different sources, it is probably better to process each separately. Nonetheless, cell types that are defined by distinct expression profiles should still be detected by STdeconvolve in either case assuming that the associated genes are included in the input dataset.

I hope this helps and let me know if you have any additional questions, Brendan

AteeqMKhaliq commented 1 year ago

Thanks, Brandan for the prompt reply. I really appreciate it. I tried analysing it by the method you pointed out, but I am not getting any output or results. I am sure i am doing something wrong here. Can you please help me, my object is one integrated spatial data set with 12 samples/images(Seurat).

ex.integrated
An object of class Seurat 
38342 features across 32295 samples within 3 assays 
Active assay: integrated (3000 features, 3000 variable features)
 2 other assays present: Spatial, SCT
 3 dimensional reductions calculated: pca, umap, tsne
 12 images present: human_1, human_2.1, human_3.2, human_4.3, human_5.4, human_6.5, human_1.6, human_2.7, human_3.8, human_4.9, human_5.10, human_6.11

extract the counts matrix

mat <- Matrix((ex.integrated@assays$Spatial@counts), sparse = TRUE)
mat <- cleanCounts(mat, min.lib.size = 100)
filter for features in less than 100% of pixels but more than 5% of pixels
pdach1.mat <- restrictCorpus(mat, removeAbove = 1.0, removeBelow = 0.05)

pdf("human1.perplexity.pdf")
ldas = fitLDA(mat, Ks = seq(2, 9, by=1), plot=T, verbose=TRUE)
dev.off()

optLDA <- optimalModel(models = ldas, opt = "kneed")

results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconvolution resultsattach(frame)
deconProp =results$theta
write.table(deconProp, "All_results_decon.txt", quote=F)
deconGexp = results$beta

#human_1 (first sample)
sample1.pos <- GetTissueCoordinates(st@images$human_1)
colnames(sample1.pos) <- c('x','y')

Plotting scatterpies for 0 pixels with 15 cell-types...this could take a while if the dataset is large.

Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1): x0, y0, group, colour, fill and amount
Run `rlang::last_error()` to see where the error occurred.

It would be really helpful if you can please let me know how best I can approach this issue, Please note, that I am a cancer biologist. Best, AMK

bmill3r commented 1 year ago

Hi @AteeqMKhaliq,

Would you be able to provide all of the commands you tried running? What is the plotting command you are using? Also, if you could provide the dimensions of the counts matrix and positions matrix via dim(mat) and dim(sample1.pos) as well as the header of each via head(mat) and head(sample1.pos) that would also be useful. It seems that whatever you are using as the input theta matrix is missing the corresponding rows for the spots. Maybe the spots of the mat and the sample1.pos matrices are named differently?

Hope this helps and keep me posted. Thanks! Brendan

AteeqMKhaliq commented 1 year ago

Hi Brendan, Thank you for the quick response, I really appreciate it.

My Plotting command is as follows.


plt <- vizAllTopics(theta = deconProp,
                   pos = pos,
                   r = 3,
                   lwd = 0,
                   showLegend = TRUE,
                   plotTitle = NA) +
  ggplot2::guides(fill=ggplot2::guide_legend(ncol=2)) +

  ## outer border
  ggplot2::geom_rect(data = data.frame(pos),
            ggplot2::aes(xmin = min(x)-90, xmax = max(x)+90,
                         ymin = min(y)-90, ymax = max(y)+90),
            fill = NA, color = "black", linetype = "solid", size = 0.5) +

  ggplot2::theme(
    plot.background = ggplot2::element_blank()
  ) +
  ggplot2::guides(colour = "none")

png("st.barcode_proportions.Human1.png", units="in", width=20, height=20, res=300)
plt
dev.off()
dim(mat)
17816 32295
dim(sample1.pos)
2961    2

head(mat)
6 x 32295 sparse Matrix of class "dgCMatrix"
   [[ suppressing 2961 column names ‘AAACAAGTATCTCCCA-1’, ‘AAACAGAGCGACTCCT-1’, ‘AAACATTTCCCGGATT-1’ ... ]]

ISG15     3 . 2 . 1 . 2 1 . . . . 3 . . 3 . . 1 2 . 1  2 . 2 . 3 . . 1 1 . .  .
ARHGEF16  3 . 1 . . 1 . . 3 . . 1 2 3 . 1 . . 1 2 . .  5 2 3 . . . 3 2 . 1 3  2
CLSTN1    7 2 1 . 1 . 1 . 7 4 . 2 1 6 . 3 3 2 1 1 2 1  5 . 4 1 2 . 5 . 1 1 .  2
MASP2     . . . . . . . . . . . . . . . . 1 . . . 1 .  . . . . . . . . . . .  .
DHRS3     4 2 4 . . . 1 3 9 . . 9 5 3 1 1 3 . . 8 . 2  6 . 4 . 2 . 4 . 2 6 6 11
EPHA2    12 1 2 . 4 4 1 2 3 4 . 4 3 5 1 1 1 1 1 3 . 2 11 1 2 2 5 . 2 4 1 3 3  4

head(sample1.pos)
                          x        y
AAACAAGTATCTCCCA-1 424.3592 233.6445
AAACAGAGCGACTCCT-1 396.2485 449.5162
AAACATTTCCCGGATT-1 407.2483 167.5946
AAACCCGAACGAAATC-1 469.0715 263.7413
AAACCGGGTAGGTACC-1 169.1224 281.0558
AAACCGTTCGTCCAGG-1 217.5013 221.1679

Thanks a ton for your help, Best, AMK

bmill3r commented 1 year ago

Hi @AteeqMKhaliq,

Thanks for supplying your code and outputs. Looking at your code for plotting, I think you want to use the object called sample1.pos as your pos position matrix, but this is not what you are specifying in the function vizAllTopics(). The input object pos doesn't seem to have been defined yet. In vizAllTopics(), try setting pos = sample1.pos and see what happens.

Note that vizAllTopics() only plots the spots (rows) that are in both input matrices so in both cases spots need to be the rows and share the same names between each matrix. (in your case, the deconProp and pdach1.mat objects).

If this doesn't work, then maybe either deconProp is empty or perhaps the row.names of deconProp and sample1.pos are different from some reason. If so, could you also provide the dim() and head() for pdach1.mat and deconProp? Is the mat output you provided the mat output from after running cleanCounts() or before?

Hope this helps, Brendan

AteeqMKhaliq commented 1 year ago

Thanks a lot for your prompt reply, and I am very sorry for bothering you. I found out where I was going wrong, I was using Ks(15) in the Fit LDA model and selected Opt LDA as 15.

ldas <- fitLDA(t(as.matrix(corpus)), Ks = c(15)) 
optLDA <- optimalModel(models = ldas, opt = 15)

Later on, I have run in a sequence from Ks 2-15 and kept opt=Min

ldas <- fitLDA(t(mat), Ks = seq(2, 15, by=1))
optLDA <- optimalModel(models = ldas, opt = "min")

This solved my problem, and in fact, now I can check Scatter Pies for individual images. Now I have deconvoluted the cell types (15 topics), I wanted to add this information to my surat object (metadata), I wanted to add which cell belongs to which topic, and how best I can achieve this. Can I use the top n genes for each deconvolved cell type based on their predicted gene expression distributions as a marker gene for cell type annotation? Sorry for this lengthy post, Thank you for all your help. Best, AMK

bmill3r commented 1 year ago

Hi @AteeqMKhaliq,

Glad you have it working!

Would you be able to clarify what you changed in your original code specifically to get it to work for you so that other users may be able to refer to it in the future?

My impression is that you wanted an LDA model that fit 15 cell types, but looking at your previous posts, it seems you only fitted models up to 9 cell types and picked the optimal model using the built-in "kneed" method:

ldas = fitLDA(mat, Ks = seq(2, 9, by=1), plot=T, verbose=TRUE)
optLDA <- optimalModel(models = ldas, opt = "kneed")

From here, you were able to obtain the spot x cell type proportions for plotting via:

results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconProp =results$theta

however, I don't see how the optLDA could have been fitted to 15 cell types and thus unsure where the statement: Plotting scatterpies for 0 pixels with 15 cell-types...this could take a while if the dataset is large. would have come from (ie, how would the deconProp matrix contain spot proportions for 15 cell types).

Would you be able to clarify this?

Thank you, Brendan

AteeqMKhaliq commented 1 year ago

Hi Brandan, My observation , initially i used LDA model K=15 which ended up showing an error . Please have a look at the perplexity plot stNew.perplexityK15.pdf


Plotting scatterpies for 0 pixels with 15 cell-types...this could take a while if the dataset is large.

Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1): x0, y0, group, colour, fill and amount
Run `rlang::last_error()` to see where the error occurred.

later, itried to select the optimal K value using fitted using the built-in "kneed" method( K 2-15) which somehow resolved this issue, i am not sure what went into this, Please take a look into the plot.

stNew_pdach1.perplexity1.1.pdf

I am trying all possible ways to find celltypes from my ST data so that i can make some sense out of it. Please let me know if this is fine.

Regards, AMK

AteeqMKhaliq commented 1 year ago

Hi Brandan,

Using STdeconvolve, I have estimated cell-type proportions of each spatial transcriptomics spot. Now, I am interested in similar clustering spots that are similar in cell composition. How best can I use this information to cluster to identify groups of spots in the different samples that shared similar cell-type compositions? Thanks AMK

bmill3r commented 1 year ago

Hi @AteeqMKhaliq,

Was each sample dataset analyzed by STdeconvolve separately? If so, and if you are looking for spots with similar cell type compositions between them, then you will likely need to first assess how similar the deconvolved cell types are for each dataset. This could be done by performing a pair-wise correlation between the transcriptional profiles of the deconvolved cell types in dataset 1 vs dataset 2 for example. We provide functions to do this (check out an example here). With the correlation matrix, you could pair cell types from each dataset that are most similar (see the function lsatPairs(), for example). Assuming that each cell type in a given pair represents the same cell type, you could then identify groups of spots in each dataset that have similar cell type proportions. If the total number of deconvolved cell types differs for each dataset, you could use the correlation matrix to assign more than one cell type from a dataset to a single cell type of another dataset if the the cell types in the former dataset both correlate strongly with the cell type in the latter. Once spots across datasets share the same features (i.e., cell types), you could then use a variety of different methods to identify spots with similar cell type proportions.

Hope this helps, Brendan

Going back to your previous post, I am still confused about how initially setting Ks=15 resulted in a deconProp that had no spots visualized when combined with your position matrix. Would you be able to share all of your code in the order in which you ran it that led to the result with the error, and then the code that worked? Further, for each analysis, it would be useful to see the head() and dim() for the resulting deconProp and whatever your position matrix is to ensure that the LDA model did return spot cell type proportions, and that the spot IDs are consistent between the proportion matrix and the position matrix.

Another comment about the perplexity plots: is there a reason why you chose to deconvolve 15 cell types? For the stNew_pdach1.perplexity1.1.pdf plot, K=12 looks reasonable because it has the lowest perplexity before the number of low proportion cell types begins to increase. The increase in the number of low proportion cell types can serve as an upper bound on K because an increasing number of low proportion cell types could be an indicator that the model is oversplitting cell types in the dataset when a larger K is used.

Please reach out if you have more questions!

AteeqMKhaliq commented 1 year ago

Thanks a lot for quick response, My codes are as follows, I have combined all samples and integrated it as one object in Seurat. My codes are as follows,


library(STdeconvolve)
library(Matrix)
extract the counts matrix

mat <- Matrix(GetAssayData(object = pdac.integrated, slot = "counts", assay="Spatial"), sparse = TRUE)
mat <- cleanCounts(mat, min.lib.size = 100)
mat <- restrictCorpus(mat, removeAbove = 1.0, removeBelow = 0.05)

ldas <- fitLDA(t(mat), Ks = seq(2, 15, by=1))

optLDA <- optimalModel(models = ldas, opt = "min")

results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)

deconProp <- results$theta
deconGexp <- results$beta

pos <- GetTissueCoordinates(Seu.obj, image="image_4.3")
colnames(pos) <- c('x','y')

plt <- vizAllTopics(theta = deconProp,
                   pos = pos,
                   r = 3,
                   lwd = 0,
                   showLegend = TRUE,
                   plotTitle = NA) +
  ggplot2::guides(fill=ggplot2::guide_legend(ncol=2)) +

  ## outer border
  ggplot2::geom_rect(data = data.frame(pos),
            ggplot2::aes(xmin = min(x)-90, xmax = max(x)+90,
                         ymin = min(y)-90, ymax = max(y)+90),
            fill = NA, color = "black", linetype = "solid", size = 0.5) +

  ggplot2::theme(
    plot.background = ggplot2::element_blank()
  ) +

   ggplot2::guides(colour = "none")

png("st.barcode_proportions.p4.png", units="in", width=10, height=10, res=300)
plt
dev.off()