edward130603 / BayesSpace

Bayesian model for clustering and enhancing the resolution of spatial gene expression experiments.
http://edward130603.github.io/BayesSpace
Other
107 stars 22 forks source link

CIBERSORT approach #35

Closed astrid12345 closed 3 years ago

astrid12345 commented 3 years ago

Hi,

thank you for the development of this really exciting tool! It has been giving us great clustering results so far!! I was hoping to reproduce Supplemental Figure 16. CIBERSORT cell type proportions in invasive ductal carcinoma in subspots in your publication. Could you elaborate on data input and CIBERSORT settings? It would be helpful to have it included in the tutorial.

As sidequestion, we would be interested to extend BayesSpace to jointly cluster spots from multiple samples. (you hinted at it in your paper). We have three biological replicates on Visium but with considerable biological variation. Do you have suggestions how to combine and normalize the data?

Many thanks, Astrid

edward130603 commented 3 years ago

Hi Astrid, I've asked @msto to help with the CIBERSORT analysis.

For your second question, we are working on a vignette going through multi-sample analysis. In the meantime, I'm sharing some code here that might be helpful. I'm using the harmony method, which has been shown to be pretty good in benchmarks (at least in single cell), for batch correction between samples.

library(ggplot2)
library(Rtsne)
library(BayesSpace)

#This R script describes a way to do joint clustering for three samples.

#Download 3 samples
sce1 = getRDS("2020_maynard_prefrontal-cortex", "151673")
sce2 = getRDS("2020_maynard_prefrontal-cortex", "151674")
sce3 = getRDS("2020_maynard_prefrontal-cortex", "151675")
rowData(sce1)$is.HVG = NULL #deleting some rowData
rowData(sce2)$is.HVG = NULL #rowData needs to be same across samples
rowData(sce3)$is.HVG = NULL #HVGs will be recalculated

#Combine into 1 SCE
sce.combined = cbind(sce1, sce2, sce3, deparse.level = 1)
sce.combined = spatialPreprocess(sce.combined, n.PCs = 50) #lognormalize, PCA

tsne = Rtsne(reducedDim(sce.combined)) #use t-SNE to check for batch effect
colnames(tsne$Y) = c("TSNE1", "TSNE2")
reducedDim(sce.combined, "TSNE") = tsne$Y
#Do we have a batch effect? 
#If so, probably should do batch correction first
ggplot(data.frame(reducedDim(sce.combined, "TSNE")), 
       aes(x = TSNE1, y = TSNE2, color = factor(sce.combined$sample_name))) +
  geom_point() +
  labs(color = "Sample") +
  theme_bw()

#Batch correction with harmony
# install.packages("devtools")
devtools::install_github("immunogenomics/harmony")
library(harmony)
rescaled = RunHarmony(sce.combined, "sample_name")
tsne.corrected = Rtsne(reducedDim(rescaled, "HARMONY")) #check for batch effect again
colnames(tsne.corrected$Y) = c("TSNE1", "TSNE2")
reducedDim(sce.combined, "TSNE.corrected") = tsne.corrected$Y
#now no obvious batch effect
ggplot(data.frame(reducedDim(sce.combined, "TSNE.corrected")), 
       aes(x = TSNE1, y = TSNE2, color = factor(sce.combined$sample_name))) +
  geom_point() +
  labs(color = "Sample") +
  theme_bw()

#Make sure spots from different samples are not neighbors
#row ranges from 0 to 77 in my dataset so adding 100 will ensure spots from different samples are not neighbors
#if you want to plot them horizontally instead can just add to $col instead of $row
rescaled$row[rescaled$sample_name == "151674"] = 100 + rescaled$row[rescaled$sample_name == "151674"]
rescaled$row[rescaled$sample_name == "151675"] = 200 + rescaled$row[rescaled$sample_name == "151675"]
clusterPlot(rescaled, rescaled$layer_guess, color = NA) #plotting to make sure no overlap between samples

#Cluster
rescaled = spatialCluster(rescaled, use.dimred = "HARMONY", q = 7, nrep = 10000) #use HARMONY
clusterPlot(rescaled, color = NA) #plot clusters
mclust::adjustedRandIndex(rescaled$spatial.cluster, rescaled$layer_guess) #pretty good ARI
msto commented 3 years ago

Hi Astrid,

We used tidybulk to run CIBERSORT. We applied CIBERSORT to spots in the three immune-enriched clusters (1, 7, and 10), and only included expression from the top highly variable genes used elsewhere in the analysis (e.g. PCA).

See below for an example. The code assumes the spot-level data is stored in a SingleCelExperiment named sce and has already been clustered with BayesSpace.

# Format expression data for tidybulk
sce_expr <- logcounts(sce[, sce$spatial.cluster %in% c(1, 7, 10)])[HVGs, ] %>%
    as.data.frame() %>%
    tibble::rownames_to_column("Transcript") %>%
    pivot_longer(-Transcript, values_to="Logcount", names_to="Barcode") %>%
    dplyr::select(Barcode, Transcript, Logcount) %>%
    tidybulk(Barcode, Transcript, Logcount)

# Run CIBERSORT
sce.cibersort <- sce_expr %>% deconvolve_cellularity(action="get", cores=1)

# Add cibersort predictions back to sce's colData
sce.cd <- colData(sce) %>% 
    as.data.frame() %>%
    dplyr::left_join(sce.cibersort, by=c("spot"="Barcode"))

# Plot one cell type proportion
featurePlot(sce, sce.cd[["cibersort: T cells CD4 memory activated"]], color=NA)

# Plot total proportion of any T-cell subtypes
props <- sce.cd %>% dplyr::select(contains(" T cells")) %>% rowSums()
featurePlot(sce, props, color=NA)
astrid12345 commented 3 years ago

Hi Msto and Edward,

thank you both for sharing your code! That is very helpful. I'm excited to see how to Harmony approach will work :) I'm going to try out your examples and I'll close my issue asap,

Thanks, Astrid