HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

Analysing DEG in Spatial visium data #103

Open vkjain0006 opened 2 years ago

vkjain0006 commented 2 years ago

Hi all,

I just need to understand whether a pseudobulk approach is the right way to analyse spatial data. As with single-cell data, the data is available as a spot-by-gene matrix of counts.

I am working on a project where we have approx. 30 samples each for 3 conditions (control, treatment1 and treatment2). To look for differentially expressed genes, I ran muscat following the basic commands as suggested. I do get DEGs, but not many: 1, 2 or 0 per cluster. I am not sure if I am doing something wrong or if this is not the right approach for spatial data.

Below is the code I used:

converting Seurat object to SingleCellExperiment format

DefaultAssay(sp_data) <- "Spatial"
spdata_sce <- as.SingleCellExperiment(x = sp_data, assay = "Spatial")

initial defining of required variables

(spdata_sce <- prepSCE(spdata_sce,
    kid = "layers",   # subpopulation assignments
    gid = "primary",  # group IDs (ctrl/stim)
    sid = "sample",   # sample IDs (ctrl/stim.1234)
    drop = TRUE))     # drop all other colData columns

nk <- length(kids <- levels(spdata_sce$cluster_id))
ns <- length(sids <- levels(spdata_sce$sample_id))
names(kids) <- kids; names(sids) <- sids

nb. of cells per cluster-sample

write.csv(t(table(spdata_sce$cluster_id, spdata_sce$sample_id)), file = "cluster-sample_info.csv")
t(table(spdata_sce$cluster_id, spdata_sce$sample_id))

construct design & contrast matrix

ei <- metadata(spdata_sce)$experiment_info
ei <- ei[order(ei$group_id), ]
mm <- model.matrix(~ 0 + ei$group_id)
dimnames(mm) <- list(ei$sample_id, levels(ei$group_id))
contrast <- makeContrasts("Treatment1-Control", levels = mm)

pb <- aggregateData(spdata_sce, assay = "counts", fun = "sum", by = c("cluster_id", "sample_id"))

one sheet per subpopulation

assayNames(pb)
t(head(assay(pb)))
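For readers following along, the aggregation step above (summing counts by cluster and sample) amounts to roughly the following in base R. This is only a toy sketch, not muscat's implementation: `counts`, `cluster_id`, `sample_id` and `pb_toy` are made-up names and data, and muscat returns one assay per cluster rather than one combined matrix.

```r
# Toy data (hypothetical): 3 genes x 6 spots, two clusters, two samples
set.seed(1)
counts <- matrix(rpois(18, lambda = 5), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("spot", 1:6)))
cluster_id <- c("layer1", "layer1", "layer2", "layer1", "layer2", "layer2")
sample_id  <- c("s1", "s1", "s1", "s2", "s2", "s2")

# One group label per spot, e.g. "layer1.s1"
grp <- paste(cluster_id, sample_id, sep = ".")

# rowsum() sums rows by group, so transpose to spots x genes and back again
pb_toy <- t(rowsum(t(counts), group = grp))

# genes x (cluster.sample) matrix of summed counts
pb_toy
```

Each column of `pb_toy` is then treated like one bulk RNA-seq library, which is why the number of samples per group, not the number of spots, determines the power of the test.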

run DS analysis

res <- pbDS(pb, design = mm, contrast = contrast, method = "edgeR")

After this, when I check for significant DEGs per cluster, there are hardly any: 1, 2 or 0 genes show up as significant between conditions per cluster. Can someone please suggest the best possible approach for spatial data?
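For reference, a per-cluster tally like the one described here could be computed along these lines. This is a sketch on a toy stand-in for `res$table[[1]]`, assuming (as in the muscat vignette) a list with one data.frame per cluster and columns named `logFC` and `p_adj.loc`. The values and the cutoffs `fdr` and `lfc` are made up for illustration.

```r
# Toy stand-in (hypothetical values) for res$table[[1]]:
# a named list with one data.frame per cluster
tbl <- list(
  layer1 = data.frame(gene = c("g1", "g2", "g3"),
                      logFC = c(1.8, -0.2, -1.5),
                      p_adj.loc = c(0.01, 0.60, 0.03)),
  layer2 = data.frame(gene = c("g1", "g2", "g3"),
                      logFC = c(0.1, 0.3, 2.1),
                      p_adj.loc = c(0.90, 0.04, 0.20))
)

# Assumed cutoffs: adjusted p-value below fdr and |logFC| above lfc
fdr <- 0.05
lfc <- 1

# Filter each cluster's table and sort by adjusted p-value
tbl_fil <- lapply(tbl, function(u) {
  u <- u[u$p_adj.loc < fdr & abs(u$logFC) > lfc, , drop = FALSE]
  u[order(u$p_adj.loc), , drop = FALSE]
})

# Number of genes passing both cutoffs, per cluster
n_de <- vapply(tbl_fil, nrow, integer(1))
n_de
```

Note that the effect-size cutoff alone can remove genes that pass the FDR threshold, so it is worth checking the tally with and without it.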

Thanks

HelenaLC commented 2 years ago

Hm. Definitely an interesting thing to ponder; whether or not the method is applicable here and, if not, why not, isn't straightforward I'd say. I guess the key issue here is that Visium spots != single cells. With the current resolution, we'd expect ~10 cells on average per spot. These needn't come from the same cell subpopulation. So typically one would apply deconvolution to estimate the proportion of subpopulations per spot. I assume that at the moment you have annotated spots into clusters that may correspond to different layers / subpopulation compositions? I'd expect that the subpopulations making up these layers exhibit different expression patterns, and needn't necessarily respond coherently to a given treatment. But it's really hard to tell. The main point remains: spots are not cells.
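To make the mixing point concrete, here is a toy calculation of how a response in one subpopulation could be diluted at the spot level. All numbers are made up: the proportions in `props`, the expression values, and the assumption that only subpopulation A responds are purely illustrative.

```r
# Assumed spot composition: three subpopulations (hypothetical proportions)
props <- c(A = 0.3, B = 0.5, C = 0.2)

# Assumed mean expression of one gene per subpopulation under control
expr_ctrl <- c(A = 10, B = 10, C = 10)

# Assumption: treatment doubles expression in subpopulation A only
expr_trt <- expr_ctrl * c(A = 2, B = 1, C = 1)

# Spot-level signal as a proportion-weighted mix of the subpopulations
spot_ctrl <- sum(props * expr_ctrl)
spot_trt  <- sum(props * expr_trt)

# log2 fold change within A versus what the spot as a whole shows
lfc_A    <- log2(expr_trt[["A"]] / expr_ctrl[["A"]])
lfc_spot <- log2(spot_trt / spot_ctrl)
c(subpopulation_A = lfc_A, spot = lfc_spot)
```

Under these assumptions a 2-fold change in A shows up as only a 1.3-fold change at the spot level, which is one way a real effect could fall below detection in spot-level pseudobulks.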

vkjain0006 commented 2 years ago

Thank you so much, Helena, for your quick reply. Yes, that's right: Visium spots are not equal to single cells, and we are working on estimating cell type proportions in each spot using available deconvolution methods. For this comparison specifically, we are looking at spots classified as one particular type (for example, layer1), and then looking for DEGs between treatment and control in layer1.

Would a direct cell-level analysis be a reasonable method to try for finding DEGs between treatment and control for spots in layer1?