Nanostring-Biostats / SpatialDecon

The SpatialDecon library implements the SpatialDecon algorithm for mixed cell deconvolution in spatial gene expression datasets. (This algorithm also works in bulk expression profiling data.)
MIT License

Unexpected uniform florets #31

Closed: oligomyeggo closed this issue 2 years ago

oligomyeggo commented 2 years ago

Hello, and thank you for providing such a nice deconvolution package for spatial data! I have been following your vignette with my Visium data, and am getting an unexpected outcome. Rather than having varied and heterogeneous florets for each spot, I instead have nearly identical florets at each barcoded spot. When I look at the matrix of estimated cell abundances (beta) there is some slight variation:

                              CTATTTGGTTACGGAT-1_3 GAGCTAAGGGCATATC-1_3 GCCCGTAATACCTTCT-1_3 GCGCTTAAATAATTGG-1_3
Dopaminergic.neuron                     0.07062816           0.06638519            0.1083782            0.1253812
Ependymal.cell                          0.64877359           0.64027948            0.6321957            0.6498545
Erythroblast.Hbb.bh1.high               0.60865675           0.58878859            0.5959594            0.6453591

However, I believe that this is not biologically accurate and that I should be seeing varying florets with different cell mixtures depending on where a spot is in the tissue. I am wondering if perhaps I missed something in the vignette/if my parameters are inaccurate.

I used the provided Mouse, Fetal/E14.5, Brain_MCA matrix using the download_profile_matrix() function. I confirmed that I had a sufficient number of overlapping genes between that matrix and my Seurat object (13,000+ shared genes). For the runspatialdecon() function, I used bg = 0.01, and tried variations of including wts and align_genes = TRUE, but I always got the same homogeneous florets. I even tried using a custom reference dataset, and had the same issue.
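
For readers following along, the overlap check described above can be sketched in base R. The matrices and gene names below are toy stand-ins invented for illustration; with real data, `profile` would be the matrix returned by download_profile_matrix() and `counts` the raw count matrix pulled from the Seurat object.

```r
# Toy stand-ins (hypothetical): genes in rows, cell types or spots in columns.
profile <- matrix(runif(12), nrow = 4,
                  dimnames = list(c("Gfap", "Snap25", "Hbb-bh1", "Foxj1"),
                                  c("Neuron", "Ependymal", "Erythroblast")))
counts <- matrix(rpois(10, 5), nrow = 5,
                 dimnames = list(c("Gfap", "Snap25", "Foxj1", "Actb", "Malat1"),
                                 c("spot1", "spot2")))

# Genes present in both the profile matrix and the expression data.
shared_genes <- intersect(rownames(profile), rownames(counts))

# Fraction of profile genes that the dataset covers.
coverage <- length(shared_genes) / nrow(profile)
print(shared_genes)
print(coverage)
```

A large overlap only shows that the gene names line up; it does not by itself show that the shared genes distinguish the cell types.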

I've attached an example image of the florets I am seeing. Any advice would be greatly appreciated!

[Attached image: example_image]

maddygriz commented 2 years ago

Hello,

This certainly is an unexpected outcome. A couple of questions:

  1. Have you tried running the data in the vignette, and do you get the vignette figure or a homogeneous floret plot? If you get a homogeneous plot there, I think it might be an environment issue. If so, can you attach your session_info()?
  2. Have you tried the TIL_barplot function? Does that show the same uniformity across spots? If so, we know that the plotting isn't the issue.
  3. It seems like there is one cell type that is driving the florets at 12 o'clock. Is that cell type's beta much larger than the others? The small differences might be hard to see if that cell type is driving the range up.

oligomyeggo commented 2 years ago

Hi @maddygriz, thanks so much for your quick response!

  1. I ran through the data in the vignette and was able to replicate all the vignette figures, so hopefully that rules out an environment issue but I have attached my session_info() just in case.
  2. I just tried the TIL_barplot() function, and it does show the same uniformity across spots (see attached plots). The spots are uniform, though not identical. I am guessing this rules out the plotting as the issue.
  3. There is one cell type that has a larger beta than the other cell types, though I am not sure if it is that much larger (I am not sure what the range of beta values tends to be). The largest beta is ~3.5, while the next largest beta is ~2.3, and the smallest is ~0.08.

[Attachments: TIL_barplot_legend, TIL_barplot, sessionInfo.txt]
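
A numeric companion to points 2 and 3, as a minimal sketch: convert beta to per-spot proportions and rank cell types by their average abundance. The values are the three-row excerpt of beta quoted at the top of the thread, so the cell type with a beta of ~3.5 is not included; the shortened spot names s1 to s4 are an assumption for readability.

```r
# Excerpt of the beta matrix quoted earlier in the thread (three cell types,
# four spots). Spot barcodes are shortened to s1..s4 (hypothetical labels).
beta <- matrix(
  c(0.07062816, 0.06638519, 0.1083782, 0.1253812,
    0.64877359, 0.64027948, 0.6321957, 0.6498545,
    0.60865675, 0.58878859, 0.5959594, 0.6453591),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Dopaminergic.neuron", "Ependymal.cell",
                    "Erythroblast.Hbb.bh1.high"),
                  paste0("s", 1:4)))

# Per-spot proportions: each column sums to 1.
props <- prop.table(beta, margin = 2)

# Cell types ranked by their mean abundance across spots.
ranked <- sort(rowMeans(beta), decreasing = TRUE)
print(round(props, 3))
print(ranked)
```

If the proportions are nearly the same in every column, the florets will look alike regardless of how they are drawn.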

maddygriz commented 2 years ago

It looks like this is a profile matrix issue on our end. We did not catch that almost 90% of the values are below 1. Since the values are super close together, it makes the deconvolution nearly impossible. I would suggest using a different profile matrix while we work to either fix that profile or find a new fetal dataset. You can create a custom profile matrix on any single cell dataset using the create_profile_matrix() function. Sorry for the inconvenience.
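
A quick way to check a profile matrix for this kind of problem, as a sketch in base R. The matrix here is simulated (an exponential draw chosen only so that most entries are small), and `profile` is a placeholder name; it is not the Brain_MCA matrix.

```r
# Hypothetical profile matrix: genes in rows, cell types in columns.
set.seed(1)
profile <- matrix(rexp(2000, rate = 2.5), nrow = 500,
                  dimnames = list(paste0("gene", 1:500),
                                  paste0("celltype", 1:4)))

# Share of all entries that fall below 1.
frac_below_1 <- mean(profile < 1)

# Per-gene spread across cell types. Genes with a spread near zero carry
# little information for telling cell types apart.
gene_sd <- apply(profile, 1, sd)
print(frac_below_1)
print(summary(gene_sd))
```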

oligomyeggo commented 2 years ago

I have tried creating a custom profile matrix using two different scRNA-seq datasets and I am still having the same issue unfortunately. So, I am not sure if it is necessarily your particular profile matrix since I have seen the same pattern using completely different reference data.

maddygriz commented 2 years ago

Good to know. We just ran a dataset with that profile and also did not get a homogeneous figure. So if we have ruled out the matrix, that leaves us with either the data or the actual algorithm.

Can you compare the standard deviation of your data before and after SpatialDecon?

before:

    summary(apply(GetAssayData(object = object, slot = "counts"), 1, sd))

after:

    summary(apply(deconResults$beta, 1, sd))

oligomyeggo commented 2 years ago

Sure thing!

The standard deviation of my data before SpatialDecon:

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
  0.00000   0.07981   0.27999   0.41078   0.52363 114.42326 

And after:

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.004724 0.012359 0.016855 0.019853 0.026768 0.048620 
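
Because raw counts and beta are on different scales, their standard deviations are hard to compare directly. One option, not something suggested in the thread itself, is the coefficient of variation (sd divided by mean) per row. The helper `row_cv` and both toy matrices below are hypothetical.

```r
# Coefficient of variation (sd / mean) for each row of a matrix.
# Rows with a mean of zero return NA instead of dividing by zero.
row_cv <- function(m) {
  mu <- rowMeans(m)
  s <- apply(m, 1, sd)
  ifelse(mu > 0, s / mu, NA_real_)
}

# Toy stand-ins: genes x spots before, cell types x spots after.
counts <- rbind(geneA = c(0, 3, 10, 1),
                geneB = c(5, 5, 6, 4))
beta <- rbind(typeA = c(0.64, 0.65, 0.63, 0.65),
              typeB = c(0.07, 0.07, 0.11, 0.13))

print(summary(row_cv(counts)))
print(summary(row_cv(beta)))
```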

maddygriz commented 2 years ago

Alright. The before SpatialDecon variation looks to be in line with our demo dataset variation so I don't think that is causing the issue. I have one more test that I want to run before pulling in the big biostatistician guns.

I'm wondering if the more highly variable genes in your dataset don't match the genes in the profile matrix, or at least not the ones that make each profile unique.

Can you run this code and see if the generated heatmap is also homogeneous?

[Attachment: spatialDecon_homogenous_Rscript.txt]

oligomyeggo commented 2 years ago

After running the code you provided, it does look like the heatmap is fairly homogeneous across cell types. Would this indicate that the highly variable genes in my dataset don't match the profile matrix (and also unfortunately don't match the custom profile matrices I tried out either)?

[Attached image: heatmap_test]

maddygriz commented 2 years ago

This heatmap indicates that the variable genes in your dataset match the genes in the profile that are stable across cell types, which makes it hard to differentiate between them. To confirm this, we can look at a second heatmap to see whether any obvious profiles can be extracted from the previous homogeneous-looking one.

The highSDprofile is the same matrix from the previous script.

    heatmap(sweep(highSDprofile, 1, apply(highSDprofile, 1, max), "/"), labRow = NA, margins = c(10, 5))

If this heatmap is homogeneous, it shows that the dataset does not contain enough of the right variable genes to deconvolve the cell types. I have not worked with Visium data enough to say if that is an experiment issue or if using a different profile matrix will solve the issue.
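
The scaling in the heatmap() call above can also be summarized numerically. In this sketch `highSDprofile` is a made-up three-gene stand-in (the real one comes from the attached script, which is not reproduced here), and the 0.5 cutoff is an arbitrary choice for illustration.

```r
# Hypothetical stand-in for highSDprofile: genes in rows, cell types in columns.
highSDprofile <- rbind(
  marker_like = c(A = 9.0, B = 0.5, C = 0.3),
  flat_1      = c(A = 2.0, B = 1.9, C = 2.1),
  flat_2      = c(A = 0.8, B = 0.9, C = 0.85)
)

# Same scaling as in the heatmap call: divide each gene by its row maximum,
# so every row peaks at 1.
scaled <- sweep(highSDprofile, 1, apply(highSDprofile, 1, max), "/")

# Number of cell types in which each gene reaches at least half of its maximum.
# A value of 1 suggests a cell-type-specific gene; a value equal to
# ncol(scaled) suggests a gene that is flat across cell types.
n_high <- rowSums(scaled >= 0.5)
print(n_high)
```

Here `n_high` is 1 for the marker-like gene and 3 for the two flat genes. A matrix in which most genes behave like the flat rows would produce the kind of homogeneous heatmap discussed above.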

oligomyeggo commented 2 years ago

I believe the heatmap generated from running the above code looks a bit homogeneous as well?

[Attached image: heatmap_test2]

maddygriz commented 2 years ago

I would agree that the heatmap is pretty homogeneous. There are a couple of dark red portions but not enough to really take advantage of SpatialDecon unfortunately.

oligomyeggo commented 2 years ago

That's what I figured. Oh well. Hopefully I will be working on some GeoMx data in the near future. Thank you so much for all of your help; I really appreciate it!