LieberInstitute / spatialDLPFC

spatialDLPFC project involving Visium (n = 30), Visium SPG (n = 4) and snRNA-seq (n = 19) samples
http://research.libd.org/spatialDLPFC/

Compare spot-deconvolution vs IF cell counts #99

Closed lcolladotor closed 1 year ago

lcolladotor commented 2 years ago

This will be done only with the n = 4 Visium IF samples. It will depend on results from several other issues https://github.com/LieberInstitute/spatialDLPFC/issues?q=is%3Aissue+is%3Aopen+label%3Aspot-deconvolution

We still need to think about how exactly this comparison will be done and what the main figure resulting from it will look like (it will be a panel in one of the main figures of the paper), plus potentially some supplementary figures.

lcolladotor commented 2 years ago

Louise and I could help with ideas, but now @Nick-Eagles will be doing this.

lcolladotor commented 2 years ago

One option is to make scatterplots with the proportion observed in the IF data on the x-axis and the proportion estimated from the spot deconvolution results (from #128) on the y-axis. That would be like https://speakerdeck.com/lcolladotor/psychgenomics-2022?slide=36 or the slide after it. However, with only 4 points (4 Visium IF samples) per cell type for a particular deconvolution method, that is not much data for judging whether things are "closer to the diagonal" or not.

We could compute an RMSE (root mean squared error) between the observed (spot deconvolution results) vs the expected (proportions from the IF part). Though, again, that RMSE would be based on data from only the 4 points in the scatterplot above.
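For reference, a minimal sketch of that RMSE calculation in Python, assuming one proportion per Visium IF sample for a given cell type and deconvolution method. The function name and the numbers are made up for illustration; they are not results from this project.

```python
import math


def rmse(observed, expected):
    """Root mean squared error between paired values.

    observed: proportions from spot deconvolution (one per sample)
    expected: proportions from the IF data (same order)
    """
    if len(observed) != len(expected) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical proportions for one cell type across the 4 Visium IF samples.
if_prop = [0.20, 0.25, 0.30, 0.15]      # expected (IF)
deconv_prop = [0.22, 0.21, 0.33, 0.15]  # observed (spot deconvolution)

print(round(rmse(deconv_prop, if_prop), 4))
```

With only 4 pairs per cell type and method, a single outlying sample can dominate the value, which is the limitation noted above.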

At the spot level, we can check whether the proportion from the broad cell type resolution matches the proportion from the layer-level resolution (after combining the different Excit_Lxx results). That would be scatterplots paneled by sample (since we have 4) with lots of points (since we have up to about 4k or so spots per sample). That evaluates just the consistency of the results and would be similar to the comparison at https://speakerdeck.com/lcolladotor/psychgenomics-2022?slide=36 when we changed the number of marker genes.
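A rough sketch of that consistency check for a single spot, in Python. It assumes the layer-level excitatory cell types share an `Excit_` prefix and that the broad resolution calls the combined type `Excit`; the specific cell type names and values below are hypothetical.

```python
def collapse_layers(layer_props, prefix="Excit_", broad_name="Excit"):
    """Sum layer-level cell types sharing `prefix` into one broad cell type."""
    collapsed = {}
    for cell_type, value in layer_props.items():
        key = broad_name if cell_type.startswith(prefix) else cell_type
        collapsed[key] = collapsed.get(key, 0.0) + value
    return collapsed


# Hypothetical proportions for one spot at the two resolutions.
layer_level = {"Excit_L2_3": 0.15, "Excit_L5": 0.25, "Inhib": 0.20, "Oligo": 0.40}
broad_level = {"Excit": 0.45, "Inhib": 0.20, "Oligo": 0.35}

collapsed = collapse_layers(layer_level)

# Per cell type difference (collapsed layer-level minus broad); values near 0
# mean the two resolutions agree for this spot.
differences = {ct: collapsed.get(ct, 0.0) - broad_level[ct] for ct in broad_level}
print(differences)
```

Repeating this over all spots gives the pairs of values for the per-sample scatterplots.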

Spatially, we could know from the H&E staining where the GM vs WM boundary is as well as the orientation of L1 through L6, so for each method we could plot the number of cells (one cell type at a time) spatially and see if the spatial pattern is "better" for one vs another deconvolution method. We can do that with http://research.libd.org/spatialLIBD/reference/vis_grid_gene.html.

We likely need to think more about this.

Summary:

lcolladotor commented 2 years ago

(Image: IMG_8505)

We talked about how for cell2location and Tangram we want to compare the total counts (or total abundance) vs the IF total counts, doing a scatterplot at the spot level and paneling by the 4 Visium-IF samples.

Then we'll also make scatterplots for each cell type, paneled by sample, for the count and the proportion.

lcolladotor commented 2 years ago

Per cell type, scatterplot of the mean/median expression for the up to 25 mean ratio marker genes vs the proportion or count for that cell type; paneled by sample. Annotate with correlation.

We would expect to see a positive association between the 2 variables.
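A minimal sketch of the per-sample correlation used to annotate each panel, in Python. It assumes each spot is summarized as (sample ID, mean marker expression, estimated count or proportion for the cell type); the function names, sample IDs, and values are hypothetical, and Pearson correlation is only one possible choice here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_sample(spots):
    """Group (sample_id, marker_expr, abundance) tuples by sample and correlate."""
    grouped = {}
    for sample_id, marker_expr, abundance in spots:
        xs, ys = grouped.setdefault(sample_id, ([], []))
        xs.append(marker_expr)
        ys.append(abundance)
    return {sample: pearson(xs, ys) for sample, (xs, ys) in grouped.items()}


# Hypothetical spots: (sample, mean marker expression, estimated cell count).
spots = [
    ("sample_1", 0.5, 1.0), ("sample_1", 1.2, 2.0), ("sample_1", 2.1, 4.0),
    ("sample_2", 0.3, 0.0), ("sample_2", 0.9, 1.0), ("sample_2", 1.8, 3.0),
]
print(correlation_by_sample(spots))
```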

Could color each spot by the RMSE vs IF.

lcolladotor commented 2 years ago

vis_grid_gene() plot of the RMSEs

Nick-Eagles commented 1 year ago

I've produced these plots and others that we've discussed here and here. Closing because although we may continue to tweak plots slightly, all the fundamental IF plots should be complete now.