Teichlab / SpatialDE

Test genes for Spatial Variation
MIT License

adapting the method for higher dimensions #24

Closed. j-bac closed this issue 2 years ago

j-bac commented 2 years ago

thanks for the package!

"We have here adapted GP models to spatial transcriptome data, although the model can also be applied to univariate data (Supp. Fig. 14) or higher-dimensional inputs"

I'm interested in this. Is it straightforward to adapt the code to detect highly variable genes with localized patterns given only scRNAseq data, i.e., with neighborhoods defined from a kNN graph?

vals commented 2 years ago

Hi!

That's an interesting idea. I'm not so familiar with it, but there are people who work on Gaussian Processes defined over graphs. What you need is a covariance function that takes two graph nodes and returns the covariance between values in those nodes.
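To make the idea of a covariance function over graph nodes concrete, here is a minimal sketch using a diffusion kernel on a kNN graph. This is one common choice of graph kernel, not something SpatialDE implements or that was proposed in this thread; the function names `knn_adjacency` and `diffusion_kernel` and the parameter `beta` are hypothetical, and only NumPy is assumed.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 kNN adjacency from an (n, d) array of cell embeddings."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                  # a cell is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]           # k nearest neighbours per cell
    A = np.zeros(d2.shape)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                     # symmetrize: edge if either is a neighbour

def diffusion_kernel(A, beta=1.0):
    """K = exp(-beta * L), with L the unnormalized graph Laplacian of A."""
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)                    # L is symmetric, so eigh applies
    return (U * np.exp(-beta * lam)) @ U.T        # matrix exponential via eigenvalues
```

`K[i, j]` is then the covariance between the values at nodes `i` and `j`. Larger `beta` spreads covariance further along the graph, playing a role loosely similar to a length scale.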

How to implement the graph covariance I'm not completely sure, but the factorization trick we use here to speed up computation should still work. It works directly on the covariance matrix and doesn't care how that covariance matrix was constructed.
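A rough sketch of why a factorization of this kind is indifferent to where the covariance matrix came from. It assumes a zero-mean model y ~ N(0, s2 * (K + delta * I)), which is a simplification and not SpatialDE's actual code; the function names are hypothetical. Once K is eigendecomposed, the log-likelihood for any gene needs only the eigenvalues and the rotated expression vector, so K can be any symmetric positive semi-definite matrix, graph-based or not.

```python
import numpy as np

def loglik_rotated(lam, y_rot, s2, delta):
    """log N(y | 0, s2 * (K + delta * I)), given eigenvalues lam of K and y_rot = U.T @ y."""
    var = s2 * (lam + delta)                      # covariance is diagonal in the eigenbasis
    n = y_rot.shape[0]
    return -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(var)) + np.sum(y_rot**2 / var))

def loglik_dense(K, y, s2, delta):
    """Same quantity computed directly; O(n^3) per call, used here only as a check."""
    n = y.shape[0]
    C = s2 * (K + delta * np.eye(n))
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
```

The decomposition `lam, U = np.linalg.eigh(K)` is done once; after that, each evaluation of `loglik_rotated` for a new gene or a new `delta` costs O(n) once `U.T @ y` has been computed.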

j-bac commented 2 years ago

Thanks, I'll look into it! Also, the link to download the MouseOB data seems broken. Do you still have it by any chance?

vals commented 2 years ago

Hi,

Let me know if you make progress!

Yes, sorry about that. Git LFS ended up causing a lot of problems. I put a version of the repo with all the data in it here: https://figshare.com/articles/software/SpatialDE/17065217

I'll add the link to the README. Thanks for the reminder!

j-bac commented 2 years ago

Thanks! I appreciate how well organized the repo is.

Actually, I realized the method already works on higher-dimensional inputs without modification. I think it's a cool use case: detecting "interesting" highly variable genes that show a pattern rather than being randomly distributed. It's useful even for a scRNAseq assay without true spatial coordinates.

I think generalizing this to graphs is still a nice project. I found an implementation of graph GPs and I'm playing with it. E.g., it would be cool if we could apply this to compare scATAC/scRNA graphs and detect multi-modal gene clusters with similar patterns in both assays.

vals commented 2 years ago

Oh yeah, the Euclidean distance calculation already works for any dimension.
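As an illustration of why nothing changes with dimension, here is a minimal squared-exponential kernel written against an (n, d) coordinate array. It is a sketch and not SpatialDE's own implementation; `se_kernel` and `lengthscale` are hypothetical names. The number of columns d never appears in the computation, so 2-D tissue coordinates and, say, a 50-dimensional PCA embedding go through the same code path.

```python
import numpy as np

def se_kernel(X, lengthscale):
    """Squared-exponential covariance exp(-|x - x'|^2 / (2 l^2)) for an (n, d) array X."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    # Pairwise squared Euclidean distances; clip guards against tiny negative round-off.
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0, None)
    return np.exp(-d2 / (2 * lengthscale**2))
```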

One reason people prefer to use graphs in high-dimensional space is that you get this unintuitive issue of space getting "more empty" as dimensionality increases. Then, since the GP pretty much interpolates between observed points, this means for higher dimensions it is harder to reject the null hypothesis that noise is uncorrelated. With a graph representation the geometric property of 'empty' high dimensional space would be less of an issue.
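The "more empty" effect can be seen numerically. The sketch below, with the hypothetical helper `relative_contrast` and uniformly random points as an assumed toy dataset, measures how much the largest pairwise distance differs from the smallest. As the dimension grows, all pairwise distances become nearly equal, so a distance-based kernel has little left to distinguish near neighbours from far ones.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min over all pairwise Euclidean distances of uniform random points."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, dim))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    iu = np.triu_indices(n_points, k=1)           # each pair once, no self-distances
    d = np.sqrt(np.clip(d2[iu], 0, None))
    return (d.max() - d.min()) / d.min()

# Compare a low-dimensional and a high-dimensional point cloud.
for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(100, dim))
```

The printed contrast should shrink as `dim` increases, which is the geometric property a graph representation sidesteps.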

In addition, with a graph you could analyze all sorts of weird data that doesn't have actual numerical coordinates, like protein structures, citation networks, gene ontology, etc.

j-bac commented 2 years ago

Thanks, those are all cool ideas. It's true that using angular distance rather than Euclidean is often better for scRNA. I'll close the issue, but I'm happy to keep you updated if I make progress!