Closed wbrett87 closed 5 months ago
Hi @wbrett87, thank you for this interesting question.
Since the diffusion map is a non-linear mapping, the dot-product approach to inferring loadings does not work here. Instead of computing eigenvectors of the covariance matrix, as in PCA, we compute eigenvectors of a diffusion-based transition matrix. While this transition matrix is informed by cell-to-cell similarity, which is in turn determined by gene-expression levels, gene expression is not otherwise directly involved, and hence there is no direct link between genes and diffusion map components. The advantage is that the phenotypic manifold can have any weird shape, e.g., form a spiral, and will still be straightened out in diffusion map space. It is hard to call one gene more or less responsible for a given diffusion map component beyond its variability in the dataset:
Thanks for explaining that, the plot is very helpful! So what is the best way to derive biological insight from a component that is strong in one cell type but has low values everywhere else? Is there any way to determine which genes "significantly" change as you travel from low component values to high component values (i.e., some sort of differential expression analysis)? Sorry if these questions are naive, I'm a biologist (with an admitted deficiency in math) struggling to teach myself this stuff.
No worries at all! You might find the Gene-expression-trends section of the Palantir tutorial interesting. It showcases how you could investigate how genes are changing as cells change, e.g., along a differentiation trajectory. While such a trajectory could be described by a single diffusion map component, it can also be a combination of multiple components.
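If you just want a quick, descriptive look at which genes rise or fall along a single component, a rank correlation per gene is one simple option. The sketch below is not part of Palantir and does not produce loadings; it only assumes you have a cells-by-genes matrix (dense or scipy sparse) and one value per cell for the component. The function name `rank_genes_by_component` and the `chunk_size` parameter are made up for illustration. Genes are processed in small column blocks, so the full matrix is never densified at once.

```python
import numpy as np
from scipy import sparse
from scipy.stats import rankdata


def rank_genes_by_component(X, component, chunk_size=500):
    """Spearman correlation of every gene (column of X) with one component.

    X: cells x genes, dense array or scipy sparse matrix.
    component: one value per cell, e.g. a single diffusion component.
    Only `chunk_size` gene columns are converted to dense at a time.
    """
    component = np.asarray(component, dtype=float).ravel()
    n_cells, n_genes = X.shape
    if component.shape[0] != n_cells:
        raise ValueError("component must have one value per cell")

    # Centered ranks of the component (Spearman = Pearson on ranks).
    comp_rank = rankdata(component)
    comp_rank = comp_rank - comp_rank.mean()
    comp_norm = np.sqrt((comp_rank ** 2).sum())
    if comp_norm == 0:
        raise ValueError("component is constant across cells")

    X = sparse.csc_matrix(X)  # column slicing is cheap in CSC format
    rho = np.zeros(n_genes)
    for start in range(0, n_genes, chunk_size):
        stop = min(start + chunk_size, n_genes)
        block = X[:, start:stop].toarray()
        ranks = rankdata(block, axis=0)
        ranks -= ranks.mean(axis=0)
        norms = np.sqrt((ranks ** 2).sum(axis=0))
        norms[norms == 0] = np.inf  # constant genes get correlation 0
        rho[start:stop] = (ranks.T @ comp_rank) / (norms * comp_norm)
    return rho


# Toy usage with synthetic data (not real expression values).
rng = np.random.default_rng(0)
toy_component = rng.normal(size=300)
toy_X = sparse.random(300, 50, density=0.1, random_state=0, format="csr")
toy_rho = rank_genes_by_component(toy_X, toy_component, chunk_size=20)
top_genes = np.argsort(-np.abs(toy_rho))[:5]  # indices of strongest associations
```

Keep in mind that this only captures monotonic trends along one component, and a strong correlation says the gene co-varies with the component, not that it "drives" it.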
You might also like the gene-change-analysis tutorial for Mellon. The score ranks each gene by the amount it changes as cells transition along a trajectory, weighted inversely by the cell-state density. The weighting ensures that more rapid changes, which are inherently underrepresented in the dataset, are properly taken into account. Mellon is a dependency of Palantir, so if Palantir is running you should also be able to load Mellon. Refer to the basic tutorial for the required preprocessing.
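To give some intuition for why the inverse-density weighting matters, here is a toy sketch in plain NumPy. It is not Mellon's actual formula or API; the function name `density_weighted_change` and all inputs are hypothetical, and the density values are simply supplied by hand rather than estimated. It averages the expression change between consecutive cells along pseudotime, giving more weight to steps that fall in sparsely populated regions.

```python
import numpy as np


def density_weighted_change(expr, pseudotime, density):
    """Toy score: mean absolute expression step between consecutive cells
    (ordered by pseudotime), weighted by the inverse of the local density.

    expr: cells x genes array; pseudotime, density: one value per cell.
    Returns one score per gene. Illustration only, not Mellon's method.
    """
    order = np.argsort(pseudotime)
    expr = np.asarray(expr, dtype=float)[order]
    dens = np.asarray(density, dtype=float)[order]

    step = np.abs(np.diff(expr, axis=0))        # change between neighbors
    mid_density = 0.5 * (dens[1:] + dens[:-1])  # density assigned to each step
    weights = 1.0 / mid_density
    weights /= weights.sum()
    return weights @ step


# Synthetic example: 90 cells in a dense early state, 10 in a sparse late state.
t = np.concatenate([np.linspace(0.0, 0.5, 90, endpoint=False),
                    np.linspace(0.5, 1.0, 10)])
dens = np.where(t < 0.5, 9.0, 1.0)                 # hand-set toy densities
gene_late = np.clip((t - 0.5) / 0.5, 0.0, None)    # changes in the sparse region
gene_early = np.clip(t / 0.5, None, 1.0)           # changes in the dense region
scores = density_weighted_change(np.column_stack([gene_late, gene_early]), t, dens)
```

Both toy genes change by the same total amount, but the one that changes where few cells were sampled ends up with the higher score, which is the effect the weighting is meant to achieve.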
Please let me know if you run into any issues! It always helps to see what people might get stuck with, so we can improve the software.
Hello!
I was wondering if there is any way to extract loadings for Palantir diffusion components to determine which genes contribute most to a specific component. Essentially like how scanpy and Seurat allow you to plot which genes contribute to specific principal components. ChatGPT is suggesting a workflow where I do a numpy dot product between the transposed adata.X and the diffusion components. No clue if that even makes any sense, but I tried and it crashes my computer (64 GB RAM).
Thanks!!