Thank you for your question. Below I copied the response from Xiuyuan, the co-first author of the GeneTrajectory paper.
I think the question concerns the rationale for using the "spectral norm" $\|S(x_i)\|$ to select the endpoint of a trajectory. A theoretical interpretation of data points with a large spectral norm can be found in the following papers:
[1] X. Cheng and G. Mishne, "Spectral embedding norm: looking deep into the spectrum of the graph Laplacian," SIAM Journal on Imaging Sciences (2020). arXiv:1810.10695.
[2] X. Cheng, G. Mishne, and S. Steinerberger, "The geometry of nodal sets and outlier detection," Journal of Number Theory (2017). arXiv:1706.01362.
The basic idea is that the sample points with the largest spectral norm are those that are most "representative" of a cluster, in the sense that they most uniquely belong to that cluster and are most dissimilar to data points outside it. This is supported by the analysis in the two papers above in the setting of outlier points/clusters, and the experimental examples go beyond clustering; see the manifold-plus-outlier case in [1].
Gene trajectories are not exactly a clustering problem: instead, several trajectories branch out from a "middle cohort". Even so, the endpoint of a trajectory still has the property that it is connected (similar) to points along its own trajectory but very dissimilar to genes in any other trajectory or in the middle cohort. It can therefore be interpreted as a "representative point" or an "outlier", and we expect a large spectral norm to be an effective way to find it.
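To make the outlier-cluster intuition above concrete, here is a minimal NumPy sketch. It is not the GeneTrajectory implementation. It assumes, for illustration only, that $S(x_i)$ collects the $i$-th entries of the $k$ leading nontrivial eigenvectors of the symmetric normalized affinity $D^{-1/2} W D^{-1/2}$; the helper names `spectral_embedding_norm` and `two_clusters` and the toy graph (a large clique and a small clique joined by one weak edge) are hypothetical.

```python
import numpy as np


def spectral_embedding_norm(W: np.ndarray, k: int) -> np.ndarray:
    """Return ||S(x_i)|| for every node i of a weighted graph.

    Assumption (illustrative): S(x_i) collects the i-th entries of the k
    leading nontrivial eigenvectors of D^{-1/2} W D^{-1/2}.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees (must all be > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    vecs = vecs[:, ::-1]                   # reorder columns to descending eigenvalue
    S = vecs[:, 1:k + 1]                   # drop the trivial top eigenvector
    return np.linalg.norm(S, axis=1)


def two_clusters(n_big: int = 8, n_small: int = 3, bridge: float = 0.1) -> np.ndarray:
    """Hypothetical toy affinity: two cliques joined by a single weak edge."""
    n = n_big + n_small
    W = np.zeros((n, n))
    W[:n_big, :n_big] = 1.0
    W[n_big:, n_big:] = 1.0
    np.fill_diagonal(W, 0.0)               # no self-loops
    W[0, n_big] = W[n_big, 0] = bridge     # weak link between the two clusters
    return W


norms = spectral_embedding_norm(two_clusters(), k=1)
# The node with the largest norm lies in the small, weakly connected cluster.
print(int(np.argmax(norms)))
```

In this toy graph every node of the small cluster (indices 8 to 10) has a larger spectral norm than any node of the large cluster, which matches the "most uniquely belonging to a cluster" reading. How well this carries over to branching trajectories depends on the graph construction and normalization actually used in the paper.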
The same procedure appeared in an earlier paper by Gal and Raphy [3] (https://www.biorxiv.org/content/10.1101/313981v1.abstract), where it was applied to extract interesting clusters.
Thank you so much for your kind reply! This is a very clear and thorough explanation which helps me a lot. Thanks again for your help.
Hi!
Thank you very much for developing this method. When reading the paper, I was quite confused by the following paragraph in the Methods section, step 3 (Construct gene trajectories):
I wonder why we can make such an assumption. Is there any further explanation or reference for this point? Thank you very much, and I hope to hear from you.
Best, Tom