farrellja / URD

URD - Reconstruction of Branching Developmental Trajectories
GNU General Public License v3.0

Using URD with a small number of cells #20

Closed: beginner984 closed this issue 5 years ago

beginner984 commented 5 years ago

Hi, I am really interested in working with URD. I have 9 time points with roughly 200 cells in each. However, the k-NN distance identifies all of my cells as outliers. I tried to optimise my analysis by changing sigma in the diffusion map (to NULL, local, etc.) and adjusting other parameters to meet URD's criteria, but I seem to be getting nonsensical results (I compared my clustering of the final-stage cells with Seurat's, and it was quite different and unexpected). Do you think I can still use URD with this small number of cells? What would you suggest?

Many thanks

farrellja commented 5 years ago

Hi beginner,

It depends on how many cell types you really think there are in your data. 600 cells representing only a handful of states is probably enough, but if it is a more complex data set where you likely have only a couple of representatives of each cell state, URD may not work so well.

k-NN distance depends on your variable gene selection (NN distance is computed as the Euclidean distance in variable-gene expression space), so the k-NN parameters may need to be adjusted to fit your data. The variable gene selection also affects the computation of the diffusion map. So it is also possible that both nonsensical results reflect a need for more restrictive or more permissive variable gene selection, or that you have major competing signals that you want to remove from the list of variable genes. It might be worth clustering your data in URD (calcPCA and graphClustering) to see how it compares to your clustering in Seurat. Since PCA and graphClustering are downstream of variable gene selection, if this clustering also fails, it could indeed point to such a problem.
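To make the dependence of k-NN distance on the gene set concrete, here is a minimal Python sketch. It is not URD's own code (URD is an R package). It assumes an expression matrix shaped genes × cells. It uses a simple coefficient-of-variation cutoff (`select_variable_genes`, hypothetical) as a stand-in for URD's variable gene selection. It flags cells whose distance to their k-th nearest neighbour lies more than a few median absolute deviations above the median (`flag_outliers`, also a hypothetical rule, not URD's outlier criterion). All function names and thresholds here are illustrative assumptions.

```python
import numpy as np


def select_variable_genes(expr, cv_threshold):
    """Hypothetical stand-in for variable gene selection: keep genes whose
    coefficient of variation (std / mean) across cells exceeds cv_threshold.
    expr has shape (n_genes, n_cells); genes with zero mean get CV 0."""
    mean = expr.mean(axis=1)
    std = expr.std(axis=1)
    cv = np.divide(std, mean, out=np.zeros_like(mean), where=mean > 0)
    return np.flatnonzero(cv > cv_threshold)


def knn_distance(expr, genes, k):
    """Euclidean distance from each cell to its k-th nearest neighbour,
    computed only over the selected genes (rows of expr)."""
    x = expr[genes, :].T  # cells x selected genes
    sq = (x ** 2).sum(axis=1)
    # Pairwise squared distances via |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d = np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negative rounding errors
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    return np.sort(d, axis=1)[:, k - 1]


def flag_outliers(knn_dist, n_mads=3.0):
    """Hypothetical rule: flag cells whose k-NN distance exceeds the median
    by more than n_mads median absolute deviations."""
    med = np.median(knn_dist)
    mad = np.median(np.abs(knn_dist - med))
    return knn_dist > med + n_mads * mad


if __name__ == "__main__":
    # Simulated counts (500 genes x 600 cells), for illustration only.
    rng = np.random.default_rng(0)
    gene_means = rng.gamma(2.0, 1.0, size=(500, 1))
    expr = rng.poisson(lam=gene_means, size=(500, 600)).astype(float)
    for cv_threshold in (0.5, 1.0, 2.0):
        genes = select_variable_genes(expr, cv_threshold)
        if genes.size == 0:
            continue
        d = knn_distance(expr, genes, k=10)
        print(cv_threshold, genes.size, flag_outliers(d).mean())
```

Because Euclidean distance over a superset of genes is never smaller than over a subset, a more permissive gene selection shifts every k-NN distance upward. Any fixed distance cutoff therefore has to be re-tuned whenever the gene list changes. The full cells × cells distance matrix is fine for a few thousand cells but would need a proper nearest-neighbour index for much larger data sets.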