GregorySchwartz / too-many-cells

Cluster single cells and analyze cell clade relationships with colorful visualizations.
https://gregoryschwartz.github.io/too-many-cells/
GNU General Public License v3.0

Question: Does well separated color in the dendrogram mean that the items are distinct? #47

Open DongzeHE opened 2 years ago

DongzeHE commented 2 years ago

Hello,

Thanks for providing this interesting cell clustering method. I am using it to analyze the similarity between the spliced and unspliced count matrices from the same single-cell RNA-seq dataset, so the two matrices are two types of signal from the same sample. The result returned from TooManyCells is interesting, and I hope you can help me understand what it tells us. Thank you in advance!

I gave TooManyCells the spliced and unspliced raw count matrices, marking them "S" for spliced and "U" for unspliced. The result shows that the items from the two matrices are well separated, with no overlap.

[image: TooManyCells dendrogram colored by S and U labels]

If possible, could you please help me understand the result? I have the following questions:

  1. Do I need to normalize or scale the two matrices before running TooManyCells?
  2. If giving the raw count matrices is the correct thing to do, does this result (S and U items are separated) mean that the two types of signals from the same sample are totally different?
  3. Why is the unspliced side of the tree larger than the spliced side? Does this mean that unspliced can separate the data better?
  4. If the result shows that the two types of signals are totally different, how could I show that both of them are biologically meaningful? For example, do you think that finding rare cell types from the items in each matrix separately will show that the two matrices are different and both biologically meaningful? Are there any other things I can do to show that they are both biologically meaningful?

Thanks so much! I am looking forward to your reply!

Best, Dongze

GregorySchwartz commented 2 years ago

Sorry, this went under my radar.

  1. You can normalize in TooManyCells using a variety of methods; see `too-many-cells make-tree -h`. By default, TF-IDF is used.
  2. Yes, in theory, but it depends on the normalization assumptions.
  3. What do you mean by larger? Size is indicated by node size, not the area of the tree, so they look pretty similar to me.
  4. You can try different normalizations if you expect there to be more mixing and TF-IDF (followed by cosine similarity) is insufficient.
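
To make the TF-IDF plus cosine similarity point concrete, here is a minimal Python sketch. It is not the TooManyCells implementation: it uses a textbook smoothed TF-IDF variant (an assumption), and the toy counts and the labels `cell1_S`, `cell2_U`, etc. are made up for illustration. It only shows how rows from the same modality can end up far more similar to each other than the spliced and unspliced rows of the same cell.

```python
import math


def tf_idf(counts):
    """Weight a cells-by-genes count matrix with a textbook smoothed TF-IDF.

    Assumption: this is a generic TF-IDF variant chosen for illustration,
    not necessarily the exact formula TooManyCells applies.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    # Document frequency: number of cells in which each gene is detected.
    df = [sum(1 for row in counts if row[g] > 0) for g in range(n_genes)]
    # Smoothed inverse document frequency; always positive.
    idf = [math.log((1 + n_cells) / (1 + d)) + 1.0 for d in df]
    weighted = []
    for row in counts:
        total = sum(row)
        # Term frequency: each gene's share of the cell's total counts.
        weighted.append([(c / total if total else 0.0) * w
                         for c, w in zip(row, idf)])
    return weighted


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: two cells, each measured as spliced (S) and
# unspliced (U) counts over the same four genes.
labels = ["cell1_S", "cell2_S", "cell1_U", "cell2_U"]
counts = [
    [50, 30, 0, 1],   # cell1_S
    [40, 35, 1, 0],   # cell2_S
    [2, 0, 20, 15],   # cell1_U
    [0, 3, 25, 10],   # cell2_U
]

weighted = tf_idf(counts)
same_modality = cosine(weighted[0], weighted[1])  # cell1_S vs cell2_S
same_cell = cosine(weighted[0], weighted[2])      # cell1_S vs cell1_U

print(f"{labels[0]} vs {labels[1]}: {same_modality:.3f}")
print(f"{labels[0]} vs {labels[2]}: {same_cell:.3f}")
```

If the spliced and unspliced profiles of real data differ this strongly in which genes dominate, a similarity-based tree would be expected to split S from U first, regardless of cell identity. Whether that reflects biology or the normalization assumptions is what trying other normalizations would help check.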