GuangyuWangLab2021 / cellDancer

Predict RNA velocity through deep learning
https://guangyuwanglab2021.github.io/cellDancer_website/
BSD 3-Clause "New" or "Revised" License

Obtaining celldancer.pseudo_time() lineage assignment #18

Status: Closed (Jaimelan closed this issue 7 months ago)

Jaimelan commented 9 months ago

Hello,

I would like to know where the cluster assignment of each cell is stored after running celldancer.pseudo_time().

When computing the pseudotime with cd.pseudo_time(), I expect two differentiation paths, which I pass through the n_paths parameter, and I do indeed get two clusters of cells from which the pseudotime starts. I also get a plot in which cells are colored by the lineage they belong to, with a thick, gradient-colored line that represents the overall pseudotime of each cluster.

I run the function and assign its result back to the dataframe I am working with, i.e., celldancer_df (as in the tutorials).

The docs say that the resulting dataframe has two new columns ['velocity1', 'velocity2'], but those two columns were already in the dataframe, and I get just one additional column, called "pseudotime".

What I am missing is the information used to draw the output plot, especially the assignment of cells to differentiation clusters. In a pipeline like slingshot, it would be something like "pseudotime1" and "pseudotime2".

Thanks in advance for your attention.

biopzhang commented 8 months ago

Hello Jaimelan,

First, good catch about the cell "clusters". We did not keep the "cluster" information in the output file. The rationale is as follows. The cells are assigned to "long trajectories" generated using the cell velocity, according to each cell's destination. Most of the time, the "trajectories" overlap, which makes it possible to deduce a unified pseudo-time. Those "long trajectories" are used for estimating a unified pseudo-time for the whole system. However, unlike lineages, the "long trajectories" are not necessarily biologically relevant.

Second, the two columns ('velocity1', 'velocity2') are the two-dimensional cell velocity (based on the embedding you use). The cell velocity is calculated at the beginning of the pseudo-time estimation. The single new column is the unified pseudo-time; there is no separate pseudotime for each lineage or trajectory.

I hope this explanation helps. Please let me know if you have further questions.

Pengzhi

Jaimelan commented 8 months ago

Thanks, biopzhang, for your answer.

This question arose when I tried to establish criteria for selecting genes that change along the pseudotime but behave differently in cells that have similar pseudotimes and lie in distant regions of the UMAP embedding. In a classic slingshot analysis, for instance, those regions might be considered different lineages. However, I understand the purpose of the function in finding those "long trajectories" and why they are not necessarily biologically relevant.

What I would like to get from the pipeline is a way to extract genes from the pseudotime without confounding distant groups of cells that are assigned similar pseudotimes but are far apart in transcriptomic space (and are also annotated as very different cell types).

Thank you for your attention,

Jaime
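
One way to approach the goal described above, without any per-path output from cellDancer, is to test each gene against the unified pseudotime separately within each group of cells. The following is only a minimal sketch on toy data, not part of cellDancer: the function name is hypothetical, and the column names `clusters`, `gene_name` and `splice` are assumptions about a long-format dataframe with one row per cell and gene (only `pseudotime` is confirmed in this thread). It uses a Spearman correlation, computed as a Pearson correlation on ranks, as a simple stand-in for whatever association test you prefer.

```python
import numpy as np
import pandas as pd


def pseudotime_association_by_group(df, group_col="clusters", gene_col="gene_name",
                                    value_col="splice", time_col="pseudotime",
                                    min_cells=5):
    """Spearman correlation of expression vs. pseudotime per (group, gene).

    Hypothetical helper: all column names are assumptions about the input.
    """
    records = []
    for (group, gene), sub in df.groupby([group_col, gene_col]):
        if len(sub) < min_cells:
            continue  # too few cells for a meaningful correlation
        # Spearman's rho is the Pearson correlation of the ranks.
        rho = sub[value_col].rank().corr(sub[time_col].rank())
        records.append({"group": group, "gene": gene,
                        "rho": rho, "n_cells": len(sub)})
    return pd.DataFrame(records)


# Toy data: two groups with overlapping pseudotime ranges.
# g1 rises with pseudotime in group A and falls in group B; g2 rises in both.
t = np.linspace(0, 1, 20)
rows = []
for group, sign in [("A", 1.0), ("B", -1.0)]:
    for ti in t:
        rows.append({"clusters": group, "gene_name": "g1",
                     "splice": sign * ti, "pseudotime": ti})
        rows.append({"clusters": group, "gene_name": "g2",
                     "splice": ti ** 2, "pseudotime": ti})
toy = pd.DataFrame(rows)

result = pseudotime_association_by_group(toy)
# One row per gene, one column per group, for side-by-side comparison.
wide = result.pivot(index="gene", columns="group", values="rho")
```

Genes whose correlation differs strongly between groups (like `g1` in the toy data) are candidates for group-specific behaviour at similar pseudotimes. Any grouping label can be passed as `group_col`, for example existing cell-type annotations.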

biopzhang commented 7 months ago

I see. For your specific use case, you'll probably need to modify the source code to export the cluster information to a file. Please check line 1315 in pseudo_time.py. The variable cell_fate stores the fates (paths) of all the cells in order (a numpy array).
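
Once the `cell_fate` array has been exported from a modified copy of the source, it still has to be matched back to the rows of the dataframe. Below is a minimal sketch of that step on toy data. It is not cellDancer code: the helper name and the `path` output column are hypothetical, and it assumes that the dataframe has an integer `cellIndex` column running from 0 to n_cells - 1 and that `cell_fate` holds one entry per cell in that same order.

```python
import numpy as np
import pandas as pd


def attach_cell_fate(df, cell_fate, index_col="cellIndex", out_col="path"):
    """Return a copy of df with a per-row path label looked up by cell index.

    Hypothetical helper: assumes cell_fate[i] is the path of the cell whose
    index_col value is i.
    """
    cell_fate = np.asarray(cell_fate)
    idx = df[index_col].to_numpy()
    n_cells = df[index_col].nunique()
    if cell_fate.shape[0] != n_cells:
        raise ValueError(
            f"cell_fate has {cell_fate.shape[0]} entries but df has {n_cells} cells"
        )
    if idx.min() < 0 or idx.max() >= cell_fate.shape[0]:
        raise ValueError("cell indices fall outside the range of cell_fate")
    out = df.copy()
    # Integer-array indexing repeats each cell's label for every gene row.
    out[out_col] = cell_fate[idx]
    return out


# Toy long-format dataframe: 3 cells x 2 genes.
toy_df = pd.DataFrame({
    "cellIndex": [0, 1, 2, 0, 1, 2],
    "gene_name": ["g1", "g1", "g1", "g2", "g2", "g2"],
    "pseudotime": [0.1, 0.5, 0.9, 0.1, 0.5, 0.9],
})
toy_fate = np.array([0, 1, 0])  # stand-in for the exported cell_fate array

annotated = attach_cell_fate(toy_df, toy_fate)
```

The length check is there because a mismatch between the array and the dataframe (for example after subsetting cells) would otherwise assign labels to the wrong cells without raising an error.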

Jaimelan commented 7 months ago

Thanks for the tip, it's what I was looking for.

Best, Jaime