LiQian-XC / sctour

A deep learning architecture for robust inference and accurate prediction of cellular dynamics
https://sctour.readthedocs.io
MIT License

More demonstrations? #6

Open poseidonchan opened 1 year ago

poseidonchan commented 1 year ago

Hi,

Thanks for developing scTour, which provides another possible solution to the RNA velocity problem. What would happen if the stem cells sit in the middle of the UMAP, surrounded by different terminal cell types? Can scTour handle this scenario successfully?

LiQian-XC commented 1 year ago

Hi, thanks for your question. scTour can indeed handle such processes, where multiple terminal cell types originate from the same stem cells. Please find attached an example in which the epiblast gives rise to different cell types of the three germ layers during gastrulation, which scTour inferred correctly. This is also shown in supplementary figure 3 of the manuscript posted on bioRxiv. Please let me know if you have any other questions.

[Attached image: FigS4]

poseidonchan commented 1 year ago

Hi Dr. Qian,

Thanks for your quick reply. The example you showed looks fine, but I may have found a counterexample in an RNA-seq dataset. This is a bone marrow dataset from Seurat data (the CITE-seq dataset, bmcite). I used only the RNA modality to try your method and found that the predicted embedding and velocity are not very satisfying. The cell embeddings are not very meaningful (e.g., cell types are mixed together), and the velocity doesn't follow expectations. One possible reason is that the dataset is not very suitable: it contains too many terminal cell types, and the cells are not continuous in a common embedding space, which makes the analysis hard. Do you have any interpretation of this example? When applying your method to a dataset, how do I know whether the prediction is reasonable?

[Attached image]

LiQian-XC commented 1 year ago

Thanks for raising this question. This dataset is dominated by terminal cell types, with few progenitors and a lack of intermediate cells, so it is quite hard to reconstruct the trajectory faithfully. In such cases, you can give a higher weight to the latent space derived from the encoder and a lower weight to that from the neural ODE, both during training and when inferring the embedding (e.g., tnode = sct.train.Trainer(adata, alpha_recon_lode=0.1, alpha_recon_lec=0.9); tnode.train(); mix_zs, zs, pred_zs = tnode.get_latentsp(alpha_z=0.9, alpha_predz=0.1)). With these settings, the embedding from scTour is similar to the PCA-based UMAP, although the T cell subtypes are still not clearly separated (please see the attached results, with the UMAP from PCA in the first row and the UMAP from scTour's embedding in the second row). For the reason mentioned above, it is very hard to obtain a satisfying vector field, although there are some expected flows, for example from GMP to monocytes.
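To illustrate what the two inference-time weights do, here is a minimal sketch. It assumes that the mixed embedding is a plain weighted sum of the encoder-derived latent vectors and the ODE-predicted ones; `mix_latent` is a hypothetical helper written in plain Python for illustration and is not scTour's own code.

```python
def mix_latent(zs, pred_zs, alpha_z=0.5, alpha_predz=0.5):
    """Weighted combination of two latent representations (cells x dims).

    Assumption: mixed = alpha_z * zs + alpha_predz * pred_zs.
    """
    if len(zs) != len(pred_zs):
        raise ValueError("zs and pred_zs must have the same number of cells")
    return [
        [alpha_z * z + alpha_predz * p for z, p in zip(z_row, p_row)]
        for z_row, p_row in zip(zs, pred_zs)
    ]


# Toy data: 2 cells, 2 latent dimensions (values are made up).
zs = [[1.0, 0.0], [0.0, 2.0]]        # encoder-derived latent vectors
pred_zs = [[0.0, 0.0], [4.0, 2.0]]   # ODE-predicted latent vectors

# With alpha_z=0.9 the result stays close to the encoder output.
mixed = mix_latent(zs, pred_zs, alpha_z=0.9, alpha_predz=0.1)
```

Under this assumption, pushing `alpha_z` towards 1 makes the embedding depend almost entirely on the encoder, which is consistent with the result then resembling a PCA-based UMAP.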

scTour has no metric, such as a confidence score, that tells how reliable the reconstructed trajectory is. You may need to judge it against whatever prior knowledge you have of the biological process you are studying.
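One simple way to make such a prior-knowledge check explicit is to rank cell types by their median pseudotime and compare the ranking with the order you expect. The sketch below is only an illustration: both helper functions are hypothetical, the values are toy numbers, and in practice the pseudotime would come from adata.obs['ptime'] and the labels from whatever annotation column your dataset has.

```python
from statistics import median


def order_by_median_ptime(ptime, labels):
    """Return cell-type labels sorted by median pseudotime (earliest first)."""
    groups = {}
    for t, lab in zip(ptime, labels):
        groups.setdefault(lab, []).append(t)
    return sorted(groups, key=lambda lab: median(groups[lab]))


def agrees_with_prior(inferred_order, expected_order):
    """True if the labels in expected_order appear in that relative order."""
    positions = [inferred_order.index(lab) for lab in expected_order]
    return positions == sorted(positions)


# Toy values (made up); labels are placeholders for real annotations.
ptime = [0.10, 0.15, 0.50, 0.55, 0.90, 0.95]
labels = ["progenitor", "progenitor", "GMP", "GMP", "monocyte", "monocyte"]

inferred = order_by_median_ptime(ptime, labels)
ok = agrees_with_prior(inferred, ["progenitor", "GMP", "monocyte"])
```

A check like this does not replace a confidence score; it only records which expectations the inferred ordering does or does not satisfy.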

Hope this is clear to you. Please let me know if you have any other questions.

[Attached image: test76]

poseidonchan commented 1 year ago

Hi Dr. Qian:

Thank you very much. I agree that this scenario may be quite hard for any pseudotime or RNA velocity method to estimate the differentiation trajectory correctly, because there are too many terminal cells. Other tools probably cannot deal with this situation either, and almost no tool can provide a confidence measure without incorporating biological knowledge. In any case, scTour is a big step forward (it generalizes to a single RNA modality), and I think it will be a very influential tool.

Good Luck!

Yanshuo

poseidonchan commented 1 year ago

Hi, Dr. Qian:

I think I have found another example where scTour fails. The dataset is the dentate gyrus neurogenesis data from scVelo, scvelo.datasets.dentategyrus_lamanno(), and the predicted pseudotime is exactly opposite to the expected one.

[Attached image: download]

I think this is closely related to the CytoTRACE method, as the following image shows:

[Attached image: download-1]

Regards, Yanshuo

LiQian-XC commented 1 year ago

Hi Yanshuo,

The reversed pseudotime and vector field are due to the two possible integration directions (forward or backward) when solving an ODE. The inferred pseudotime can therefore be in the correct order (ascending) or in the reverse order (descending). Although scTour takes into account the gene counts, which have been shown to correlate with developmental potential (CytoTRACE), this rule does not apply to all data. To resolve this, scTour provides a post-inference function to reverse the pseudotime and vector field. Please refer to the tutorial "scTour inference – Post-inference adjustment" in the readthedocs here. Briefly, you can use adata.obs['ptime'] = sct.train.reverse_time(adata.obs['ptime'].values) to adjust the pseudotime, and set the parameter reverse to True when visualizing the vector field (sct.vf.plot_vector_field(adata, reverse=True, ...)).
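As a rough illustration of the adjustment described above, the sketch below decides whether a reversal is needed by comparing the median pseudotime of an expected root cell type with that of an expected terminal one. It assumes that pseudotime values lie in [0, 1], so that reversing amounts to 1 - t. Both helpers are hypothetical stand-ins written in plain Python, not scTour's implementation, and the labels and values are toy data.

```python
from statistics import median


def reverse_ptime(ptime):
    """Flip pseudotime, assuming all values lie in [0, 1]."""
    return [1.0 - t for t in ptime]


def needs_reversal(ptime, labels, root, terminal):
    """True if the expected root has a later median pseudotime than the terminal type."""
    root_t = median(t for t, lab in zip(ptime, labels) if lab == root)
    term_t = median(t for t, lab in zip(ptime, labels) if lab == terminal)
    return root_t > term_t


# Toy example (made-up values): the root-like cells were placed last.
ptime = [0.9, 0.8, 0.2, 0.1]
labels = ["root_like", "root_like", "terminal_like", "terminal_like"]

flipped = needs_reversal(ptime, labels, root="root_like", terminal="terminal_like")
if flipped:
    ptime = reverse_ptime(ptime)
```

If the pseudotime is reversed in this way, the vector field should be plotted with the matching reversed direction so that the two stay consistent.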

For this dataset, after you make the adjustment, you will notice one problem: the root state of this process is not unambiguously defined. The immature astrocytes (ImmAstro) show a slightly lower pseudotime than the expected root, radial glia, probably because the glia-like traits shared by radial glia and immature astrocytes blur their transcriptomic distinctions and thus their pseudotime ordering (please see attached).

I hope this is clear to you.

[Attached image: FigS2]

poseidonchan commented 1 year ago

Hi, Dr. Qian:

Yeah, I also find CytoTRACE's predictions interesting: although the pseudotime was wrong for several datasets I tried, the reversed prediction was very reasonable. So the gene count still seems to be a good measure of differentiation potential, but the direction is not always right. Thanks for the discussion, it is really helpful!
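The observation above can be checked with a rank correlation between the number of expressed genes per cell and the inferred pseudotime. The sketch below is a plain-Python Spearman correlation on toy numbers. It assumes, following the CytoTRACE idea, that gene counts tend to decrease as cells differentiate, so a clearly positive correlation would hint that the pseudotime is reversed. The helper names and data are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank of the tied block
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy data (made up): cells with more expressed genes got LATER pseudotime.
n_genes = [5000, 4200, 3100, 2500, 1800]
ptime = [0.9, 0.7, 0.6, 0.3, 0.1]

rho = spearman(n_genes, ptime)                               # positive here
rho_reversed = spearman(n_genes, [1.0 - t for t in ptime])   # negative here
```

As noted earlier in the thread, the gene-count rule does not hold for all data, so the sign of this correlation is a hint to weigh against biological knowledge, not a decision rule.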

Regards, Yanshuo