aristoteleo / dynamo-release

Inclusive model of expression dynamics with conventional or metabolic labeling based scRNA-seq / multiomics, vector field reconstruction and differential geometry analyses
https://dynamo-release.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
413 stars 58 forks source link

Differences in vector field calculations between different functions #472

Closed Fra-Mante closed 10 months ago

Fra-Mante commented 1 year ago

Hello @Xiaojieqiu,

I am try to process my scSLAM-seq data with Dynamo. Ultimately I would like to generate a plot with vector filed arrows while the cells are distributed based on their cell cycle phase. To practice using dynamo, I used the rpe1_kinetics dataset provided in the scEU-seq cell cycle tutorial. I noticed that if I preprocess the dataset with pp.recipe_monocle or with tl.recipe_kin_data, the vector filed calculated is extremely different. Especially, when using pp.recipe_monocle, the velocy arrows dont seem following the cell-cyle order, but even going in the opposite direction.

Could you please help me understand why is so? I included below the script and the plots obtained.

You will find that by ordering the rpe1_kinetics.obsm["cell_cycle_scores"] using the 'cell_cycle_phase' and 'cell_cycle_progress' values, I generated a continuous range of values that should order the cells by cell cycle phase. This is because I do not work with Fucci, so I cannt use RFP_GFP to order them. I encounter this problem because when I use tl.recipe_kin_data to preprocess my data, Dynamo is not able to perform cell cycle staging automatically, while it can when I use pp.recipe_monocle.

Thank you for your help, Francesca.

dyn.tl.recipe_kin_data(adata=rpe1_kinetics,
                       keep_filtered_genes=True,
                       keep_raw_layers=True,
                       del_2nd_moments=False,
                       tkey='time',
                      )

rpe1_kinetics.obsm['X_RFP_GFP'] = rpe1_kinetics.obs.loc[:, ['RFP_log10_corrected', 'GFP_log10_corrected']].values.astype('float')
dyn.tl.reduceDimension(rpe1_kinetics, reduction_method='umap')
dyn.tl.cell_velocities(rpe1_kinetics, enforce=True, vkey='velocity_T', ekey='M_t', basis='RFP_GFP')
dyn.pl.streamline_plot(rpe1_kinetics, color=['Cell_cycle_possition', 'Cell_cycle_relativePos'], basis='RFP_GFP')
dyn.pl.streamline_plot(rpe1_kinetics, color=['Cell_cycle_possition', 'Cell_cycle_relativePos'], basis='umap')

rpe1_kinetics.obsm["cell_cycle_scores"]["order"] = range(len(rpe1_kinetics.obsm["cell_cycle_scores"]))
#create a column with 0to1 values to order the cells by a timeline
rpe1_kinetics.obsm["cell_cycle_scores"].sort_values(['cell_cycle_phase', 'cell_cycle_progress'], ascending = [True, False], inplace=True)
rpe1_kinetics.obsm["cell_cycle_scores"]["time_line"] = range(len(rpe1_kinetics.obsm["cell_cycle_scores"]))
rpe1_kinetics.obsm['cell_cycle_scores']['time_line_norm'] = rpe1_kinetics.obsm['cell_cycle_scores']['time_line']/rpe1_kinetics.obsm['cell_cycle_scores']['time_line'].max()
#use the 'order' column generted earleir to restore the same cell order as initally
rpe1_kinetics.obsm["cell_cycle_scores"].sort_values(['order'], inplace=True)

#Make time vs size factor
rpe1_kinetics.obs['time_line_norm'] = rpe1_kinetics.obsm['cell_cycle_scores']['time_line_norm']
rpe1_kinetics.obsm['cell_cycle_scores']['Size_Factor'] = rpe1_kinetics.obs.Size_Factor
rpe1_kinetics.obsm['X_timeVSsizeFactor'] = rpe1_kinetics.obsm["cell_cycle_scores"].loc[:, ['time_line_norm', 'Size_Factor']].values.astype('float')

#plots
dyn.pl.streamline_plot(rpe1_kinetics, color=['Size_Factor', 'cell_cycle_phase'], basis='X_timeVSsizeFactor', show_arrowed_spines = True)
dyn.pl.streamline_plot(rpe1_kinetics, color=['cell_cycle_phase','Cell_cycle_possition'], basis='umap')
B2 B3

#Dynamo preprocess
dyn.pp.recipe_monocle(rpe1_kinetics)

#Prep dataframe
rpe1_kinetics.obsm['X_RFP_GFP'] = rpe1_kinetics.obs.loc[:, ['RFP_log10_corrected', 'GFP_log10_corrected']].values.astype('float')
rpe1_kinetics.obsm["cell_cycle_scores"]["order"] = range(len(rpe1_kinetics.obsm["cell_cycle_scores"]))
#create a column with 0to1 values to order the cells by a timeline
rpe1_kinetics.obsm["cell_cycle_scores"].sort_values(['cell_cycle_phase', 'cell_cycle_progress'], ascending = [True, False], inplace=True)
rpe1_kinetics.obsm["cell_cycle_scores"]["time_line"] = range(len(rpe1_kinetics.obsm["cell_cycle_scores"]))
rpe1_kinetics.obsm['cell_cycle_scores']['time_line_norm'] = rpe1_kinetics.obsm['cell_cycle_scores']['time_line']/rpe1_kinetics.obsm['cell_cycle_scores']['time_line'].max()
#use the 'order' column generted earleir to restore the same cell order as initally
rpe1_kinetics.obsm["cell_cycle_scores"].sort_values(['order'], inplace=True)
#Make time vs size factor
rpe1_kinetics.obs['time_line_norm'] = rpe1_kinetics.obsm['cell_cycle_scores']['time_line_norm']
rpe1_kinetics.obsm['cell_cycle_scores']['Size_Factor'] = rpe1_kinetics.obs.Size_Factor
rpe1_kinetics.obsm['X_timeVSsizeFactor'] = rpe1_kinetics.obsm["cell_cycle_scores"].loc[:, ['time_line_norm', 'Size_Factor']].values.astype('float')

#Dynamo process
dyn.tl.dynamics(rpe1_kinetics)
dyn.tl.reduceDimension(rpe1_kinetics)
dyn.tl.cell_velocities(rpe1_kinetics, calc_rnd_vel=True)

#plots
dyn.pl.streamline_plot(rpe1_kinetics, color=['Cell_cycle_possition', 'Cell_cycle_relativePos'], basis='umap')
dyn.pl.streamline_plot(rpe1_kinetics, color=['Size_Factor', 'cell_cycle_phase'], basis='X_timeVSsizeFactor', show_arrowed_spines = True)
dyn.pl.streamline_plot(rpe1_kinetics, color=['cell_cycle_phase','Cell_cycle_possition'], basis='umap')
A1 A2
Xiaojieqiu commented 1 year ago

by default, the following code will treat the data as one-shot which is not correct and recipe_kin_data will properly run the analyses using the two-step fitting procedure to model the pulse labeling data. So that is the difference. You can read about the source code of recipe_kin_data to understand more. Happy to help you further

dyn.pp.recipe_monocle(rpe1_kinetics)
dyn.tl.dynamics(rpe1_kinetics)
Fra-Mante commented 1 year ago

Hello @Xiaojieqiu,

thank you for your reply. I understand now that is very important to not mix one-shot with kinetics functions.

Going back to my main concern then, I would like to process with Dynamo my scSLAM-seq data, labelled all for 2h. I believe I should process it as one-shot e.g the scNT-seq neuronal activity tutorial (pp.recipe_monocle() and dyn.tl.dynamics()). I did that and I then generated a "time_line" range of values using the 'cell_cycle_phase' and 'cell_cycle_progress' values (as shown in my script above). However, the velocity arrows are then shown going backwards in relation to the cell cycle development. See plot below.

Screenshot 2023-04-03 at 17 36 54

Could you please provide me a suggestion on how to process my scSLAM-seq (if what I am already doing is not correct) and how to calculate the continuous values that would help me plot the cells ordered by their cell cycle?

Thank you, Francesca

Xiaojieqiu commented 1 year ago

can you please provide the code you generate the above figure please? I can look a close look at it

Fra-Mante commented 1 year ago

Thank you for getting back to me. Here is the code used:

import warnings
warnings.filterwarnings('ignore')

import dynamo as dyn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dyn.get_all_dependencies_version()
import hdf5plugin

#Import scSLAM-seq data
adata = dyn.read('mypath/myfile.h5ad')
adata.obs.time = adata.obs.time/60 # convert minutes to hours

dyn.pp.convert2symbol(adata)
adata = dyn.pp.cell_cycle_scores(adata, copy= True)
print(adata)

#Dynamo processing for one-shot
dyn.pp.recipe_monocle(
    adata,
    tkey="time",
    experiment_type="one-shot",
)
dyn.tl.dynamics(adata)
dyn.tl.reduceDimension(adata)
dyn.tl.cell_velocities(adata, calc_rnd_vel=True)
dyn.vf.VectorField(adata, basis='umap')

#Make time vs factor_y
#by using 'cell_cycle_phase' and 'cell_cycle_progress' to order cells by time line
#then generate 'time_line_norm' as values from 0 to 1 to order cells by time line
#then make obsm['time_xVSfactor_y'] for plotting with time line on the x axis
adata.obsm["cell_cycle_scores"]["order"] = range(len(adata.obsm["cell_cycle_scores"]))
adata.obsm["cell_cycle_scores"].sort_values(['cell_cycle_phase', 'cell_cycle_progress'], ascending = [True, False], inplace=True)
adata.obsm["cell_cycle_scores"]["time_line"] = range(len(adata.obsm["cell_cycle_scores"]))
adata.obsm['cell_cycle_scores']['time_line_norm'] = adata.obsm['cell_cycle_scores']['time_line']/adata.obsm['cell_cycle_scores']['time_line'].max()
adata.obsm["cell_cycle_scores"].sort_values(['order'], inplace=True)
adata.obsm['cell_cycle_scores']['factor_y'] = adata.obs.factor_y
adata.obs['time_line_norm'] = adata.obsm['cell_cycle_scores']['time_line_norm']
adata.obsm['X_timexVSfactory'] = adata.obsm["cell_cycle_scores"].loc[:, ['time_line_norm', 'factor_y']].values.astype('float')

#plotting
dyn.pl.streamline_plot(adata, 
                        color=['cell_cycle_phase', 'time_line_norm'], 
                        basis='X_timexVSfactory', 
                        show_arrowed_spines = True, 
                        color_key_cmap = 'viridis')

and here is dynamo output


|-----> setting visualization default mode in dynamo. Your customized matplotlib settings might be overritten.
package pre-commit colorcet dynamo-release get-version loompy matplotlib  \
version     2.21.0    3.0.1          1.2.0       3.5.4  3.0.7      3.6.3   

package networkx   numba numdifftools   numpy openpyxl pandas pynndescent  \
version      3.0  0.56.4       0.9.41  1.23.5   3.0.10  1.5.2       0.5.8   

package python-igraph scikit-learn   scipy seaborn setuptools statsmodels  \
version        0.10.3        1.2.0  1.10.0  0.12.2     56.0.0      0.13.5   

package    tqdm typing-extensions umap-learn  
version  4.64.1             4.4.0      0.5.3  
|-----> convert ensemble name to official gene name
|-----? Your adata object uses non-official gene names as gene index. 
Dynamo is converting those names to official gene names.
|-----> Storing myGene name info into local cache db: mygene_cache.sqlite.
Error: The requests_cache python module is required to use request caching.
See - https://requests-cache.readthedocs.io/en/latest/user_guide.html#installation
querying 1-1000...done.
querying 1001-2000...done.
querying 2001-3000...done.
querying 3001-4000...done.
querying 4001-5000...done.
querying 5001-6000...done.
querying 6001-7000...done.
querying 7001-8000...done.
querying 8001-9000...done.
querying 9001-10000...done.
querying 10001-10391...done.
Finished.
39 input query terms found no hit:
        ['ENSG00000112096', 'ENSG00000116957', 'ENSG00000130489', 'ENSG00000130723', 'ENSG00000163009', 'ENS
Pass "returnall=True" to return complete lists of duplicate or missing query terms.
|-----> Subsetting adata object and removing Nan columns from adata when converting gene names.
|-----> Deep copying AnnData object and working on the new copy. Original AnnData object will not be modified.
|-----> computing cell phase...
|-----> [cell phase estimation] in progress: 100.0000%
|-----> [cell phase estimation] finished [42.4209s]
|-----> <insert> cell_cycle_phase to obs in AnnData Object.
|-----> <insert> cell_cycle_scores to obsm in AnnData Object.
|-----> [Cell Cycle Scores Estimation] in progress: 100.0000%
|-----> [Cell Cycle Scores Estimation] finished [0.2817s]
AnnData object with n_obs × n_vars = 1395 × 10105
    obs: 'BC_ID', 'sample_id', 'replicate', 'batch', 'code_id', 'factor_y', 'time', 'cell_cycle_phase'
    var: 'Gene_Id', 'query', 'scopes', '_id', '_score', 'symbol', 'notfound'
    uns: 'cell_phase_genes'
    obsm: 'cell_cycle_scores'
    layers: 'new', 'total'
|-----> recipe_monocle_keep_filtered_cells_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_cells_key=True
|-----> recipe_monocle_keep_filtered_genes_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_genes_key=True
|-----> recipe_monocle_keep_raw_layers_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_raw_layers_key=True
|-----> apply Monocole recipe to adata...
|-----> <insert> pp to uns in AnnData Object.
|-----------> <insert> has_splicing to uns['pp'] in AnnData Object.
|-----------> <insert> has_labling to uns['pp'] in AnnData Object.
|-----------> <insert> splicing_labeling to uns['pp'] in AnnData Object.
|-----------> <insert> has_protein to uns['pp'] in AnnData Object.
|-----> ensure all cell and variable names unique.
|-----> ensure all data in different layers in csr sparse matrix format.
|-----> ensure all labeling data properly collapased
|-----> detected experiment type: one-shot
|-----------> <insert> tkey to uns['pp'] in AnnData Object.
|-----------> <insert> experiment_type to uns['pp'] in AnnData Object.
|-----> filtering cells...
|-----> <insert> pass_basic_filter to obs in AnnData Object.
|-----> 1395 cells passed basic filters.
|-----> filtering gene...
|-----> <insert> pass_basic_filter to var in AnnData Object.
|-----> 519 genes passed basic filters.
|-----> calculating size factor...
|-----? only 519 genes passed basic filtering, but you requested 2000 genes for feature selection. Try lowering the gene selection stringency: {'min_expr_cells': 0, 'min_expr_avg': 0, 'max_expr_avg': inf, 'svr_gamma': None, 'winsorize': False, 'winsor_perc': (1, 99.5), 'sort_inverse': False}
|-----> selecting genes in layer: X, sort method: SVR...
|-----> <insert> frac to var in AnnData Object.
|-----> size factor normalizing the data, followed by log1p transformation.
|-----> Set <adata.X> to normalized data
|-----> applying PCA ...
|-----> <insert> pca_fit to uns in AnnData Object.
|-----> <insert> ntr to obs in AnnData Object.
|-----> <insert> ntr to var in AnnData Object.
|-----> cell cycle scoring...
|-----> computing cell phase...
|-----> [cell phase estimation] in progress: 100.0000%
|-----> [cell phase estimation] finished [0.7410s]
|-----> <insert> cell_cycle_phase to obs in AnnData Object.
|-----> <insert> cell_cycle_scores to obsm in AnnData Object.
|-----> [Cell Cycle Scores Estimation] in progress: 100.0000%
|-----> [Cell Cycle Scores Estimation] finished [0.2088s]
|-----> [recipe_monocle preprocess] in progress: 100.0000%
|-----> [recipe_monocle preprocess] finished [0.7401s]
|-----> dynamics_del_2nd_moments_key is None. Using default value from DynamoAdataConfig: dynamics_del_2nd_moments_key=False
|-----------> removing existing M layers:[]...
|-----------> making adata smooth...
|-----> calculating first/second moments...
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
|-----> [moments calculation] in progress: 100.0000%
|-----> [moments calculation] finished [18.1832s]
|-----? Your adata only has labeling data, but `NTR_vel` is set to be `False`. Dynamo will reset it to `True` to enable this analysis.
estimating gamma: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 519/519 [00:04<00:00, 105.24it/s]
|-----> retrive data for non-linear dimension reduction...
|-----> perform umap...
|-----> [dimension_reduction projection] in progress: 100.0000%
|-----> [dimension_reduction projection] finished [18.6491s]
|-----> incomplete neighbor graph info detected: connectivities and distances do not exist in adata.obsp, indices not in adata.uns.neighbors.
|-----> Neighbor graph is broken, recomputing....
|-----> Start computing neighbor graph...
|-----------> X_data is None, fetching or recomputing...
|-----> fetching X data from layer:None, basis:pca
|-----> method arg is None, choosing methods automatically...
|-----------> method ball_tree selected
|-----> <insert> connectivities to obsp in AnnData Object.
|-----> <insert> distances to obsp in AnnData Object.
|-----> <insert> neighbors to uns in AnnData Object.
|-----> <insert> neighbors.indices to uns in AnnData Object.
|-----> <insert> neighbors.params to uns in AnnData Object.
|-----> 0 genes are removed because of nan velocity values.
|-----> [calculating transition matrix via pearson kernel with sqrt transform.] in progress: 100.0000%
|-----> [calculating transition matrix via pearson kernel with sqrt transform.] finished [3.1576s]
|-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%
|-----> [projecting velocity vector to low dimensional embedding] finished [0.7312s]
|-----> VectorField reconstruction begins...
|-----> Retrieve X and V based on basis: UMAP. 
        Vector field will be learned in the UMAP space.
|-----> Generating high dimensional grids and convert into a row matrix.
|-----> Learning vector field with method: sparsevfc.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] in progress: 100.0000%
|-----> [SparseVFC] finished [0.0709s]
|-----------> current cosine correlation between input velocities and learned velocities is less than 0.6. Make a 1-th vector field reconstruction trial.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] in progress: 100.0000%
|-----> [SparseVFC] finished [0.0869s]
|-----------> current cosine correlation between input velocities and learned velocities is less than 0.6. Make a 2-th vector field reconstruction trial.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] in progress: 100.0000%
|-----> [SparseVFC] finished [0.0828s]
|-----------> current cosine correlation between input velocities and learned velocities is less than 0.6. Make a 3-th vector field reconstruction trial.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] in progress: 100.0000%
|-----> [SparseVFC] finished [0.0845s]
|-----------> current cosine correlation between input velocities and learned velocities is less than 0.6. Make a 4-th vector field reconstruction trial.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] in progress: 100.0000%
|-----> [SparseVFC] finished [0.0842s]
|-----------> current cosine correlation between input velocities and learned velocities is less than 0.6. Make a 5-th vector field reconstruction trial.
|-----? Cosine correlation between input velocities and learned velocities is less than 0.6 after 5 trials of vector field reconstruction.
|-----> <insert> velocity_umap_SparseVFC to obsm in AnnData Object.
|-----> <insert> X_umap_SparseVFC to obsm in AnnData Object.
|-----> <insert> VecFld_umap to uns in AnnData Object.
|-----> <insert> control_point_umap to obs in AnnData Object.
|-----> <insert> inlier_prob_umap to obs in AnnData Object.
|-----> <insert> obs_vf_angle_umap to obs in AnnData Object.
|-----> [VectorField] in progress: 100.0000%
|-----> [VectorField] finished [0.6016s]
|-----> retrive data for non-linear dimension reduction...
|-----? adata already have basis timexVSfactory. dimension reduction timexVSfactory will be skipped! 
set enforce=True to re-performing dimension reduction.
|-----> Start computing neighbor graph...
|-----------> X_data is None, fetching or recomputing...
|-----> fetching X data from layer:None, basis:pca
|-----> method arg is None, choosing methods automatically...
|-----------> method ball_tree selected
|-----> <insert> connectivities to obsp in AnnData Object.
|-----> <insert> distances to obsp in AnnData Object.
|-----> <insert> neighbors to uns in AnnData Object.
|-----> <insert> neighbors.indices to uns in AnnData Object.
|-----> <insert> neighbors.params to uns in AnnData Object.
|-----> [dimension_reduction projection] in progress: 100.0000%
|-----> [dimension_reduction projection] finished [0.0741s]
|-----> 0 genes are removed because of nan velocity values.
Using existing pearson_transition_matrix found in .obsp.
|-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%
|-----> [projecting velocity vector to low dimensional embedding] finished [0.7237s]
|-----------> plotting with basis key=X_timexVSfactory
|-----------> skip filtering cell_cycle_phase by stack threshold when stacking color because it is not a numeric type

and here again the plot obtained

Screenshot 2023-04-05 at 11 24 58
Xiaojieqiu commented 1 year ago

first of all, can you try dyn.tl.recipe_one_shot_data(adata, tkey='time')?

Looking at your code, I found a few things: a. cell-cycle staging is better to be done after size factor normalization and log1p transformation. for now you can use receipe_monocle to get the staging correctly. Btw, we are overhauling our preprocessing module and will have a functionable class to perform flexible preprocessing, including staging cell cycle. etc. b. what is the adata.obs.factor_y in your data? c. try also first project to the timexVSfactory basis before plotting the streamline. dyn.tl.cell_velocities(adata, basis='timexVSfactory')

Let me know how it goes

Fra-Mante commented 1 year ago

a. I already tried to perform the cell_cycle staging with recipe_monocle, but it really doesnt seem to have an effect on the velocity arrows. b. I prefer not to disclose this information at the moment. However, the backwards arrows are present even if I use a different value for the y axis. for example I tried with my time_line_norm vs the ntr from dynamo.

Screenshot 2023-04-06 at 07 47 31

c. Thanks, I tried it. See below the code with a. and c. modification. Unfortunately the plotting does not seem changing. See code and plot below.

Do you have any feedbacks regarding the dynamo output when running the code? perhaps the fact that only a small fraction of the genes passes the quality test, or the warnings about the cosine correlation? Does it make sense to you how I calculate the time_line_norm?

few more infor about my dataset: My umi counts have already been corrected using spike-in. the total layer has the same values as adata.X, while new layer has been obtain by multiplying the umi counts with the "new-to-total ratio" calculated with a different software.


#dyn.pp.convert2symbol(adata)
#adata = dyn.pp.cell_cycle_scores(adata, copy= True)
print(adata)

#Dynamo processing for one-shot
dyn.pp.recipe_monocle(
    adata,
    tkey="time",
    experiment_type="one-shot",
)
dyn.tl.dynamics(adata)
dyn.tl.reduceDimension(adata)
dyn.tl.cell_velocities(adata)
dyn.vf.VectorField(adata, basis='umap')

#Make time vs factor_y
#by using 'cell_cycle_phase' and 'cell_cycle_progress' to order cells by time line
#then generate 'time_line_norm' as values from 0 to 1 to order cells by time line
#then make obsm['time_xVSfactor_y'] for plotting
adata.obsm["cell_cycle_scores"]["order"] = range(len(adata.obsm["cell_cycle_scores"]))
adata.obsm["cell_cycle_scores"].sort_values(['cell_cycle_phase', 'cell_cycle_progress'], ascending = [True, False], inplace=True)
adata.obsm["cell_cycle_scores"]["time_line"] = range(len(adata.obsm["cell_cycle_scores"]))
adata.obsm['cell_cycle_scores']['time_line_norm'] = adata.obsm['cell_cycle_scores']['time_line']/adata.obsm['cell_cycle_scores']['time_line'].max()
adata.obsm["cell_cycle_scores"].sort_values(['order'], inplace=True)
adata.obsm['cell_cycle_scores']['factor_y'] = adata.obs.factor_y
adata.obs['time_line_norm'] = adata.obsm['cell_cycle_scores']['time_line_norm']
adata.obsm['X_timexVSfactory'] = adata.obsm["cell_cycle_scores"].loc[:, ['time_line_norm', 'factor_y']].values.astype('float')

#projecting
dyn.tl.cell_velocities(adata, basis='timexVSfactory')

#plotting
dyn.pl.streamline_plot(adata, 
                        color=['cell_cycle_phase', 'time_line_norm'], 
                        basis='X_timexVSfactory', 
                        show_arrowed_spines = True, 
                        color_key_cmap = 'viridis')
Screenshot 2023-04-06 at 07 40 42

Thanks for following up my query.

Xiaojieqiu commented 1 year ago

did you see my first suggestion of using dyn.tl.recipe_one_shot_data(adata, tkey='time')? In addition to this, you should also look at the results without using your special coordinates and plot the default umap while coloring your cells with the cell cycle stage.

Second, it is worth noting that if you want to just order by cell cycle, you'd better to only use cell cycle related genes for this analyses. This is because a cell can involve many different dynamical processes, many may be not relevant to cell cycle at all. Thus the unsupervised approach won't necessarily give you the expected result you want to. Try only taking out the cell cycle genes as the feature genes (pass the cell cycle gene list to genes_to_list in recipe_monocle) and then rerun the analyses. I can provide further suggestions if simply taking known cell cycle genes don't work well

Lastly, it is important to know that the metabolic labeling data requires sophisticated correction. Because if you just naively take the reads with abundant t->c mutations, your new RNA is not corrected and is a underestimate. Therefore the transcription rates will be estimated to be lower than expected and thus the RNA velocity will be opposite to what you expected. Did you use our dynast package for the new RNA estimation and correction?

Fra-Mante commented 1 year ago

Thank you for your feedbacks. I tried dyn.tl.recipe_one_shot_data(adata, tkey='time') and it has drastically changed the layout of the arrows. Overall they are now more directioned with the cell cycle, but they still have a very confusing pattern. Do you mean genes_to_use? This option is present only with dyn.pp.recipe_monocle and not dyn.tl.recipe_one_shot_data, so how can I use it if I want to process the data with the latter function? Also, what does this list affect? Just the cell cycle staging or also follow up steps? Also, what is the correct way to make this list if I cannot specify which genes is associated to which phase?

I also tried to plot a umap. It looked very different when using dyn.pp.recipe_monocle or dyn.tl.recipe_one_shot_data, but they both have in common that the cells are grouped by batch (my cells had to be sequenced in multiple batches). does dynamo or dynast correct for it?

Thanks

Fra-Mante commented 1 year ago

Hello, I am trying to use dynast for generating the input file for dynamo. In the meanwhile, could you please let me know if dynast or dynamo offer a way to correct for batch effect? My cells were sequencing in multiple batches and I can see a strong effect of this when plotting the umap. Thanks

Sichao25 commented 1 year ago

Hey. Apologies for the delayed response. Next time if no one gets back to you within a few days, feel free to tag me or xiaojie in your message. Now, regarding your question, we have some batch correction methods available, such as pp.integrate and pp.harmony_debatch. Additionally, you can explore other third-party tools that work with sparse matrix/array to accomplish the same task.

Going back to your previous concern, yes, currently gene_to_use is only in recipe_monocle and it will affect all downstream analysis. But you can manually remove those genes from the filter or adata object if you want to use recipe_one_shot_data. Also, maybe you have already known this, but I want to clarify again that dyn.pp.recipe_monocle and dyn.tl.recipe_one_shot_data are two distinct recipes with different purposes. The recipe_monocle is only for preprocessing, whereas recipe_one_shot_data contains preprocessing (recipe monocle), velocity calculation, dimension reduction, and velocity projection. When utilizing recipe_monocle, remember to execute any other necessary steps that may be required to achieve your desired outcome. Hope this will help you solve the problem.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days