GuangyuWangLab2021 / cellDancer

Predict RNA velocity through deep learning
https://guangyuwanglab2021.github.io/cellDancer_website/
BSD 3-Clause "New" or "Revised" License
60 stars 11 forks source link

How to export results to scvelo #12

Closed sup27606 closed 11 months ago

sup27606 commented 1 year ago

Dear developer, I have calculated kinetic rates and pseudo times using cellDancer. Now I would like to do further analysis, i.e. find the top developmental genes and plot them in a heat map as function of pseudo time. I didn't see any function in cellDancer to do this (is it possible?).

The other alternative is to feed the kinetic parameters into scvelo annData, which can estimate likelihoods on a gene basis to identify the top developmental genes. I was wondering, how to export the output from cellDancer into scvelo annData object. I already have a scvelo annData of the same system and can replace the relevant data slots. I just need to know which slots to update with cellDancer values. Thank you so much.

Abclisy commented 1 year ago

Hello, thank you for reaching out to us. For the 'top developmental genes,' I guess what you meant is to find genes of interest. We could use the loss to identify well-trained genes. Probably you can plot a density plot of loss scores for all genes and extract the genes with the lowest training loss scores as shown in Fig. 3C. The code below is an example of displaying the density of loss. You can also filter genes by loss in the second function (filter_gene_by_loss) below.

def plt_loss_density(df,hist=False):
    loss_summary=df[['gene_name','loss']].drop_duplicates()

    import seaborn as sns
    sns.distplot(loss_summary['loss'], 
                 hist=hist, 
                 kde=True, 
                 bins=50,
                 color = 'black', 
                 kde_kws={'linewidth': 4}
                )
    plt.xlim(0,None)
    plt.show()
    return(loss_summary)

def filter_gene_by_loss(df,
                        loss_larger_than=0,
                        loss_smaller_than=None):

    loss_summary=df[['gene_name','loss']].drop_duplicates()
    loss_summary=loss_summary[loss_summary.loss>loss_larger_than]
    if loss_smaller_than is not None:
        loss_summary=loss_summary[loss_summary.loss<loss_smaller_than]
    df=df[df.gene_name.isin(loss_summary.gene_name)].reset_index(drop=True)
    return(df)

cd_path='cellDancer_estimation.csv'
cellDancer_df=pd.read_csv(cd_path)

# compute cell velocity, if already have, skip this step
cellDancer_df=cd.compute_cell_velocity(cellDancer_df=cellDancer_df, 
                                       projection_neighbor_choice='gene', 
                                       expression_scale='power10', 
                                       projection_neighbor_size=10, 
                                       speed_up=(100,100))

# estimate pseudotime, if already have, skip this step
import random
# set parameters
dt = 0.05
t_total = {dt:int(10/dt)}
n_repeats = 10

cellDancer_df = cd.pseudo_time(cellDancer_df=cellDancer_df,
                               grid=(30,30),
                               dt=dt,
                               t_total=t_total[dt],
                               n_repeats=n_repeats,
                               speed_up=(100,100),
                               n_paths = 3,
                               # plot_long_trajs=True,
                               psrng_seeds_diffusion=[i for i in range(n_repeats)],
                               n_jobs=8)

# plt loss density
loss_sum=plt_loss_density(cellDancer_df)

# filter by loss of gene
cellDancer_df_filtered=filter_gene_by_loss(cellDancer_df,
                                           loss_smaller_than=0.2)
loss_sum_filtered_gene=plt_loss_density(cellDancer_df_filtered)

Then, you can obtain a heatmap of splice reads as the code shown below.

def heatmap_by_time(df,heatmap_value='splice'):
    onegene=df[df.gene_name==df.gene_name[0]]

    pseudotime_sort=onegene[['cellIndex','pseudotime']].sort_values('pseudotime')

    cellDancer_df_splice_heat=df.pivot(index='gene_name',columns='cellIndex',values=heatmap_value)
    cellDancer_df_splice_heat_by_time=cellDancer_df_splice_heat[pseudotime_sort.cellIndex]
    return(cellDancer_df_splice_heat_by_time)

splice_heatmap=heatmap_by_time(cellDancer_df_filtered,heatmap_value='splice')
splice_heatmap.to_csv('splice_heatmap.csv')# https://software.broadinstitute.org/morpheus/ or any tool you like could be used for further analysis.

Another way to obtain genes of interest as shown in Fig. 2E is to filter by the r square between the expression vs pseudotime and the kernel regression curve.

scVelo uses anndata as their data format. To transfer from pandas.DataFrame,to_dynamo() could be used to transfer.

Hope the answers will help you. Please don't hesitate to contact us if you have any questions. Thank you!

Abclisy commented 11 months ago

Hello, it's been a while since we heard back from you regarding this issue. We will be closing this issue. However, please don't hesitate to reopen it or create a new issue if you have further questions or concerns. Thank you for your understanding.

sup27606 commented 11 months ago

So sorry that I forgot to update here. Your instructions worked perfectly. Yes, this can be closed now. Thank you very much for your prompt response.