GuangyuWangLab2021 / cellDancer

Predict RNA velocity through deep learning
https://guangyuwanglab2021.github.io/cellDancer_website/
BSD 3-Clause "New" or "Revised" License

cd.pseudo_time: IndexError: list index out of range (celldancer 1.1.7, python 3.7.2) #13

Closed aghr closed 7 months ago

aghr commented 1 year ago

Dear CellDancer team,

Following the instructions, I ran into an error when calling cd.pseudo_time after estimating RNA velocity.

The call is:

# estimate pseudotime
cellDancer_df = cd.pseudo_time(cellDancer_df=cellDancer_df,
                               grid=(30,30),
                               dt=dt,
                               t_total=t_total[dt],
                               n_repeats=n_repeats,
                               speed_up=(100,100),
                               n_paths = 3,
                               plot_long_trajs=True,
                               psrng_seeds_diffusion=[i for i in range(n_repeats)],
                               n_jobs=1)

The output and error are:

Pseudo random number generator seeds are set to:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Generating Trajectories: 100%|███████████████████████████████████████████████████████████████████| 11550/11550 [00:04<00:00, 2346.30it/s]
There are 4 clusters.
[0 1 2 3]
level is 0
level is 0
level is 0
level is 0
level is 1
Traceback (most recent call last):
  File "celldancer_script.py", line 189, in <module>
    n_jobs=1)
  File "XXX/conda_envs/celldancer/lib/python3.7/site-packages/celldancer/pseudo_time.py", line 1333, in pseudo_time
    psrng_seeds_diffusion=psrng_seeds_diffusion)
  File "XXX/conda_envs/celldancer/lib/python3.7/site-packages/celldancer/pseudo_time.py", line 1044, in compute_cell_time 
    scell = cell_time_subclusters[j]
IndexError: list index out of range

I attach the YML of the conda environment in txt format, with the versions of all packages/libraries: celldancer_scvelo.txt

Abclisy commented 1 year ago

Hello, thank you for reaching out to us. Are you using your own data or our sample data mentioned in our notebook?

Abclisy commented 1 year ago

If you are using your own data, could you send it to us so that we can reproduce the error? If you have concerns about data privacy, the code below can be used to anonymize the cluster, gene, and cell identifiers. The data can be sent to sli5@houstonmethodist.org. Thank you!

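def replace_col(df,col_name):
    # work on a copy so the original result is left untouched
    df=df.copy()
    col_uniq_lst=list(df[col_name].drop_duplicates())
    # build replace dict: each unique value maps to col_name + running index
    replace_dict=dict()
    for i,value in enumerate(col_uniq_lst):
        replace_dict[value]=col_name+str(i)

    # assign the whole column so a dtype change (e.g. int labels to str) is safe
    df[col_name]=df[col_name].map(replace_dict)
    return df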

# cellDancer_df is the original result
cellDancer_df_new=replace_col(cellDancer_df,'clusters')
cellDancer_df_new=replace_col(cellDancer_df_new,'gene_name')
cellDancer_df_new=replace_col(cellDancer_df_new,'cellID')

cellDancer_df_new.to_csv('cellDancer_df_new.csv')
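
To illustrate what the anonymization above does, here is a minimal sketch on a toy table. The values in toy_df are made up for illustration and are not taken from any real dataset; only the three column names come from the snippet above. Each distinct value is replaced by the column name plus a running index, so the grouping structure is kept while the original labels are hidden.

```python
import pandas as pd


def replace_col(df, col_name):
    """Return a copy of df where each unique value in col_name becomes col_name + index."""
    df = df.copy()
    unique_values = list(df[col_name].drop_duplicates())
    replace_dict = {value: col_name + str(i) for i, value in enumerate(unique_values)}
    df[col_name] = df[col_name].map(replace_dict)
    return df


# Toy table with made-up values (assumption: only the column names mirror the thread).
toy_df = pd.DataFrame({
    "cellID": ["AAAC", "AAAC", "TTTG", "TTTG"],
    "gene_name": ["Sox2", "Pax6", "Sox2", "Pax6"],
    "clusters": ["neuron", "neuron", "glia", "glia"],
})

# Anonymize the three identifier columns one after another.
anonymized = toy_df
for col in ("clusters", "gene_name", "cellID"):
    anonymized = replace_col(anonymized, col)

print(anonymized)
```

Numeric columns such as expression or velocity values are not touched, so the shared file should still reproduce the same computation.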

aghr commented 1 year ago

Hi,

Thank you very much for the code lines. The gzipped dataframe can be downloaded here:

https://cloud.mpi-cbg.de/index.php/s/WAEmWnX0py6CNOp

The code I used is:

import os
import sys
import glob
import shutil
import scanpy
import scvelo
import pandas as pd
import math
import matplotlib.pyplot as plt
import celldancer as cd
import celldancer.utilities as cdutil
import celldancer.cdplt as cdplt
from celldancer.cdplt import colormap
import random

# Estimate pseudo time
# set parameters
dt = 0.05
t_total = {dt:int(10/dt)}
n_repeats = 10

cellDancer_df=pd.read_csv('cellDancer_with_velocity_df_2.csv')

# estimate pseudotime
cellDancer_df2 = cd.pseudo_time(cellDancer_df=cellDancer_df,
                               grid=(30,30),
                               dt=dt,
                               t_total=t_total[dt],
                               n_repeats=n_repeats,
                               speed_up=(100,100),
                               n_paths = 3,
                               plot_long_trajs=True,
                               psrng_seeds_diffusion=[i for i in range(n_repeats)],
                               n_jobs=8)

Looking forward to seeing whether this issue can be solved. Many thanks.

Abclisy commented 1 year ago

Hello @aghr. Thank you for providing the sample data. We have updated the pseudotime calculation in the latest GitHub version and tested it with your data. Adjusting the n_repeats parameter, as in the code block below, might also help.

dt = 0.05
t_total = {dt:int(10/dt)}
n_repeats = 4

cellDancer_df = cd.pseudo_time(cellDancer_df=cellDancer_df,
                            grid=(30,30),
                            dt=dt,
                            t_total=t_total[dt],
                            n_repeats=n_repeats,
                            speed_up=(100,100),
                            n_paths = 3,
                            plot_long_trajs=True,
                            psrng_seeds_diffusion=[i for i in range(n_repeats)],
                            n_jobs=5)
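
One way to try several n_repeats values without restarting the script by hand is sketched below. The helper first_successful_n_repeats and the stand-in fake_run are hypothetical names introduced here for illustration; they are not part of cellDancer. In practice, run_pseudo_time would be a small function wrapping the cd.pseudo_time call above. The failure threshold inside fake_run is arbitrary and says nothing about how cellDancer actually behaves.

```python
def first_successful_n_repeats(run_pseudo_time, candidates):
    """Try each n_repeats value in order; return (n_repeats, result) for the first success."""
    for n_repeats in candidates:
        try:
            return n_repeats, run_pseudo_time(n_repeats)
        except IndexError as err:
            # Report the failure and move on to the next candidate value.
            print(f"n_repeats={n_repeats} failed: {err}")
    raise RuntimeError("no candidate n_repeats value succeeded")


def fake_run(n_repeats):
    """Stand-in for the real call (assumption: fails above an arbitrary threshold)."""
    if n_repeats > 4:
        raise IndexError("list index out of range")
    return f"pseudotime computed with n_repeats={n_repeats}"


chosen, result = first_successful_n_repeats(fake_run, [10, 8, 4])
print(chosen, result)
```

With the real function, psrng_seeds_diffusion should be rebuilt for each candidate so that its length matches n_repeats, as in the call above.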

biopzhang commented 7 months ago

Closing this issue due to inactivity. Please feel free to reopen it if there are new updates.