None of the genes were predicted when running celldancer.velocity_estimation.velocity

kousaa commented 1 year ago

Dear cellDancer team,

When I run the celldancer.velocity_estimation.velocity function, I keep getting an empty celldancer_df. When the function finishes, it prints the message "None of the genes were predicted. Try visualizing the unspliced and spliced columns of the gene(s) to check the quality". I have tried both sub-selecting genes and providing the top 2,000 highly variable genes (HVGs), and I have played a bit with the parameters (e.g. setting norm_u_s=False and norm_cell_distribution=False), without any luck. The spliced and unspliced columns look OK, and I can run a standard velocity analysis. Could you please advise how I might relax the parameters to see whether any of the genes make it through? Any advice is welcome!
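Before relaxing parameters, it may help to check how sparse each gene is in the input table. The following is a minimal pandas sketch, not part of cellDancer: `per_gene_coverage` is a hypothetical helper, and it assumes the table has the `gene_name`, `unsplice` and `splice` columns used in the cellDancer tutorials (only `gene_name` is confirmed in this thread).

```python
import pandas as pd


def per_gene_coverage(cell_type_u_s: pd.DataFrame) -> pd.DataFrame:
    """Summarize, per gene, the fraction of cells with non-zero unspliced/spliced values."""
    # Assumed column names: 'gene_name', 'unsplice', 'splice'.
    # Adjust them if your CSV uses different headers.
    flags = cell_type_u_s.assign(
        u_nonzero=cell_type_u_s["unsplice"] > 0,
        s_nonzero=cell_type_u_s["splice"] > 0,
    )
    summary = flags.groupby("gene_name").agg(
        n_cells=("unsplice", "size"),
        frac_u_nonzero=("u_nonzero", "mean"),
        frac_s_nonzero=("s_nonzero", "mean"),
    )
    # Sparsest genes (by unspliced signal) first, since they are the
    # most likely candidates for failing to fit.
    return summary.sort_values("frac_u_nonzero")
```

For example, `per_gene_coverage(cell_type_u_s).head(20)` lists the sparsest genes, which are good candidates to plot first.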

Best, Anastasia

kousaa commented 1 year ago

As an addition to this issue, I downloaded the PancreaticEndocrinogenesis_cell_type_u_s.csv tutorial file and ran the following commands:

import pandas as pd
import celldancer as cd

cell_type_u_s_path = 'PancreaticEndocrinogenesis_cell_type_u_s.csv'
cell_type_u_s = pd.read_csv(cell_type_u_s_path)
loss_df, cellDancer_df = cd.velocity(cell_type_u_s, permutation_ratio=0.5, n_jobs=7)

And I am still getting the same error.

I have attached the details of my conda environment (env.txt) in case they help.
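Since the reply below turns on the Python version, a quick way to list the relevant versions for comparison with the installation page is a small standard-library script. This is a minimal sketch: `collect_versions` is a hypothetical helper, and the default package list is only an illustrative assumption about which packages matter.

```python
import platform
from importlib import metadata


def collect_versions(packages=("celldancer", "numpy", "pandas", "torch")):
    """Return the Python version plus installed versions of the given packages.

    Packages that are not installed are reported as None.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```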

Thanks again, Anastasia

Abclisy commented 1 year ago

Dear Anastasia,

Thank you for reaching out to us. I noticed that you are using Python v3.8.13. Could you create a conda environment with the versions listed on our installation page? We have verified that environment multiple times on different machines with all of our case-study data. If the problem persists, could you send us sample data containing a subset of your genes (sli5@houstonmethodist.org), prepared with the code block below, so that we can test and replicate the problem? Thank you!

# Option 1: select genes by name
gene_list=['gene1','gene2','gene3']
# Option 2: select the first n genes
n=10 # could be customized
gene_list=list(cell_type_u_s.gene_name.drop_duplicates())[0:n]

# keep the subset of data
cell_type_u_s_subset=cell_type_u_s[cell_type_u_s.gene_name.isin(gene_list)].reset_index(drop=True)
# save the csv file to current directory
cell_type_u_s_subset.to_csv('cell_type_u_s_subset.csv', index=False)

Best, Shengyu

kousaa commented 1 year ago

Dear Shengyu,

Thanks so much for your response. I narrowed the problem down to a special character in the full path of the directory where the pipeline was running and where cd.velocity was trying to store its results.

Once I passed the function a relative path to the current directory, the issue was completely resolved.

Here is the call I used, as an example:

loss_df, cellDancer_df = cd.velocity(cell_type_u_s,
                                     gene_list,
                                     permutation_ratio=0.125,
                                     norm_cell_distribution=False,
                                     n_jobs=8,
                                     save_path='./')
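Because the failure came from a special character in the output path, it may be worth checking the path before calling cd.velocity. This is a minimal standard-library sketch: `unusual_path_chars` and its whitelist are hypothetical and deliberately conservative (aimed at POSIX-style paths). It flags spaces and non-ASCII characters, but it does not know which characters cellDancer actually rejects.

```python
import os
import string

# Assumed "safe" set for illustration only; the thread reports just that
# *some* special character in the path caused the problem.
SAFE_CHARS = set(string.ascii_letters + string.digits + "._-" + os.sep)


def unusual_path_chars(path):
    """Return the sorted characters of the absolute form of `path` outside SAFE_CHARS."""
    absolute = os.path.abspath(path)
    return sorted({ch for ch in absolute if ch not in SAFE_CHARS})
```

For example, `unusual_path_chars('./')` checks the current working directory; if it returns a non-empty list, running from (or saving to) a plain directory such as the `save_path='./'` above may avoid the problem.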

Best, Anastasia

GuangyuWangLab2021 / cellDancer, issue #9