andreariba / DeepCycle

Cell cycle inference in single-cell RNA-seq
https://www.nature.com/articles/s41467-022-30545-8
GNU General Public License v3.0

Various issues #18

Closed: Thapeachydude closed this issue 1 year ago

Thapeachydude commented 1 year ago

Good evening,

I wanted to give your tool a try, but I'm encountering various errors. I ran velocyto to get the spliced and unspliced counts, then filtered and normalized them with velociraptor (an R wrapper for scVelo) so that the results are consistent with my other analyses in R.

Finally, I use zellkonverter to convert my SCE object into an AnnData object and export it for DeepCycle. The AnnData object looks like this:

AnnData object with n_obs × n_vars = 1174 × 5991
    obs: 'velocity_self_transition', 'root_cells', 'end_points', 'velocity_pseudotime', 'latent_time', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition'
    var: 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'X_name', 'neighbors', 'recover_dynamics', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
    obsm: 'X_pca', 'UMAP', 'X_umap'
    varm: 'loss'
    layers: 'spliced', 'unspliced', 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'velocity', 'velocity_u'
    obsp: 'distances', 'connectivities'

I tried finding cycling genes using the --hotelling flag, but I get this error:

Traceback (most recent call last):
  File "/home/mpohly/Code/DeepCycle/DeepCycle.py", line 335, in <module>
    pool_output = pool.map(process_gene, cell_cycle_genes)
  File "/home/mpohly/miniconda3/envs/LymphomaProject/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/mpohly/miniconda3/envs/LymphomaProject/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
numpy.linalg.LinAlgError: The data appears to lie in a lower-dimensional subspace of the space in which it is expressed. This has resulted in a singular data covariance matrix, which cannot be treated using the algorithms implemented in `gaussian_kde`. Consider performing principle component analysis / dimensionality reduction and using `gaussian_kde` with the transformed data.
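SciPy's gaussian_kde raises this LinAlgError when the covariance matrix of the data it receives is singular. Assuming that DeepCycle's process_gene fits a 2-D KDE to each gene's (spliced, unspliced) values (an assumption, not confirmed from the DeepCycle source), any gene whose values are all zero, constant, or exactly proportional across cells would trigger it. Below is a minimal numpy sketch of a pre-filter for such genes; the function names and the rtol tolerance are hypothetical.

```python
import numpy as np


def has_nonsingular_covariance(spliced, unspliced, rtol=1e-10):
    """Hypothetical check: True if the 2-D (spliced, unspliced) data has a
    well-conditioned covariance, i.e. a 2-D Gaussian KDE should not fail on it."""
    data = np.vstack([np.asarray(spliced, dtype=float),
                      np.asarray(unspliced, dtype=float)])
    if data.shape[1] < 3:
        # Too few cells to estimate a 2x2 covariance meaningfully.
        return False
    cov = np.cov(data)                 # 2x2 covariance (rows are variables)
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    largest = eigvals[-1]
    if largest <= 0:
        # All-zero or constant in both layers: no variance at all.
        return False
    # A near-zero smallest eigenvalue means the points lie (almost) on a line.
    return bool(eigvals[0] > rtol * largest)


def filter_genes(spliced_matrix, unspliced_matrix, gene_names, rtol=1e-10):
    """Hypothetical helper: keep genes (columns) whose covariance is non-singular."""
    return [name for j, name in enumerate(gene_names)
            if has_nonsingular_covariance(spliced_matrix[:, j],
                                          unspliced_matrix[:, j], rtol)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cells = 200
    spliced = np.column_stack([
        rng.poisson(5, n_cells),          # ordinary gene
        np.zeros(n_cells),                # never detected
        np.arange(n_cells, dtype=float),  # unspliced is exactly 2x spliced
    ])
    unspliced = np.column_stack([
        rng.poisson(2, n_cells),
        np.zeros(n_cells),
        2.0 * np.arange(n_cells),
    ])
    print(filter_genes(spliced, unspliced, ["ok_gene", "zero_gene", "collinear_gene"]))
```

With this toy data only "ok_gene" is kept. Running a check like this on the exported layers would show whether degenerate genes are present before trying --hotelling again; it does not tell you which genes DeepCycle itself drops.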

Unfortunately, classical cycling markers such as MKI67 and TOP2A are expressed at very low levels, so when I use one of them as base_gene I get this error:

[N. OF USED GENES] 420
[Total number of cells]: 1174
[Number of cells used for training]: 975
[Number of cells used for validation]: 199
Traceback (most recent call last):
  File "/home/mpohly/Code/DeepCycle/DeepCycle.py", line 401, in <module>
    index_gene = genes.index(base_gene)
ValueError: 'ENSG00000131747' is not in list

I assume this means they don't make it past the expression threshold.

Any insights on how to deal with the Hotelling issue would be very much appreciated. :)

Many thanks, M

andreariba commented 1 year ago

I can see that you're using Ensembl gene annotation; make sure that the GO annotation of cell cycle genes you use is consistent with it.

Thapeachydude commented 1 year ago

Thanks for the quick reply! I converted the GO term gene list provided with the package to the corresponding Ensembl IDs before running it.
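When a GO gene list is converted from symbols to Ensembl IDs, some symbols may not map at all, one symbol may map to several IDs, and some mapped IDs may be absent from the dataset, so the list that reaches DeepCycle can silently shrink. Below is a minimal pure-Python sketch of a conversion that reports what is lost at each step. The mapping dictionary, symbols, and IDs are hypothetical placeholders; in practice the mapping would come from an annotation resource and dataset_ids would be the AnnData var_names.

```python
def convert_and_report(go_symbols, symbol_to_ensembl, dataset_ids):
    """Map GO gene symbols to Ensembl IDs and report losses along the way.

    symbol_to_ensembl: dict from symbol to a list of Ensembl IDs
                       (a symbol can map to several IDs).
    dataset_ids: Ensembl IDs present in the data (e.g. adata.var_names).
    Returns (present, unmapped_symbols, missing_ids).
    """
    dataset = set(dataset_ids)
    unmapped = []
    mapped = []
    for symbol in go_symbols:
        ids = symbol_to_ensembl.get(symbol, [])
        if not ids:
            unmapped.append(symbol)
        mapped.extend(ids)
    mapped = list(dict.fromkeys(mapped))  # deduplicate, keep order
    present = [gid for gid in mapped if gid in dataset]
    missing = [gid for gid in mapped if gid not in dataset]
    return present, unmapped, missing


if __name__ == "__main__":
    # Hypothetical symbols, IDs, and dataset contents, for illustration only.
    mapping = {"GENE_A": ["ENSG_A"], "GENE_B": ["ENSG_B1", "ENSG_B2"]}
    present, unmapped, missing = convert_and_report(
        ["GENE_A", "GENE_B", "GENE_C"], mapping, ["ENSG_A", "ENSG_B2", "ENSG_X"]
    )
    print("in dataset:", present)        # in dataset: ['ENSG_A', 'ENSG_B2']
    print("no Ensembl ID:", unmapped)    # no Ensembl ID: ['GENE_C']
    print("not in dataset:", missing)    # not in dataset: ['ENSG_B1']
```

Comparing the length of the "in dataset" list with the original GO list shows how many genes an annotation mismatch actually costs.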

andreariba commented 1 year ago

OK, then it means the gene you selected does not pass the filters. I suggest choosing one that passes the thresholds and then checking for better genes in the hotelling folder. Let me know.
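Following this suggestion, one way to shortlist base genes likely to survive filtering is to rank candidate cycling genes by the fraction of cells that detect them in both the spliced and unspliced layers. Below is a minimal numpy sketch. The detection criterion and the min_frac cutoff are hypothetical stand-ins, not DeepCycle's actual filters, so a gene chosen this way should still be checked against the gene list DeepCycle reports using.

```python
import numpy as np


def rank_base_gene_candidates(candidates, gene_names, spliced, unspliced, min_frac=0.1):
    """Rank candidates by the fraction of cells with both spliced and unspliced counts > 0.

    spliced, unspliced: cells x genes arrays; gene_names: their column labels.
    min_frac: hypothetical cutoff on that fraction (not DeepCycle's own threshold).
    Returns (gene, fraction) pairs that pass the cutoff, best first.
    """
    index = {name: j for j, name in enumerate(gene_names)}
    ranked = []
    for gene in candidates:
        j = index.get(gene)
        if j is None:
            continue  # gene is not in the object at all
        detected = (spliced[:, j] > 0) & (unspliced[:, j] > 0)
        frac = float(detected.mean())
        if frac >= min_frac:
            ranked.append((gene, frac))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    genes = ["g1", "g2", "g3"]
    spliced = np.array([[1, 0, 3], [2, 0, 0], [0, 1, 4], [5, 0, 1]])
    unspliced = np.array([[1, 0, 1], [0, 0, 0], [0, 0, 2], [1, 0, 1]])
    # g4 is absent from the data and g2 never has unspliced counts.
    print(rank_base_gene_candidates(["g1", "g2", "g3", "g4"], genes,
                                    spliced, unspliced, min_frac=0.25))
    # [('g3', 0.75), ('g1', 0.5)]
```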