StatBiomed / UniTVelo

UniTVelo, Temporally Unified RNA Velocity for single cell trajectory inference
https://unitvelo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

KeyError: 'highly_variable' #28

Closed eijynagai closed 1 year ago

eijynagai commented 1 year ago

Hello dev team! Thank you so much for developing such an amazing tool!

I'm trying to run unitvelo and got an error about "highly_variable". I checked the adata.var and there are highly variable genes there. Could you help figure out how to solve this issue?

In the step before this, I merged the adata obtained from the Seurat conversion with the loom files containing the unspliced and spliced counts.

These are the variables in the adata

adata
AnnData object with n_obs × n_vars = 329 × 1807
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.9', 'seurat_clusters', 'RNA_snn_res.1', 'RNA_snn_res.1.2', 'integrated_snn_res.0.8', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size'
    var: 'features', 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    uns: 'neighbors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'matrix', 'spanning', 'spliced', 'unspliced'
    obsp: 'distances'

I then added the highly variable genes with the command: sc.pp.highly_variable_genes(adata)

adata
AnnData object with n_obs × n_vars = 329 × 1442
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.9', 'seurat_clusters', 'RNA_snn_res.1', 'RNA_snn_res.1.2', 'integrated_snn_res.0.8', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
    var: 'features', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'neighbors', 'temp', 'hvg'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'matrix', 'spanning', 'spliced', 'unspliced'
    obsp: 'distances'

When I run the UniTVelo command, I get the following error

adata = utv.run_model("results/velocity_integrated.h5ad", label="seurat_clusters")
Model configuration file not specified. Default settings with unified-time mode will be used.
------> Manully Specified Parameters <------
------> Model Configuration Settings <------
N_TOP_GENES:    2000
LEARNING_RATE:  0.01
FIT_OPTION: 1
DENSITY:    SVD
REORDER_CELL:   Soft_Reorder
AGGREGATE_T:    True
R2_ADJUST:  True
GENE_PRIOR: None
VGENES: basic
IROOT:  None
--------------------------------------------

Filtered out 365 genes that are detected 20 counts (shared).
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
Normalized count data: spliced, unspliced.
Skip filtering by dispersion since number of variables are less than `n_top_genes`.
WARNING: Did not modify X as it looks preprocessed already.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'highly_variable'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_1863459/1623242212.py in <module>
----> 1 adata = utv.run_model("results/velocity_integrated.h5ad", label="seurat_clusters")
      2 #scv.pl.velocity_embedding_stream(adata)

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/unitvelo/main.py in run_model(adata, label, config_file, normalize)
     27     from .utils import init_config_summary, init_adata_and_logs
     28     config, _ = init_config_summary(config=config_file)
---> 29     adata, data_path = init_adata_and_logs(adata, config, normalize=normalize)
     30 
     31     scv.settings.presenter_view = True

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/unitvelo/utils.py in init_adata_and_logs(adata, config, normalize)
    605                                     min_shared_counts=config.MIN_SHARED_COUNTS,
    606                                     n_top_genes=config.N_TOP_GENES)
--> 607         print (f"Extracted {adata.var[adata.var['highly_variable'] == True].shape[0]} highly variable genes.")
    608 
    609         print (f'Computing moments for {len(adata.var)} genes with n_neighbors: {config.N_NEIGHBORS} and n_pcs: {config.N_PCS}')

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'highly_variable'

I appreciate your support in advance.

michaelgmz commented 1 year ago

Hi there @eijynagai ,

It looks like 'highly_variable' is not a column in adata.var. That seems strange, because the adata with dimensions (329, 1442) does have that column. I noticed you ran the model with a file path as input; could you double-check that the file holds the same adata you ran sc.pp.highly_variable_genes(adata) on? Alternatively, you could pass the adata object directly as input and see whether the error persists.

Bests, Mingze
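
A minimal sketch of the check suggested above, assuming only that the column names of adata.var can be iterated. The helper name missing_columns is hypothetical and is not part of UniTVelo or scanpy; the column lists are copied from the two adata printouts earlier in this thread.

```python
# Hypothetical helper (not part of UniTVelo or scanpy): report which
# expected columns are absent from a collection of column names.
def missing_columns(columns, required=("highly_variable",)):
    present = set(columns)
    return [name for name in required if name not in present]


# Column names copied from the two `adata.var` printouts in this thread.
var_before_hvg = ["features", "Accession", "Chromosome", "End", "Start", "Strand"]
var_after_hvg = var_before_hvg + [
    "highly_variable", "means", "dispersions", "dispersions_norm",
]

print(missing_columns(var_before_hvg))  # ['highly_variable']
print(missing_columns(var_after_hvg))   # []
```

With a real object, one would presumably load the same .h5ad path that is passed to utv.run_model and call the helper on adata.var.columns before running the model.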

michaelgmz commented 1 year ago

Hi @eijynagai,

I also noticed you have around 1400 genes in adata, which is fewer than the default of 2,000 highly variable genes. Please try reducing the N_TOP_GENES setting in the config to, say, 1400 or 1300 and see if the problem is solved. Let me know if this doesn't work either. :)

Bests, Mingze
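
The numbers in the log line up with this suggestion: 1807 genes were loaded, 365 were filtered out, and the remaining 1442 are fewer than N_TOP_GENES: 2000, which matches the "Skip filtering by dispersion" message. The sketch below only works through that arithmetic. The function suggest_n_top_genes and its round-down-to-the-nearest-hundred rule are hypothetical illustrations, not UniTVelo's own logic.

```python
# Hypothetical heuristic (an assumption, not UniTVelo's own logic):
# keep the default when enough genes survive filtering, otherwise
# round the surviving gene count down to the nearest hundred.
def suggest_n_top_genes(n_genes_after_filtering, default=2000):
    if n_genes_after_filtering > default:
        return default
    return (n_genes_after_filtering // 100) * 100


n_vars_loaded = 1807   # from the first `adata` printout
n_filtered_out = 365   # from "Filtered out 365 genes ..." in the log
n_remaining = n_vars_loaded - n_filtered_out

print(n_remaining)                       # 1442
print(suggest_n_top_genes(n_remaining))  # 1400
```

The resulting value would then be set as N_TOP_GENES on a configuration object before calling utv.run_model; the thread does not establish whether this step was actually needed to resolve the error.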

eijynagai commented 1 year ago

Hi @michaelgmz,

Thanks very much for the lightning-speed response! Indeed, after adding "highly_variable" I forgot to point to the updated object that contains it! Apologies for the newbie issue.

However, after that, I got another error.

adata = utv.run_model("results/velocity_integrated2.h5ad", label="seurat_clusters", config_file="config.py")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_1863459/673538557.py in <module>
----> 1 adata = utv.run_model("results/velocity_integrated2.h5ad", label="seurat_clusters", config_file="config.py")

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/unitvelo/main.py in run_model(adata, label, config_file, normalize)
     26 
     27     from .utils import init_config_summary, init_adata_and_logs
---> 28     config, _ = init_config_summary(config=config_file)
     29     adata, data_path = init_adata_and_logs(adata, config, normalize=normalize)
     30 

~/miniconda3/envs/unitvelo/lib/python3.7/site-packages/unitvelo/utils.py in init_config_summary(config)
    535         config = Configuration()
    536 
--> 537     if config.FIT_OPTION == '1':
    538         config.DENSITY = 'SVD' if config.GENE_PRIOR == None else 'Raw'
    539         config.REORDER_CELL = 'Soft_Reorder'

AttributeError: 'str' object has no attribute 'FIT_OPTION'

I tried changing the value to 1 or 2 in both the config.py I pass in and the one in the conda package, but it doesn't work. Do you know what the cause could be? Thank you again.

michaelgmz commented 1 year ago

Hi @eijynagai,

The 'config_file' parameter expects an instance of the configuration class defined in that file, not the .py file itself.

Try the following code and see if it works:

velo = utv.config.Configuration()
velo.R2_ADJUST = True
velo.FIT_OPTION = '1'
adata = utv.run_model("results/velocity_integrated2.h5ad", label="seurat_clusters", config_file=velo)

Bests.
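
To illustrate why the string "config.py" triggers the AttributeError in the traceback, here is a small self-contained sketch in plain Python. The Configuration class and the is_fit_option_one function are stand-ins written for this example; only the attribute name FIT_OPTION and the comparison against '1' are taken from the traceback, and none of this is UniTVelo's actual code.

```python
# Stand-in for a settings class; only the attribute name FIT_OPTION is
# taken from the traceback. This is not UniTVelo's Configuration class.
class Configuration:
    def __init__(self):
        self.FIT_OPTION = "1"


# Stand-in for the attribute check shown in the traceback
# (`if config.FIT_OPTION == '1':`).
def is_fit_option_one(config):
    return config.FIT_OPTION == "1"


# An object carrying the attribute works as expected.
print(is_fit_option_one(Configuration()))  # True

# A plain string has no such attribute, so the lookup itself fails.
try:
    is_fit_option_one("config.py")
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'FIT_OPTION'
```

The attribute lookup fails before any file could be read, which would explain why editing the contents of config.py made no difference.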

eijynagai commented 1 year ago

Hi @michaelgmz,

It's now running swiftly, problem solved! Thanks very much for your support!

Best regards,