dpeerlab / spectra

Supervised Pathway DEConvolution of InTerpretable Gene ProgRAms
MIT License

UnboundLocalError: local variable 'is_global' referenced before assignment #34

Open GabyBG opened 8 months ago

GabyBG commented 8 months ago

Hello, I am trying to run spectra using cell type labels:

import Spectra
import scanpy as sc
import pandas as pd
import numpy as np
import cytopus as cp

# Retrieve the T-cell and global gene sets from the Cytopus knowledge base
G = cp.KnowledgeBase()
celltype_of_interest = ['T']
global_celltypes = ['all-cells', 'leukocyte']
G.get_celltype_processes(celltype_of_interest, global_celltypes=global_celltypes, get_children=True, get_parents=False)
annotations = G.celltype_process_dict

#Run spectra
model = Spectra.est_spectra(
    adata=adata, 
    gene_set_dictionary=annotations, 
    use_highly_variable=True,
    cell_type_key="predicted.celltype.l1", 
    use_weights=True,
    lam=0.1, #varies depending on data and gene sets, try between 0.5 and 0.001
    delta=0.001, 
    kappa=None,
    rho=0.001, 
    use_cell_types=False,
    n_top_vals=50,
    label_factors=True, 
    overlap_threshold=0.2,
    clean_gs = True, 
    min_gs_num = 3,
    num_epochs=5000
)

It finishes the process, but gives the following error:

Cell type labels in gene set annotation dictionary and AnnData object are identical
removing gene set T for cell type global which is of length 14 0 genes are found in the data. minimum length is 3
removing gene set global for cell type global which is of length 150 0 genes are found in the data. minimum length is 3
Your gene set annotation dictionary is now correctly formatted.
/home/ubuntu/anaconda3/envs/scFates-gpu/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/ubuntu/anaconda3/envs/scFates-gpu/lib/python3.8/site-packages/numpy/core/_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
100%|██████████████████████████████████████████████████████████████████████████████████| 5000/5000 [38:15<00:00,  2.18it/s]
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[12], line 1
----> 1 model = Spectra.est_spectra(
      2     adata=adata, 
      3     gene_set_dictionary=annotations, 
      4     use_highly_variable=True,
      5     cell_type_key="predicted.celltype.l1", 
      6     use_weights=True,
      7     lam=0.1, #varies depending on data and gene sets, try between 0.5 and 0.001
      8     delta=0.001, 
      9     kappa=None,
     10     rho=0.001, 
     11     use_cell_types=False,
     12     n_top_vals=50,
     13     label_factors=True, 
     14     overlap_threshold=0.2,
     15     clean_gs = True, 
     16     min_gs_num = 3,
     17     num_epochs=5000
     18 )

File ~/anaconda3/envs/scFates-gpu/lib/python3.8/site-packages/Spectra/Spectra.py:1314, in est_spectra(adata, gene_set_dictionary, L, use_highly_variable, cell_type_key, use_weights, lam, delta, kappa, rho, use_cell_types, n_top_vals, filter_sets, label_factors, clean_gs, min_gs_num, overlap_threshold, **kwargs)
   1311 #labeling function
   1312 if label_factors:
   1313     #get cell type specificity of every factor
-> 1314     if is_global == False:
   1315         celltype_dict = get_factor_celltypes(adata, cell_type_key, cellscore=spectra.cell_scores)
   1316         max_celltype = [celltype_dict[x] for x in range(spectra.cell_scores.shape[1])]

UnboundLocalError: local variable 'is_global' referenced before assignment

Tobiaspk commented 8 months ago

Hi, it seems that the genes defined in the annotations variable are not found in your adata object, which causes the error you're seeing. We'll implement a more verbose warning in the next patch. Thanks for raising this issue.
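
For readers unfamiliar with this exception, here is a minimal sketch of how an UnboundLocalError of this kind can arise in general. The function `summarize` and its argument are hypothetical and are not Spectra's actual code. The sketch only shows the pattern of a flag that is assigned inside a branch and then read unconditionally.

```python
def summarize(gene_sets_kept):
    """Hypothetical function for illustration; NOT Spectra's code."""
    if gene_sets_kept:
        # The flag is only bound when at least one gene set survives.
        is_global = False
    # If the branch above was skipped, is_global was never bound here,
    # so reading it raises UnboundLocalError.
    return "cell-type specific" if not is_global else "global"


try:
    summarize([])  # empty input: the assigning branch never runs
    outcome = "ok"
except UnboundLocalError as err:
    outcome = type(err).__name__

print(outcome)  # prints "UnboundLocalError"
```

Under that assumption, the error is a symptom of an upstream condition (here, no usable gene sets) rather than a problem with the labeling step itself.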

To solve this, ensure that the gene names in adata.var_names match those in your annotations. Cytopus uses upper-case gene symbols without spaces or special characters (for example, STAT6 or RAB1A). Please let me know if that helps.
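
One way to check the match before training is to count how many genes of each gene set appear in the data. The helper below is a hypothetical sketch, not part of Spectra or Cytopus. It assumes the annotations are a nested dictionary of the form cell type -> gene set name -> list of genes, and it uses toy gene names in place of a real adata.var_names.

```python
def count_matching_genes(var_names, annotations):
    """Hypothetical helper: for each (cell type, gene set) pair, return
    (number of genes found in var_names, total genes in the set)."""
    present = set(var_names)
    report = {}
    for cell_type, gene_sets in annotations.items():
        for name, genes in gene_sets.items():
            found = sum(1 for gene in genes if gene in present)
            report[(cell_type, name)] = (found, len(genes))
    return report


# Toy stand-ins; with real data, pass adata.var_names instead.
var_names = ["Stat6", "Rab1a", "CD3E"]
annotations = {"global": {"toy_set": ["STAT6", "RAB1A", "CD3E"]}}

report = count_matching_genes(var_names, annotations)
# Same check after upper-casing the data's gene names.
upper_report = count_matching_genes([g.upper() for g in var_names], annotations)

print(report)        # {('global', 'toy_set'): (1, 3)}
print(upper_report)  # {('global', 'toy_set'): (3, 3)}
```

If most sets report zero matches, the gene identifiers probably differ in more than letter case (for example, Ensembl IDs instead of symbols), and upper-casing alone will not help.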

ErikaZ95 commented 7 months ago

Hi. Even after doing as suggested, I got the same error at the end of training. I then flattened the dictionary so that it has only one level, and the training succeeded. However, I guess this is not the right way to do it, since you want a hierarchy of cell type -> processes (correct me if I am wrong).
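
The flattening described above could look like the following sketch. The helper `flatten_gene_sets` is hypothetical, and the input shape (cell type -> gene set name -> list of genes) is an assumption. The log lines in the original post ("removing gene set T for cell type global which is of length 14") may indicate that, with use_cell_types=False, the top-level keys 'T' and 'global' were read as gene set names, which would be consistent with the flat dictionary working. That is an inference from the log, not confirmed behavior.

```python
def flatten_gene_sets(nested):
    """Hypothetical helper: collapse cell type -> gene set -> genes
    into a single-level dict of gene set -> genes."""
    flat = {}
    for cell_type, gene_sets in nested.items():
        for name, genes in gene_sets.items():
            # Prefix with the cell type only when the name is already taken.
            key = name if name not in flat else f"{cell_type}_{name}"
            flat[key] = list(genes)
    return flat


# Toy nested dictionary standing in for the Cytopus annotations.
nested = {
    "global": {"set_a": ["G1", "G2"]},
    "T": {"set_b": ["G3"], "set_a": ["G4"]},
}
flat = flatten_gene_sets(nested)
print(flat)  # {'set_a': ['G1', 'G2'], 'set_b': ['G3'], 'T_set_a': ['G4']}
```

Note that flattening discards the cell-type hierarchy, so it only suits a run that ignores cell type labels.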