df2regulons: KeyError: 'TF'

dschrein commented 6 years ago

I have completed everything up to the creation of the df object via:

with ProgressBar():
  df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME,client_or_address="dask_multiprocessing")

Then I get this KeyError:

>>> regulons = df2regulons(df, NOMENCLATURE)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'TF'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/pyscenic/transform.py", line 301, in df2regulons
    COLUMN_NAME_TYPE]))))
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3991, in groupby
    **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/groupby.py", line 1511, in groupby
    return klass(obj, by, **kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/groupby.py", line 370, in __init__
    mutated=self.mutated)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/groupby.py", line 2462, in _get_grouper
    in_axis, name, gpr = True, gpr, obj[gpr]
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3543, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python3.6/site-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'TF'

Here's what the df object looks like:

>>> df.head()
                                                          AUC  \
TF   MotifID                                                    
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1              0.059173   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014  0.065220   
     cisbp__M5232                                    0.066279   
     cisbp__M5233                                    0.066347   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513   0.069241   

                                                                                            Annotation  \
TF   MotifID                                                                                             
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1              motif similar to transfac_public__M00139 ('V$A...   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014  motif is annotated for orthologous gene FBgn02...   
     cisbp__M5232                                    motif similar to flyfactorsurvey__tgo_sima_SAN...   
     cisbp__M5233                                    motif similar to flyfactorsurvey__tgo_sima_SAN...   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513   motif similar to flyfactorsurvey__ss_tgo_SANGE...   

                                                                                               Context  \
TF   MotifID                                                                                             
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1              (activating, mm9-500bp-upstream-10species, wei...   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014  (activating, mm9-500bp-upstream-10species, wei...   
     cisbp__M5232                                    (activating, mm9-500bp-upstream-10species, wei...   
     cisbp__M5233                                    (activating, mm9-500bp-upstream-10species, wei...   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513   (activating, mm9-500bp-upstream-10species, wei...   

                                                     MotifSimilarityQvalue  \
TF   MotifID                                                                 
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1                           0.000961   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014               0.000000   
     cisbp__M5232                                                 0.000001   
     cisbp__M5233                                                 0.000001   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513                0.000000   

                                                          NES  \
TF   MotifID                                                    
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1              3.290146   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014  3.021446   
     cisbp__M5232                                    3.109636   
     cisbp__M5233                                    3.115258   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513   3.356153   

                                                     OrthologousIdentity  \
TF   MotifID                                                               
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1                         1.000000   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014             0.479751   
     cisbp__M5232                                               0.479751   
     cisbp__M5233                                               0.479751   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513              0.479751   

                                                     RankAtMax  \
TF   MotifID                                                     
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1                   1293   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014        823   
     cisbp__M5232                                          876   
     cisbp__M5233                                          844   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513        1395   

                                                                                           TargetGenes  \
TF   MotifID                                                                                             
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1              [(March7, 0.390518704346), (Slc25a4, 0.8082596...   
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014  [(Atp6v0b, 0.346331738619), (Snx2, 0.520586224...   
     cisbp__M5232                                    [(Atp6v0b, 0.346331738619), (Tra2b, 0.52058622...   
     cisbp__M5233                                    [(Atp6v0b, 0.346331738619), (Tra2b, 0.52058622...   
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513   [(Picalm, 0.346331738619), (Snx2, 0.5205862244...   

                                                           Type  
TF   MotifID                                                     
Ahr  dbcorrdb__BRCA1__ENCSR000EBX_1__m1              activating  
Arnt flyfactorsurvey__tgo_sima_SANGER_5_FBgn0015014  activating  
     cisbp__M5232                                    activating  
     cisbp__M5233                                    activating  
     flyfactorsurvey__ss_tgo_SANGER_10_FBgn0003513   activating

Any ideas? Thanks in advance!
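For readers hitting the same error: in the df.head() output above, TF and MotifID are printed in the row header, which suggests they are index levels of df rather than columns. The toy frame below is a hypothetical stand-in with illustrative values (not real prune2df output) that mimics that layout. Grouping through `level=` does not depend on pandas resolving index level names, while passing a level name through `by=` (mixed with a real column such as Type) does. According to the reply that follows, that needs pandas 0.20.1 or later.

```python
import pandas as pd

# Hypothetical stand-in for the prune2df output shown above: TF and MotifID
# are index levels, not columns. Values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [
        ("Ahr", "dbcorrdb__BRCA1__ENCSR000EBX_1__m1"),
        ("Arnt", "cisbp__M5232"),
        ("Arnt", "cisbp__M5233"),
    ],
    names=["TF", "MotifID"],
)
df = pd.DataFrame(
    {"NES": [3.29, 3.11, 3.12], "Type": ["activating"] * 3},
    index=index,
)

print("TF" in df.columns)      # False: TF is not a column
print("TF" in df.index.names)  # True: TF is an index level

# Grouping on the index level explicitly via `level=`.
by_level = df.groupby(level="TF")["NES"].max()

# Mixing an index level name ("TF") with a column ("Type") in `by=` only
# works on pandas versions that resolve index level names in groupby.
by_name = df.groupby(by=["TF", "Type"])["NES"].max()

print(by_level)
print(by_name)
```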

bramvds commented 6 years ago

Dear,

Could you tell me which version of pandas you are using? It should be 0.20.1 or later; if it is older, that would explain the error message.

Using the latest version of pandas, I tried to reproduce this error myself on the example dataset but did not succeed.

Thanks, Bram
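A minimal guard for scripts that rely on this, assuming 0.20.1 is the relevant minimum as stated above. `parse_version` and `meets_minimum` are hypothetical helpers written with the standard library only; they compare just the leading numeric components, so suffixes such as `rc0` or `+dev` are ignored.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints.

    Pre-release or local suffixes are ignored, so '0.25.3' -> (0, 25, 3)
    and '1.0.0rc0' -> (1, 0, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: str = "0.20.1") -> bool:
    """True if `version` is at least `minimum` (numeric components only)."""
    have, need = parse_version(version), parse_version(minimum)
    # Pad the shorter tuple with zeros so '1.0' compares equal to '1.0.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

Usage: call `meets_minimum(pandas.__version__)` before running the pipeline and stop with an upgrade message if it returns False; with 0.19.2, as reported below, it returns False.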

dschrein commented 6 years ago

pandas==0.19.2

Thanks, I will try after upgrading! ... Fixed, thank you!

Note that you have to use pd.read_pickle instead of pickle.load if you dumped df using an older version of pandas.
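A minimal sketch of that advice: write the frame with `DataFrame.to_pickle` and load it with `pd.read_pickle`, which applies pandas' own compatibility handling when unpickling, whereas plain `pickle.load` does not. The file name is a placeholder and the frame is a toy example. Backward compatibility of pickles across pandas versions is still limited (see the pandas `read_pickle` documentation), so re-running `prune2df` may be the safer option if loading still fails.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the real prune2df result.
df = pd.DataFrame(
    {"NES": [3.29, 3.02]},
    index=pd.Index(["Ahr", "Arnt"], name="TF"),
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "motifs.pkl")  # placeholder file name
    df.to_pickle(path)
    # pd.read_pickle rather than pickle.load: pandas falls back to its own
    # compatibility unpickler if the standard one fails.
    restored = pd.read_pickle(path)

print(restored.equals(df))
```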

grimwoo commented 4 years ago

@bramvds Hi, I get the same error, but my pandas version is 0.25.3. Could you please help me?

>>> pd.__version__
'0.25.3'

My code and the resulting error:

>>>modules = list(modules_from_adjacencies(adjacencies, exprMat))
KeyError                                  Traceback (most recent call last)
~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'TF'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-45-b1bf361ac831> in <module>
----> 1 modules = list(modules_from_adjacencies(adjacencies, exprMat))

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/utils.py in modules_from_adjacencies(adjacencies, ex_mtx, thresholds, top_n_targets, top_n_regulators, min_genes, absolute_thresholds, rho_dichotomize, keep_only_activating, rho_threshold, rho_mask_dropouts)
    263         LOGGER.warn(f"Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.\n\tDropout masking is currently set to [{rho_mask_dropouts}].")
    264         adjacencies = add_correlation(adjacencies, ex_mtx,
--> 265                                   rho_threshold=rho_threshold, mask_dropouts=rho_mask_dropouts)
    266         activating_modules = adjacencies[adjacencies[COLUMN_NAME_REGULATION] > 0.0]
    267         if keep_only_activating:

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/utils.py in add_correlation(adjacencies, ex_mtx, rho_threshold, mask_dropouts)
    130         rhos = masked_rho4pairs(ex_mtx.values, col_idx_pairs, 0.0)
    131     else:
--> 132         genes = list(set(adjacencies[COLUMN_NAME_TF]).union(set(adjacencies[COLUMN_NAME_TARGET])))
    133         ex_mtx = ex_mtx[ex_mtx.columns[ex_mtx.columns.isin(genes)]]
    134         corr_mtx = pd.DataFrame(index=ex_mtx.columns, columns=ex_mtx.columns,

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'TF'
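Unlike the original report, this traceback shows `add_correlation` reading `adjacencies[COLUMN_NAME_TF]`, so here the KeyError means the `adjacencies` frame passed in has no column named `TF`. The check below is a hypothetical helper, not part of pySCENIC. Based on the tracebacks in this thread, it assumes the expected columns are `TF` and `target`, plus `importance` as in arboreto/GRNBoost2 output (an assumption). Two possible ways to lose these names when reloading a saved adjacency table are `read_csv(..., index_col=0)`, which moves TF into the index, and `header=None`, which yields integer column names. The helper undoes the first and reports the second. The gene names and scores are illustrative.

```python
import io

import pandas as pd

# Assumed column names, based on the tracebacks in this thread
# (COLUMN_NAME_TF -> 'TF', adjacencies.target) and arboreto-style output.
REQUIRED = ["TF", "target", "importance"]


def normalize_adjacencies(adjacencies: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a frame with TF/target/importance columns.

    If TF was moved into the index (e.g. by read_csv(..., index_col=0)),
    it is restored as a column. Otherwise a ValueError lists what was found.
    """
    if "TF" not in adjacencies.columns and "TF" in adjacencies.index.names:
        adjacencies = adjacencies.reset_index()
    missing = [col for col in REQUIRED if col not in adjacencies.columns]
    if missing:
        raise ValueError(
            f"adjacencies is missing columns {missing}; "
            f"found {list(adjacencies.columns)}"
        )
    return adjacencies[REQUIRED]


# Example: a saved adjacency table reloaded with index_col=0.
tsv = "TF\ttarget\timportance\nAhr\tSlc25a4\t0.81\nArnt\tSnx2\t0.52\n"
adj = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)
print("TF" in adj.columns)  # False: this is what triggers KeyError: 'TF'
fixed = normalize_adjacencies(adj)
print(list(fixed.columns))  # ['TF', 'target', 'importance']
```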

After changing the code as suggested by the warning, I get a different error:

>>>modules = list(modules_from_adjacencies(adjacencies, exprMat, rho_mask_dropouts=True))
AttributeError                            Traceback (most recent call last)
<ipython-input-44-b6a5e2c5e1cf> in <module>
----> 1 modules = list(modules_from_adjacencies(adjacencies, exprMat, rho_mask_dropouts=True))

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/utils.py in modules_from_adjacencies(adjacencies, ex_mtx, thresholds, top_n_targets, top_n_regulators, min_genes, absolute_thresholds, rho_dichotomize, keep_only_activating, rho_threshold, rho_mask_dropouts)
    263         LOGGER.warn(f"Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.\n\tDropout masking is currently set to [{rho_mask_dropouts}].")
    264         adjacencies = add_correlation(adjacencies, ex_mtx,
--> 265                                   rho_threshold=rho_threshold, mask_dropouts=rho_mask_dropouts)
    266         activating_modules = adjacencies[adjacencies[COLUMN_NAME_REGULATION] > 0.0]
    267         if keep_only_activating:

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/utils.py in add_correlation(adjacencies, ex_mtx, rho_threshold, mask_dropouts)
    127     if mask_dropouts:
    128         ex_mtx = ex_mtx.sort_index(axis=1)
--> 129         col_idx_pairs = _create_idx_pairs(adjacencies, ex_mtx)
    130         rhos = masked_rho4pairs(ex_mtx.values, col_idx_pairs, 0.0)
    131     else:

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/utils.py in _create_idx_pairs(adjacencies, exp_mtx)
     68 
     69     # Create sorted list of genes that take part in a TF-target link.
---> 70     genes = set(adjacencies.TF).union(set(adjacencies.target))
     71     sorted_genes = sorted(genes)
     72 

~/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178                 return self[name]
-> 5179             return object.__getattribute__(self, name)
   5180 
   5181     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'TF'
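Setting `rho_mask_dropouts=True` only switches to a different code path (`_create_idx_pairs`), which reads the same column through attribute access (`adjacencies.TF`). On a frame without a `TF` column, pandas raises `AttributeError` there instead of `KeyError`, so both errors appear to point at the same missing column rather than at the dropout setting. A toy illustration, using a hypothetical frame whose TF column happens to be named `tf`:

```python
import pandas as pd

# Hypothetical adjacency frame whose TF column is named differently.
adj = pd.DataFrame({"tf": ["Ahr"], "target": ["Slc25a4"], "importance": [0.81]})

errors = []
try:
    adj["TF"]  # item access, as in add_correlation
except KeyError as exc:
    errors.append(type(exc).__name__)
try:
    adj.TF  # attribute access, as in _create_idx_pairs
except AttributeError as exc:
    errors.append(type(exc).__name__)

print(errors)  # ['KeyError', 'AttributeError']

# Renaming the column makes both access styles work.
adj = adj.rename(columns={"tf": "TF"})
print(adj["TF"].tolist(), adj.TF.tolist())  # ['Ahr'] ['Ahr']
```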

aertslab / pySCENIC, issue #8