aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

prune2df TypeError: Columns must be indices or names. #483

Open Marinnane opened 1 year ago

Marinnane commented 1 year ago

Hello,

I apologize for bothering you but I was wondering if you could help me understand what I might be doing wrong.

I am following the tutorial and have defined adjacencies, modules and dbs as follows.


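import glob
import os

import pandas as pd
from arboreto.algo import grnboost2
from dask.diagnostics import ProgressBar
from pyscenic.prune import prune2df
from pyscenic.utils import modules_from_adjacencies

# Genes x cells expression matrix; adata and tf_names are assumed to be defined earlier.
exMatrix = pd.DataFrame(adata.X.T, index=adata.var_names, columns=adata.obs_names)
adjacencies = grnboost2(adata.X, gene_names=adata.var_names, tf_names=list(tf_names), verbose=True)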
adjacencies.head()

            TF   target  importance
    0    Sox14   CG9279   17.726219
    4     scrt     SKIP   10.349272
    5       en      tup   10.249785
    3   fd96Cb   CG9279   10.212465
    1     Hr38   CG9279    9.924212
    ..     ...      ...         ...
    0    Sox14  Actbeta    0.636025
    4     scrt  CR32773    0.512135
    5       en  CR32773    0.367211
    6      tup  CG10713    0.326409
    5       en      out    0.061801
modules = list(modules_from_adjacencies(adjacencies, exMatrix.T))
print(type(modules[0]))
print(modules[0])
    <class 'ctxcore.genesig.Regulon'>
    [CG9279,Myo95E,SKIP,CR41535,E(spl)malpha-BFM,CG6308,CG44004,mid,CR43857,CG6865,scrt,CG5909,CAP,CG31161,out,CG12159,CG14024,en,Hr38,Sox14]
from ctxcore.rnkdb import FeatherRankingDatabase as RankingDatabase
DATABASES_GLOB='C:/Users/admin/Desktop/tent_python_VSC/input/dm6_v10_clust.genes_vs_motifs.scores.feather'
db_fnames = glob.glob(DATABASES_GLOB)
def name(fname):
    return os.path.splitext(os.path.basename(fname))[0] 
dbs = [RankingDatabase(fname=fname, name=name(fname)) for fname in db_fnames]
print(dbs)

    [FeatherRankingDatabase(name="dm6_v10_clust.genes_vs_motifs.scores")]

However, when I try to run prune2df, I get the following error:

MOTIF_ANNOTATIONS_FNAME = "C:/Users/admin/Desktop/tent_python_VSC/input/motifs-v10nr_clust-nr.flybase-m0.001-o0.0.tbl"

with ProgressBar():
    df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME)
[                                        ] | 0% Completed | 14.22 s
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[59], line 18
     14 MOTIF_ANNOTATIONS_FNAME = "C:/Users/admin/Desktop/tent_python_VSC/input/motifs-v10nr_clust-nr.flybase-m0.001-o0.0.tbl"
     17 with ProgressBar():
---> 18     df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME) 

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyscenic\prune.py:424, in prune2df(rnkdbs, modules, motif_annotations_fname, rank_threshold, auc_threshold, nes_threshold, motif_similarity_fdr, orthologuous_identity_threshold, weighted_recovery, client_or_address, num_workers, module_chunksize, filter_for_annotation)
    418 # Create a distributed dataframe from individual delayed objects to avoid out of memory problems.
    419 aggregation_func = (
    420     partial(from_delayed, meta=DF_META_DATA)
    421     if client_or_address != "custom_multiprocessing"
    422     else pd.concat
    423 )
--> 424 return _distributed_calc(
    425     rnkdbs,
    426     modules,
    427     motif_annotations_fname,
    428     transformation_func,
    429     aggregation_func,
    430     motif_similarity_fdr,
    431     orthologuous_identity_threshold,
    432     client_or_address,
    433     num_workers,
    434     module_chunksize,
...
    267                     .format(columns, column_type_names))
    269 # Feather v1 already respects the column selection
    270 if reader.version < 3:

TypeError: Columns must be indices or names. Got columns ('CAP', 'CG12159', 'CG14024', 'CG31161', 'CG44004', 'CG5909', 'CG6308', 'CG6865', 'CG9279', 'CR43857', 'E(spl)malpha-BFM', 'Hr38', 'Myo95E', 'SKIP', 'Sox14', 'en', 'mid', 'out', 'scrt', 'motifs') of types ['str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str_', 'str']

Would you be able to help me figure out what I am doing wrong?

Thank you very much!

Package                 Version
----------------------- --------
arboreto                0.1.6
compatibility           1.0.1
distributed             2023.6.0
h5py                    3.8.0
loompy                  3.0.7
louvain                 0.8.0
numba                   0.57.0
numpy                   1.23.4
numpy-groupies          0.9.22
pandas                  2.0.2
pyarrow                 12.0.0
pyscenic                0.12.1
PyYAML                  5.4.1
scanpy                  1.9.3
seaborn                 0.12.2
umap-learn              0.5.3
wheel                   0.40.0

ghuls commented 1 year ago

Try the command-line version, which wraps the code in the tutorial in 3 simple steps: https://pyscenic.readthedocs.io/en/latest/installation.html#command-line-interface
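
For orientation, the three command-line steps are the `grn`, `ctx` and `aucell` subcommands of `pyscenic`. The sketch below only assembles and prints the commands from Python; it does not run them. The expression matrix, TF list and output file names are hypothetical placeholders, the database and annotation file names are the ones from the question, and the worker count is an arbitrary choice.

```python
import shlex

# Hypothetical input/output names; replace with your own files.
EXPR = "expr_mat.loom"          # expression matrix (assumed loom file)
TFS = "tf_names.txt"            # one transcription factor name per line
DB = "dm6_v10_clust.genes_vs_motifs.scores.feather"
ANNOT = "motifs-v10nr_clust-nr.flybase-m0.001-o0.0.tbl"
WORKERS = "4"                   # arbitrary choice

steps = [
    # Step 1: infer TF-target adjacencies (GRNBoost2).
    ["pyscenic", "grn", EXPR, TFS, "-o", "adjacencies.csv",
     "--num_workers", WORKERS],
    # Step 2: prune modules with the motif ranking database.
    ["pyscenic", "ctx", "adjacencies.csv", DB,
     "--annotations_fname", ANNOT,
     "--expression_mtx_fname", EXPR,
     "--output", "regulons.csv",
     "--num_workers", WORKERS],
    # Step 3: score regulon activity per cell (AUCell).
    ["pyscenic", "aucell", EXPR, "regulons.csv",
     "--output", "auc_mtx.csv",
     "--num_workers", WORKERS],
]

for cmd in steps:
    # Print a shell-ready command line; run it yourself in a terminal,
    # or pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Check `pyscenic grn --help`, `pyscenic ctx --help` and `pyscenic aucell --help` for the options available in your installed version.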

Preferably use the containerized versions of pySCENIC: https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images

In the lab we only run pySCENIC on Linux, never on Windows, so it is possible that not all steps work properly on Windows.
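
On the TypeError itself: the message lists the gene names with type `str_` and only `motifs` with type `str`. `str_` is the name of NumPy's string scalar type, which is a subclass of the built-in `str`. The sketch below illustrates why such a mix could be rejected, assuming the reader compares exact types instead of using `isinstance` (an assumption about the library internals, not confirmed in this thread). `StrLike` is a hypothetical stand-in for `numpy.str_` so the example runs without NumPy, and both helper functions are illustrative only.

```python
class StrLike(str):
    """Hypothetical stand-in for numpy.str_, which also subclasses str."""


def has_only_exact_str(columns):
    """Return True only if every name is exactly a built-in str.

    Illustrative helper mimicking a strict `type(x) == str` check.
    """
    return all(type(c) == str for c in columns)


def to_plain_str(columns):
    """Convert every name to a plain built-in str."""
    return [str(c) for c in columns]


# Mixed column names, similar in shape to the ones in the error message.
columns = [StrLike("CAP"), StrLike("en"), "motifs"]

print([type(c).__name__ for c in columns])
print(all(isinstance(c, str) for c in columns))  # subclass check passes
print(has_only_exact_str(columns))               # strict check fails

plain = to_plain_str(columns)
print(has_only_exact_str(plain))                 # strict check passes
```

Where in the pipeline the names become `str_` is not established here, so this only explains the shape of the error; it is not a confirmed fix.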