aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

IndexError: list index out of range [BUG] #298

Open · moqri opened this issue 3 years ago

moqri commented 3 years ago

**Describe the bug**
Running `grnboost2(ex_matrix, tf_names=tf_names, verbose=True)` from the demo example results in an error.

**Steps to reproduce the behavior:**

  1. Command run when the error occurred:

    grnboost2(ex_matrix, tf_names=tf_names, verbose=True)
  2. Error encountered:

    
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-42-7c4f465355b3> in <module>
    ----> 1 adjacencies = grnboost2(ex_matrix, tf_names=tf_names, verbose=True)

    ~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/arboreto/algo.py in grnboost2(expression_data, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
         39     return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
         40                gene_names=gene_names, tf_names=tf_names, client_or_address=client_or_address,
    ---> 41                early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
         42
         43

    ~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/arboreto/algo.py in diy(expression_data, regressor_type, regressor_kwargs, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
        126                              early_stop_window_length=early_stop_window_length,
        127                              limit=limit,
    --> 128                              seed=seed)
        129
        130     if verbose:

    ~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/arboreto/core.py in create_graph(expression_matrix, gene_names, tf_names, regressor_type, regressor_kwargs, client, target_genes, limit, include_meta, early_stop_window_length, repartition_multiplier, seed)
        448     # gather the DataFrames into one distributed DataFrame
        449     all_links_df = from_delayed(delayed_link_dfs, meta=_GRN_SCHEMA)
    --> 450     all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
        451
        452     # optionally limit the number of resulting regulatory links, descending by top importance

    ~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/dask/dataframe/io/io.py in from_delayed(dfs, meta, divisions, prefix, verify_meta)
        589             raise TypeError("Expected Delayed object, got %s" % type(df).__name__)
        590
    --> 591     parent_meta = delayed(make_meta)(dfs[0]).compute()
        592
        593     if meta is None:

    IndexError: list index out of range
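For reference, in the arboreto demo `ex_matrix` and `tf_names` are typically prepared along these lines before the failing call (a minimal sketch; the file names below are placeholders, not taken from this report):

    # Minimal sketch of a demo-style setup for the failing call.
    # 'expression_matrix.tsv' and 'tf_names.txt' are placeholder file names.
    import pandas as pd
    from arboreto.algo import grnboost2
    from arboreto.utils import load_tf_names

    # expression matrix with cells as rows and genes as columns
    ex_matrix = pd.read_csv('expression_matrix.tsv', sep='\t')
    tf_names = load_tf_names('tf_names.txt')

    adjacencies = grnboost2(ex_matrix, tf_names=tf_names, verbose=True)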


**Please complete the following information:**
- pySCENIC version: [0.11.2]
- Installation method: [pip]
- Run environment: [Jupyter notebook]
- OS: [Debian]
- Package versions: [see list below]

    Package Version
    aiohttp 3.7.4.post0
    anndata 0.7.6
    arboreto 0.1.6
    async-timeout 3.0.1
    attrs 21.2.0
    backcall 0.2.0
    bokeh 2.3.2
    boltons 21.0.0
    cached-property 1.5.2
    certifi 2021.5.30
    chardet 4.0.0
    click 8.0.1
    cloudpickle 1.6.0
    ctxcore 0.1.1
    cvxopt 1.2.5
    cycler 0.10.0
    cytoolz 0.11.0
    dask 2021.6.0
    decorator 4.4.2
    dill 0.3.3
    distributed 2021.6.0
    dunamai 1.5.5
    frozendict 2.0.2
    fsspec 2021.5.0
    get-version 3.0
    h5py 3.2.1
    HeapDict 1.0.1
    idna 2.10
    importlib-metadata 4.5.0
    interlap 0.2.7
    ipykernel 5.3.4
    ipython 7.22.0
    ipython-genutils 0.2.0
    jedi 0.17.0
    Jinja2 3.0.1
    joblib 1.0.1
    jupyter-client 6.1.12
    jupyter-core 4.7.1
    kiwisolver 1.3.1
    legacy-api-wrap 1.2
    llvmlite 0.36.0
    locket 0.2.1
    loompy 3.0.6
    MarkupSafe 2.0.1
    matplotlib 3.4.2
    mkl-fft 1.3.0
    mkl-random 1.2.1
    mkl-service 2.3.0
    module 0.0.4
    msgpack 1.0.2
    multidict 5.1.0
    multiprocessing-on-dill 3.5.0a4
    natsort 7.1.1
    networkx 2.5.1
    numba 0.53.1
    numexpr 2.7.3
    numpy 1.20.2
    numpy-groupies 0.9.13
    packaging 20.9
    pandas 1.2.4
    parso 0.8.2
    partd 1.2.0
    patsy 0.5.1
    pexpect 4.8.0
    pickleshare 0.7.5
    Pillow 8.2.0
    pip 21.1.1
    prompt-toolkit 3.0.17
    psutil 5.8.0
    ptyprocess 0.7.0
    pyarrow 0.16.0
    Pygments 2.9.0
    pynndescent 0.5.2
    pyparsing 2.4.7
    pyscenic 0.11.2
    python-dateutil 2.8.1
    pytz 2021.1
    PyYAML 5.4.1
    pyzmq 20.0.0
    requests 2.25.1
    scanpy 1.7.2
    scikit-learn 0.24.2
    scipy 1.6.3
    seaborn 0.11.1
    setuptools 52.0.0.post20210125
    sinfo 0.3.4
    six 1.15.0
    sortedcontainers 2.4.0
    statsmodels 0.11.1
    stdlib-list 0.8.0
    tables 3.6.1
    tblib 1.7.0
    threadpoolctl 2.1.0
    toolz 0.11.1
    tornado 6.1
    tqdm 4.61.0
    traitlets 5.0.5
    typing-extensions 3.10.0.0
    umap-learn 0.5.1
    urllib3 1.26.5
    wcwidth 0.2.5
    wheel 0.36.2
    xlrd 1.2.0
    yarl 1.6.3
    zict 2.0.0
    zipp 3.4.1

moqri commented 3 years ago

It seems the error happens when `include_meta=False`. In that case, the `delayed_meta_dfs` list remains empty, but `all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)` is called anyway.
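The dask side of this is easy to see in isolation: in the dask==2021.6.0 reported above, `from_delayed` indexes `dfs[0]` before checking whether `meta` was supplied, so an empty list of delayed frames fails immediately (a minimal sketch; the schema below is a stand-in, not arboreto's actual `_META_SCHEMA`):

    # Minimal sketch: with dask 2021.6.0, an empty list passed to from_delayed
    # reaches dfs[0] and raises IndexError even though meta is provided.
    import pandas as pd
    from dask.dataframe import from_delayed

    # stand-in schema (not arboreto's _META_SCHEMA)
    meta = pd.DataFrame({'target': pd.Series(dtype=str),
                         'n_estimators': pd.Series(dtype=int)})

    from_delayed([], meta=meta)  # IndexError: list index out of range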

moqri commented 3 years ago

I got it working by changing the call to `create_graph()` in `algo.py`. It looks like this now:

        graph = create_graph(expression_matrix,
                             gene_names,
                             tf_names,
                             client=client,
                             regressor_type=regressor_type,
                             regressor_kwargs=regressor_kwargs,
                             early_stop_window_length=early_stop_window_length,
                             limit=limit,
                             seed=seed,
                             include_meta=True)

I also had to comment out `print('{} partitions'.format(graph.npartitions))`.
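Forcing `include_meta=True` works around the crash but computes metadata that was never requested; a more targeted guard (just a sketch, not arboreto's actual fix) would skip the meta gathering in `create_graph` when `include_meta` is false:

    # Hypothetical guard inside create_graph (sketch only): build the meta
    # DataFrame only when requested, so from_delayed never receives an empty list.
    all_links_df = from_delayed(delayed_link_dfs, meta=_GRN_SCHEMA)

    if include_meta:
        all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
    else:
        all_meta_df = None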

Conorisco commented 3 years ago

Edited:

Just to add a little note to what @moqri said (for python idiots like me): go to the folder where arboreto is installed, e.g. /some_path/Anaconda3/env/env_name/Lib/site-packages/arboreto, and manually modify the algo.py file as he describes. I also edited core.py in a similar manner.

This ran, but ended in new errors.

cflerin commented 3 years ago

I'll try to test this out, but I wonder if it's due to a newer dask version than the one I used with this release (dask==2021.2.0, distributed==2021.2.0). Thanks for reporting it, in any case.

Conorisco commented 3 years ago

@cflerin thank you for the hint about versions. Yes, my previous error-prone run was a fresh install with dask/distributed version 2021.6.0.

I downgraded to version 2021.2.0 with:

    conda uninstall dask distributed
    conda install dask==2021.2.0 distributed==2021.2.0

Restarted the kernel and it now appears to be running correctly.

cflerin commented 3 years ago

Thanks for the confirmation about the dask/distributed versions @Conorisco ! Now to figure out what changed since February...