Open moqri opened 3 years ago
It seems the error happens when include_meta=False. In this case, the delayed_meta_dfs list remains empty while all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA) is called anyway.
I got it working by changing the call to create_graph() in algo.py. It looks like this now:
graph = create_graph(expression_matrix,
                     gene_names,
                     tf_names,
                     client=client,
                     regressor_type=regressor_type,
                     regressor_kwargs=regressor_kwargs,
                     early_stop_window_length=early_stop_window_length,
                     limit=limit,
                     seed=seed,
                     include_meta=True)
Also, I had to comment out print('{} partitions'.format(graph.npartitions)).
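For what it's worth, the failure mode can be reproduced without dask: per the traceback, from_delayed reads dfs[0], so an empty list raises IndexError. Below is a minimal, stdlib-only sketch of a possible alternative workaround, guarding the call rather than forcing include_meta=True. The names combine_like_from_delayed and gather_frames are hypothetical stand-ins, not arboreto or dask functions, and the sketch assumes nothing downstream needs the meta frame when include_meta=False.

```python
def combine_like_from_delayed(frames):
    """Hypothetical stand-in for dask's from_delayed.

    Like the line shown in the traceback, it reads frames[0], so an empty
    list raises IndexError.
    """
    first = frames[0]
    return {"first": first, "count": len(frames)}


def gather_frames(frames, combine=combine_like_from_delayed):
    """Combine the frames, or return None when there is nothing to combine."""
    if not frames:
        # Skip the call entirely instead of letting it index an empty list.
        return None
    return combine(frames)
```

With a guard like this, an empty delayed_meta_dfs would simply yield None instead of crashing, while a non-empty list is combined as before.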
Edited:
Just to add a little note on what @moqri said (for Python idiots like me): go to the folder where arboreto is installed, e.g. /some_path/Anaconda3/env/env_name/Lib/site-packages/arboreto, and manually modify the algo.py file as he describes. I also edited core.py in a similar manner.
This ran but ended in new errors.
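If you are not sure where the package folder is, something like the following should print it when run in the same environment. This is only a sketch using the standard library; package_dir is a hypothetical helper name, not part of arboreto.

```python
from importlib.util import find_spec
from pathlib import Path


def package_dir(name):
    """Return the folder an importable top-level package lives in, or None."""
    spec = find_spec(name)
    if spec is None or spec.origin is None:
        # Not installed in this environment, or no file location is recorded.
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints None if arboreto is not installed in the active environment.
    print(package_dir("arboreto"))
```

The printed folder is where algo.py and core.py should be found.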
I'll try to test this out, but I wonder if it's due to a newer dask version than what I used with this release (which was dask==2021.2.0, distributed==2021.2.0). Thanks for reporting in any case.
@cflerin thank you for the hint about versions. Yes, my previous failing run was a fresh install with dask/distributed version 2021.6.0.
I downgraded to version 2021.2.0 with:
conda uninstall dask distributed
conda install dask==2021.2.0 distributed==2021.2.0
Restarted the kernel and it now appears to be running correctly.
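To double-check which versions the kernel actually picked up after the downgrade, a small check along these lines may help. It is a sketch, not part of arboreto: find_mismatches and KNOWN_GOOD are hypothetical names, and importlib.metadata needs Python 3.8 or newer (on Python 3.7 the importlib_metadata backport provides the same version function).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this thread.
KNOWN_GOOD = {"dask": "2021.2.0", "distributed": "2021.2.0"}


def find_mismatches(expected, lookup=version):
    """Map each package whose installed version differs to that version.

    A package that is not installed maps to None.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(KNOWN_GOOD).items():
        print(f"{name}: installed {installed}, expected {KNOWN_GOOD[name]}")
```

No output means both packages match the versions mentioned above.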
Thanks for the confirmation about the dask/distributed versions @Conorisco ! Now to figure out what changed since February...
Describe the bug
Running grnboost2(ex_matrix, tf_names=tf_names, verbose=True) from the demo example results in an error.
Steps to reproduce the behavior
Command run when the error occurred:
Error encountered:
~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/arboreto/algo.py in grnboost2(expression_data, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
     39     return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
     40                gene_names=gene_names, tf_names=tf_names, client_or_address=client_or_address,
---> 41                early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
     42
     43

~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/arboreto/algo.py in diy(expression_data, regressor_type, regressor_kwargs, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
    126                          early_stop_window_length=early_stop_window_length,
    127                          limit=limit,
--> 128                          seed=seed)
    129
    130     if verbose:

~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/arboreto/core.py in create_graph(expression_matrix, gene_names, tf_names, regressor_type, regressor_kwargs, client, target_genes, limit, include_meta, early_stop_window_length, repartition_multiplier, seed)
    448     # gather the DataFrames into one distributed DataFrame
    449     all_links_df = from_delayed(delayed_link_dfs, meta=_GRN_SCHEMA)
--> 450     all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
    451
    452     # optionally limit the number of resulting regulatory links, descending by top importance

~/anaconda3/envs/pyscenic2/lib/python3.7/site-packages/dask/dataframe/io/io.py in from_delayed(dfs, meta, divisions, prefix, verify_meta)
    589             raise TypeError("Expected Delayed object, got %s" % type(df).__name__)
    590
--> 591     parent_meta = delayed(make_meta)(dfs[0]).compute()
    592
    593     if meta is None:

IndexError: list index out of range
Package Version
aiohttp 3.7.4.post0
anndata 0.7.6
arboreto 0.1.6
async-timeout 3.0.1
attrs 21.2.0
backcall 0.2.0
bokeh 2.3.2
boltons 21.0.0
cached-property 1.5.2
certifi 2021.5.30
chardet 4.0.0
click 8.0.1
cloudpickle 1.6.0
ctxcore 0.1.1
cvxopt 1.2.5
cycler 0.10.0
cytoolz 0.11.0
dask 2021.6.0
decorator 4.4.2
dill 0.3.3
distributed 2021.6.0
dunamai 1.5.5
frozendict 2.0.2
fsspec 2021.5.0
get-version 3.0
h5py 3.2.1
HeapDict 1.0.1
idna 2.10
importlib-metadata 4.5.0
interlap 0.2.7
ipykernel 5.3.4
ipython 7.22.0
ipython-genutils 0.2.0
jedi 0.17.0
Jinja2 3.0.1
joblib 1.0.1
jupyter-client 6.1.12
jupyter-core 4.7.1
kiwisolver 1.3.1
legacy-api-wrap 1.2
llvmlite 0.36.0
locket 0.2.1
loompy 3.0.6
MarkupSafe 2.0.1
matplotlib 3.4.2
mkl-fft 1.3.0
mkl-random 1.2.1
mkl-service 2.3.0
module 0.0.4
msgpack 1.0.2
multidict 5.1.0
multiprocessing-on-dill 3.5.0a4
natsort 7.1.1
networkx 2.5.1
numba 0.53.1
numexpr 2.7.3
numpy 1.20.2
numpy-groupies 0.9.13
packaging 20.9
pandas 1.2.4
parso 0.8.2
partd 1.2.0
patsy 0.5.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.2.0
pip 21.1.1
prompt-toolkit 3.0.17
psutil 5.8.0
ptyprocess 0.7.0
pyarrow 0.16.0
Pygments 2.9.0
pynndescent 0.5.2
pyparsing 2.4.7
pyscenic 0.11.2
python-dateutil 2.8.1
pytz 2021.1
PyYAML 5.4.1
pyzmq 20.0.0
requests 2.25.1
scanpy 1.7.2
scikit-learn 0.24.2
scipy 1.6.3
seaborn 0.11.1
setuptools 52.0.0.post20210125
sinfo 0.3.4
six 1.15.0
sortedcontainers 2.4.0
statsmodels 0.11.1
stdlib-list 0.8.0
tables 3.6.1
tblib 1.7.0
threadpoolctl 2.1.0
toolz 0.11.1
tornado 6.1
tqdm 4.61.0
traitlets 5.0.5
typing-extensions 3.10.0.0
umap-learn 0.5.1
urllib3 1.26.5
wcwidth 0.2.5
wheel 0.36.2
xlrd 1.2.0
yarl 1.6.3
zict 2.0.0
zipp 3.4.1