aertslab / arboreto

A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
BSD 3-Clause "New" or "Revised" License

TypeError from dask_expr/io/_delayed.py when running `grnboost2` #38

Closed: himoto closed this issue 5 months ago

himoto commented 6 months ago

When running Example 01, I got the following error at cell [9], where grnboost2 is executed.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File <timed exec>:1

File ~/miniforge3/envs/arboreto-env/lib/python3.10/site-packages/arboreto/algo.py:39, in grnboost2(expression_data, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
     10 def grnboost2(expression_data,
     11               gene_names=None,
     12               tf_names='all',
   (...)
     16               seed=None,
     17               verbose=False):
     18     """
     19     Launch arboreto with [GRNBoost2] profile.
     20 
   (...)
     36     :return: a pandas DataFrame['TF', 'target', 'importance'] representing the inferred gene regulatory links.
     37     """
---> 39     return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
     40                gene_names=gene_names, tf_names=tf_names, client_or_address=client_or_address,
     41                early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)

File ~/miniforge3/envs/arboreto-env/lib/python3.10/site-packages/arboreto/algo.py:120, in diy(expression_data, regressor_type, regressor_kwargs, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
    117 if verbose:
    118     print('creating dask graph')
--> 120 graph = create_graph(expression_matrix,
    121                      gene_names,
    122                      tf_names,
    123                      client=client,
    124                      regressor_type=regressor_type,
    125                      regressor_kwargs=regressor_kwargs,
    126                      early_stop_window_length=early_stop_window_length,
    127                      limit=limit,
    128                      seed=seed)
    130 if verbose:
    131     print('{} partitions'.format(graph.npartitions))

File ~/miniforge3/envs/arboreto-env/lib/python3.10/site-packages/arboreto/core.py:450, in create_graph(expression_matrix, gene_names, tf_names, regressor_type, regressor_kwargs, client, target_genes, limit, include_meta, early_stop_window_length, repartition_multiplier, seed)
    448 # gather the DataFrames into one distributed DataFrame
    449 all_links_df = from_delayed(delayed_link_dfs, meta=_GRN_SCHEMA)
--> 450 all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
    452 # optionally limit the number of resulting regulatory links, descending by top importance
    453 if limit:

File ~/miniforge3/envs/arboreto-env/lib/python3.10/site-packages/dask_expr/io/_delayed.py:93, in from_delayed(dfs, meta, divisions, verify_meta)
     90     dfs = [dfs]
     92 if len(dfs) == 0:
---> 93     raise TypeError("Must supply at least one delayed object")
     95 if meta is None:
     96     meta = delayed(make_meta)(dfs[0]).compute()

TypeError: Must supply at least one delayed object

I would be grateful if you could tell me how to resolve this.

rikfor commented 6 months ago

Same error here; I have not found a solution yet. These are all the package versions I have installed:

aiohttp                   3.9.3
aiosignal                 1.3.1
anyio                     4.3.0
appnope                   0.1.4
arboreto                  0.1.6
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     23.2.0
Babel                     2.14.0
beautifulsoup4            4.12.3
bleach                    6.1.0
bokeh                     3.4.0
boltons                   23.1.1
Brotli                    1.1.0
cached-property           1.5.2
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpickle               3.0.0
comm                      0.2.2
contourpy                 1.2.0
ctxcore                   0.2.0
cycler                    0.12.1
cytoolz                   0.12.3
dask                      2024.3.1
dask-expr                 1.0.4
debugpy                   1.8.1
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.8
distributed               2024.3.1
entrypoints               0.4
exceptiongroup            1.2.0
executing                 2.0.1
fastjsonschema            2.19.1
fonttools                 4.50.0
fqdn                      1.5.1
frozendict                2.4.0
frozenlist                1.4.1
fsspec                    2024.3.1
h11                       0.14.0
h2                        4.1.0
h5py                      3.10.0
hpack                     4.0.0
httpcore                  1.0.4
httpx                     0.27.0
hyperframe                6.0.1
idna                      3.6
importlib_metadata        7.0.2
importlib_resources       6.3.2
interlap                  0.2.7
ipykernel                 6.29.3
ipython                   8.22.2
isoduration               20.11.0
jedi                      0.19.1
Jinja2                    3.1.3
joblib                    1.3.2
json5                     0.9.24
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter_client            8.6.1
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.4
jupyter_server            2.13.0
jupyter_server_terminals  0.5.3
jupyterlab                4.1.5
jupyterlab_pygments       0.3.0
jupyterlab_server         2.25.4
kiwisolver                1.4.5
llvmlite                  0.42.0
locket                    1.0.0
loompy                    3.0.7
lz4                       4.3.3
MarkupSafe                2.1.5
matplotlib                3.8.3
matplotlib-inline         0.1.6
mistune                   3.0.2
msgpack                   1.0.8
multidict                 6.0.5
multiprocessing_on_dill   3.5.0a4
munkres                   1.1.4
nbclient                  0.10.0
nbconvert                 7.16.2
nbformat                  5.10.3
nest_asyncio              1.6.0
networkx                  3.2.1
notebook_shim             0.2.4
numba                     0.59.1
numexpr                   2.9.0
numpy                     1.26.4
numpy-groupies            0.10.2
overrides                 7.7.0
packaging                 24.0
pandas                    2.2.1
pandocfilters             1.5.0
parso                     0.8.3
partd                     1.4.1
patsy                     0.5.6
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    10.2.0
pip                       24.0
pkgutil_resolve_name      1.3.10
platformdirs              4.2.0
prometheus_client         0.20.0
prompt-toolkit            3.0.43
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   15.0.2
pyarrow-hotfix            0.6
pycparser                 2.21
Pygments                  2.17.2
pynndescent               0.5.11
pyobjc-core               10.2
pyobjc-framework-Cocoa    10.2
pyparsing                 3.1.2
pyscenic                  0.12.1+2.geaf23eb
PySocks                   1.7.1
python-dateutil           2.9.0
python-json-logger        2.0.7
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     25.1.2
referencing               0.34.0
requests                  2.31.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rpds-py                   0.18.0
scikit-learn              1.4.1.post1
scikit-misc               0.2.0
scipy                     1.12.0
seaborn                   0.13.2
Send2Trash                1.8.2
setuptools                69.2.0
six                       1.16.0
sniffio                   1.3.1
sortedcontainers          2.4.0
soupsieve                 2.5
stack-data                0.6.2
statsmodels               0.14.1
tblib                     3.0.0
terminado                 0.18.1
threadpoolctl             3.4.0
tinycss2                  1.2.1
tomli                     2.0.1
toolz                     0.12.1
tornado                   6.4
tqdm                      4.66.2
traitlets                 5.14.2
types-python-dateutil     2.9.0.20240316
typing_extensions         4.10.0
typing-utils              0.1.0
tzdata                    2024.1
umap-learn                0.5.5
unicodedata2              15.1.0
uri-template              1.3.0
urllib3                   2.2.1
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.7.0
wheel                     0.42.0
xyzservices               2023.10.1
yarl                      1.9.4
zict                      3.0.0
zipp                      3.17.0

OS: macOS Monterey 12.5.1

gennadyFauna commented 5 months ago

https://github.com/aertslab/arboreto/blob/2f475dca08f47a60acc2beb8dd897e77b7495ca4/arboreto/core.py#L450

It looks like this line is to blame. By default, include_meta is False, so delayed_meta_dfs stays an empty list ([]). When from_delayed receives an empty list, dask-expr raises this TypeError. It has behaved this way since the function was added to dask-expr in December:

https://github.com/dask/dask-expr/commit/814a4bfd5e1f1550feca4862cad0cde9ba5c7760

It's not obvious to me why the dask-expr method gets called when the dask one is requested.

https://github.com/aertslab/arboreto/blob/2f475dca08f47a60acc2beb8dd897e77b7495ca4/arboreto/core.py#L12

The dask one does not have this TypeError behavior, I think:

https://github.com/dask/dask/blob/b663dca0fa4ca4686b8c08f7cb30d11320012901/dask/dataframe/io/io.py#L586

gennadyFauna commented 5 months ago

Correction: https://github.com/dask/dask-expr says "This is the default backend for dask.DataFrame since version 2024.3.0", which was released on March 12. That explains why the dask-expr function is called, and it means this is the intended behavior going forward.

The obvious solution is to move the offending line https://github.com/aertslab/arboreto/blob/2f475dca08f47a60acc2beb8dd897e77b7495ca4/arboreto/core.py#L450 into the if/else statement and only call it when include_meta is set to True.
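The proposed fix can be sketched as follows. This is a minimal illustration, not the code that actually landed in master: gather_frames and strict_from_delayed are hypothetical names, and strict_from_delayed is a stand-in that only mimics the empty-list check visible in the dask-expr traceback above, so the sketch runs without dask installed.

```python
from typing import Any, Callable, List, Optional, Tuple


def strict_from_delayed(dfs: List[Any], meta: Any = None) -> Tuple[str, int, Any]:
    """Hypothetical stand-in for dask-expr's from_delayed.

    Like the dask-expr version, it rejects an empty list; otherwise it just
    returns a tuple describing what it was given, so no dask is needed.
    """
    if len(dfs) == 0:
        raise TypeError("Must supply at least one delayed object")
    return ("frame", len(dfs), meta)


def gather_frames(
    delayed_link_dfs: List[Any],
    delayed_meta_dfs: List[Any],
    include_meta: bool,
    from_delayed: Callable[..., Any],
    link_schema: Any = None,
    meta_schema: Any = None,
) -> Tuple[Any, Optional[Any]]:
    """Always build the links frame; build the meta frame only on request."""
    all_links_df = from_delayed(delayed_link_dfs, meta=link_schema)

    # delayed_meta_dfs is only filled when include_meta is True, so an
    # unconditional call would hand an empty list to from_delayed.
    all_meta_df = None
    if include_meta:
        all_meta_df = from_delayed(delayed_meta_dfs, meta=meta_schema)
    return all_links_df, all_meta_df


# Default path (include_meta=False): from_delayed never sees the empty meta list.
links, meta = gather_frames(["part0", "part1"], [], False, strict_from_delayed, "grn")
print(links, meta)  # ('frame', 2, 'grn') None
```

In arboreto itself, one would pass dask.dataframe.from_delayed together with the module's _GRN_SCHEMA and _META_SCHEMA instead of the stand-in. Injecting from_delayed as a parameter is only a convenience for this sketch.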

ghuls commented 5 months ago

Fixed in master.

jr-leary7 commented 5 months ago

I'm still encountering this issue even after installing the latest versions of dask and distributed, as well as the latest arboreto from GitHub. Any ideas on what could be causing it? The verbose output is below:

preparing dask client
parsing input
creating dask graph
shutting down client and local cluster
finished
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/blue/rbacher/j.leary/py_envs/scLANE_env2/lib/python3.10/site-packages/arboreto/algo.py", line 39, in grnboost2
    return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
  File "/blue/rbacher/j.leary/py_envs/scLANE_env2/lib/python3.10/site-packages/arboreto/algo.py", line 120, in diy
    graph = create_graph(expression_matrix,
  File "/blue/rbacher/j.leary/py_envs/scLANE_env2/lib/python3.10/site-packages/arboreto/core.py", line 450, in create_graph
    all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
  File "/blue/rbacher/j.leary/py_envs/scLANE_env2/lib/python3.10/site-packages/dask_expr/io/_delayed.py", line 102, in from_delayed
    raise TypeError("Must supply at least one delayed object")
TypeError: Must supply at least one delayed object

gennadyFauna commented 5 months ago

@jr-leary7 I just checked, and it works for me (dask 2024.3.1, dask-expr 1.0.5; perhaps not the very latest versions, but I did pip install the GitHub version of arboreto). Your traceback fails at line 450 of core.py, which is empty in the updated version https://github.com/aertslab/arboreto/blob/79f916b0ea25c00989331b8db243826049c3d66c/arboreto/core.py#L450 but used to contain the call to from_delayed, so I think your installed copy does not include the changes from master yet. I had to uninstall and reinstall arboreto before the changes took effect, so that should fix it.
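Since a stale installed copy is the likely culprit, one way to confirm it is to inspect the installed core.py directly. A rough sketch under stated assumptions: installed_core_path and meta_call_is_guarded are hypothetical helpers, and the check is a text heuristic that only asks whether the from_delayed(delayed_meta_dfs ...) line sits under some if-statement. importlib.util.find_spec("arboreto") locates the top-level package without importing it, so dask is never imported.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def meta_call_is_guarded(source: str) -> Optional[bool]:
    """Heuristic check on the text of arboreto/core.py.

    Returns True if the from_delayed(delayed_meta_dfs ...) call is nested under
    an if-statement, False if it is unguarded, None if the call is absent.
    """
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if "from_delayed(delayed_meta_dfs" not in line:
            continue
        indent = len(line) - len(line.lstrip())
        # Walk upwards to the nearest non-blank, less-indented line:
        # that line opens the block that contains the call.
        for prev in reversed(lines[:i]):
            stripped = prev.strip()
            if stripped and len(prev) - len(prev.lstrip()) < indent:
                return stripped.startswith(("if ", "elif "))
        return False
    return None


def installed_core_path() -> Optional[Path]:
    """Locate the installed arboreto/core.py without importing arboreto."""
    spec = importlib.util.find_spec("arboreto")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "core.py"


if __name__ == "__main__":
    path = installed_core_path()
    if path is None:
        print("arboreto is not installed in this environment")
    else:
        print(path, "guarded:", meta_call_is_guarded(path.read_text()))
```

If this reports guarded: False, the environment still runs the pre-fix code, and reinstalling arboreto from GitHub (as described above) should replace it.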

Somatic-pipeline commented 4 months ago

I ran into this issue, installed the latest arboreto from GitHub, and reinstalled dask 2024.3.1 and dask-expr 1.0.5 (I previously had the latest versions). Now a different error shows up:

Traceback (most recent call last):
  File "~/Scenic.py", line 11, in <module>
    from arboreto.algo import grnboost2
  File "~/.conda/envs/test6/lib/python3.11/site-packages/arboreto/algo.py", line 7, in <module>
    from arboreto.core import create_graph, SGBM_KWARGS, RF_KWARGS, EARLY_STOP_WINDOW_LENGTH
  File "~/.conda/envs/test6/lib/python3.11/site-packages/arboreto/core.py", line 12, in <module>
    from dask.dataframe import from_delayed
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/__init__.py", line 40, in <module>
    from dask.dataframe import backends, dispatch, methods, rolling
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/backends.py", line 15, in <module>
    from dask.dataframe.core import DataFrame, Index, Scalar, Series, _Frame
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/core.py", line 36, in <module>
    from dask.dataframe import methods
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/methods.py", line 34, in <module>
    from dask.dataframe.utils import is_dataframe_like, is_index_like, is_series_like
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/utils.py", line 20, in <module>
    from dask.dataframe import (  # noqa: F401 register pandas extension types
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/_dtypes.py", line 9, in <module>
    from dask.dataframe.extensions import make_array_nonempty, make_scalar
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/extensions.py", line 8, in <module>
    from dask.dataframe.accessor import (
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/accessor.py", line 126, in <module>
    class DatetimeAccessor(Accessor):
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/accessor.py", line 81, in __init_subclass__
    _bind_property(cls, pd_cls, attr, min_version)
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/dataframe/accessor.py", line 35, in _bind_property
    setattr(cls, attr, property(derived_from(pd_cls, version=min_version)(func)))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/utils.py", line 987, in wrapper
    method.__doc__ = _derived_from(
                     ^^^^^^^^^^^^^^
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/utils.py", line 940, in _derived_from
    method_args = get_named_args(method)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "~/.conda/envs/test6/lib/python3.11/site-packages/dask/utils.py", line 701, in get_named_args
    s = inspect.signature(func)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.conda/envs/test6/lib/python3.11/inspect.py", line 3263, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.conda/envs/test6/lib/python3.11/inspect.py", line 3011, in from_callable
    return _signature_from_callable(obj, sigcls=cls,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.conda/envs/test6/lib/python3.11/inspect.py", line 2599, in _signature_from_callable
    call = _descriptor_get(call, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.conda/envs/test6/lib/python3.11/inspect.py", line 2432, in _descriptor_get
    return get(descriptor, obj, type(obj))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: descriptor '__call__' for 'type' objects doesn't apply to a 'property' object

PedroMBarcelos commented 3 weeks ago

Did you find a solution? I have the same problem, and it seems to be a Python version issue.
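To test the Python-version hypothesis, it helps to report the exact interpreter version alongside the packages that appear in these tracebacks. A minimal sketch, assuming Python 3.8+ (for importlib.metadata); the PACKAGES tuple and the version_report name are illustrative choices, not part of arboreto. Reading package metadata does not import the packages, so this still runs when `import dask.dataframe` itself fails.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Packages that appear in the tracebacks in this thread (illustrative choice).
PACKAGES = ("arboreto", "dask", "dask-expr", "distributed", "pandas")


def version_report(packages: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the Python version plus each package's version (None if missing)."""
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in version_report().items():
        print(f"{name:12} {ver or 'not installed'}")
```

Posting this short report, rather than a full package list, makes it easier to compare environments where the import works against ones where it fails.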