capitalone / DataProfiler

What's in your data? Extract schema, statistics and entities from datasets
https://capitalone.github.io/DataProfiler
Apache License 2.0

Dask Max Version Tag #1121

Open taylorfturner opened 4 months ago

taylorfturner commented 4 months ago

General Information:

Describe the bug: Dask changed a couple of things after Feb 9, 2024, so we have had to pin the maximum Dask version at 2024.2.0. Without the pin, the unit test suite fails with the following error:

____ ERROR collecting dataprofiler/tests/validators/test_base_validators.py ____
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:22: in _dask_expr_enabled
    import dask_expr  # noqa: F401
E   ModuleNotFoundError: No module named 'dask_expr'

During handling of the above exception, another exception occurred:
dataprofiler/tests/validators/test_base_validators.py:4: in <module>
    from dask import dataframe as dd
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:87: in <module>
    if _dask_expr_enabled():
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:24: in _dask_expr_enabled
    raise ValueError("Must install dask-expr to activate query planning.")
E   ValueError: Must install dask-expr to activate query planning.
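
For anyone hitting the same traceback, here is a minimal sketch (not part of DataProfiler) of a pre-flight check for the `dask_expr` module that the import above fails on. The helper name `is_module_available` is hypothetical. It only tells you whether the module can be located; it does not fix the import.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it.

    Hypothetical helper for illustration; pass top-level names only.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "dask_expr" is the module name taken from the traceback above.
    if is_module_available("dask_expr"):
        print("dask_expr found")
    else:
        print("dask_expr missing; importing dask.dataframe may fail as shown above")
```

Running this in the CI environment before the test suite would show whether the missing module is the cause, as opposed to some other Dask change.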

To Reproduce: Remove the max version pin on dask in the requirements-test.txt file

Expected behavior: No errors -- ideally with no max version pin on dask installation
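
To make it easier to tell whether an environment is past the pin, a small sketch like the one below could compare the installed Dask version against 2024.2.0. The function names are hypothetical, and the parsing assumes Dask's `YYYY.M.P` style version strings; it keeps only the leading digits of each component, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata


def calver_tuple(version: str) -> tuple[int, ...]:
    """Turn a version string such as '2024.2.0' into a tuple of ints.

    Only the leading digits of each dot-separated piece are kept; parsing
    stops at the first piece that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def exceeds_max_pin(installed: str, max_pin: str = "2024.2.0") -> bool:
    """Return True if `installed` is strictly newer than `max_pin`."""
    return calver_tuple(installed) > calver_tuple(max_pin)


if __name__ == "__main__":
    try:
        installed = metadata.version("dask")
    except metadata.PackageNotFoundError:
        print("dask is not installed")
    else:
        print(installed, "exceeds pin:", exceeds_max_pin(installed))
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would rank a month like `10` before `2`.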

taylorfturner commented 4 months ago

#1120 is fixing this for now, but ultimately we will need to resolve it so that we no longer need a max version pin

gliptak commented 1 month ago

#1090

gliptak commented 1 month ago

Dask is current in dev https://github.com/capitalone/DataProfiler/blob/dev/requirements-test.txt#L2