dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

Issue with deprecated format_bytes #161

Open 5nomads opened 6 months ago

5nomads commented 6 months ago

I set up a new virtual environment, installed Dask and Dask-Yarn, and cannot import dask_yarn: it tries to import format_bytes from distributed.utils, which no longer provides it. Searching for Dask-Yarn and format_bytes only turned up the deprecation warning from older versions. I tried both Python 3.9 and Python 3.11; the steps below are from the Python 3.9 test. I'm not sure what to do about this. Should I pin Dask and Dask-Yarn to specific versions?

Reproducing the issue:

python -c "from dask_yarn import YarnCluster"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../venv_py39_dask_yarn/lib/python3.9/site-packages/dask_yarn/__init__.py", line 6, in <module>
    from .core import YarnCluster
  File ".../venv_py39_dask_yarn/lib/python3.9/site-packages/dask_yarn/core.py", line 16, in <module>
    from distributed.utils import (
ImportError: cannot import name 'format_bytes' from 'distributed.utils' (.../venv_py39_dask_yarn/lib/python3.9/site-packages/distributed/utils.py)

A possible fix is to import format_bytes and parse_timedelta from dask.utils instead of distributed.utils in the dask_yarn/core.py source:

from distributed.utils import (
    # format_bytes and parse_timedelta now come from dask.utils (see below)
    log_errors,
    LoopRunner,
    format_dashboard_link,
    Log,
    Logs,
)
from dask.utils import (
    format_bytes,
    parse_timedelta,
)
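
If the same source has to keep working with both older and newer releases, the import could be made tolerant of where the name lives. This is only a minimal sketch, not the project's actual fix; `import_first` is a hypothetical helper introduced here for illustration.

```python
import importlib


def import_first(name, module_names):
    """Return attribute `name` from the first listed module that provides it."""
    errors = []
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_name}: {exc}")
    raise ImportError(
        f"cannot import {name!r} from any of {module_names}: " + "; ".join(errors)
    )
```

Under that assumption, dask_yarn/core.py could call `import_first("format_bytes", ["dask.utils", "distributed.utils"])`, so the dask.utils location is preferred and the old one is used only as a fallback.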

Setup Steps:

python3.9 -m venv venv_py39_dask_yarn
cd venv_py39_dask_yarn
source bin/activate
pip install --upgrade pip
pip install dask[complete]
pip install dask-yarn

pip freeze:

$ pip freeze
bokeh==3.3.4
cffi==1.16.0
click==8.1.7
cloudpickle==3.0.0
contourpy==1.2.0
cryptography==42.0.5
dask==2024.2.1
dask-yarn==0.9
distributed==2024.2.1
fsspec==2024.2.0
grpcio==1.62.0
importlib-metadata==7.0.1
Jinja2==3.1.3
locket==1.0.0
lz4==4.3.3
MarkupSafe==2.1.5
msgpack==1.0.8
numpy==1.26.4
packaging==23.2
pandas==2.2.1
partd==1.4.1
pillow==10.2.0
protobuf==4.25.3
psutil==5.9.8
pyarrow==15.0.0
pyarrow-hotfix==0.6
pycparser==2.21
python-dateutil==2.9.0
pytz==2024.1
PyYAML==6.0.1
six==1.16.0
skein==0.8.2
sortedcontainers==2.4.0
tblib==3.0.0
toolz==0.12.1
tornado==6.4
tzdata==2024.1
urllib3==2.2.1
xyzservices==2023.10.1
zict==3.0.0
zipp==3.17.0

Python version verification:

$ which python
.../venv_py39_dask_yarn/bin/python
$ python -V
Python 3.9.18
$ pip -V
pip 24.0 from .../venv_py39_dask_yarn/lib/python3.9/site-packages/pip (python 3.9)
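
The three versions that matter here can also be read from inside the interpreter instead of scanning the full pip freeze output. A small sketch using the standard library's importlib.metadata (Python 3.8+); the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip freeze output above.
    for name, version in installed_versions(["dask", "distributed", "dask-yarn"]).items():
        print(f"{name}=={version}")
```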

sam-goodwin commented 6 months ago

I'm also running into this. I can confirm that distributed = "2023.8.0" works with dask-yarn = "^0.9".

The upper bound linked below looks wrong, because the 2024 releases are a breaking change. Should it be >=2021.1.0,<=2023.8.0?

https://github.com/dask/dask-yarn/blob/8eed5e2b5abd6a3b49c1fee3d68e45ecd972fdb2/requirements.txt#L2C14-L2C22
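
To make the proposed range concrete, here is a rough sketch of how the two versions mentioned in this thread compare against it. It assumes plain dotted numeric versions and is not a full PEP 440 implementation (pip handles many more cases); `parse_version` and `within_bounds` are illustrative names, not part of any library.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2023.8.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def within_bounds(version, lower, upper):
    """Return True if lower <= version <= upper, compared component by component."""
    return parse_version(lower) <= parse_version(version) <= parse_version(upper)


# Versions taken from this thread.
print(within_bounds("2023.8.0", "2021.1.0", "2023.8.0"))  # True
print(within_bounds("2024.2.1", "2021.1.0", "2023.8.0"))  # False
```

With that range, the fresh install shown in the pip freeze output (distributed==2024.2.1) would be excluded, while the version reported as working would still be accepted.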

sam-goodwin commented 6 months ago

Got past that issue and can instantiate a YarnCluster but can't create a dask.distributed.Client:

AttributeError: 'YarnCluster' object has no attribute 'status'

sam-goodwin commented 6 months ago

Hmm, wasn't the original issue fixed with this commit?

https://github.com/dask/dask-yarn/commit/23a2932a6af9e49e9aacb43555979d878ea5b8f6

Was this never published to PyPI?