dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

AttributeError: 'str' object has no attribute 'decode' due to return pynvml.nvmlDeviceGetName(h).decode() in dask-scheduler #5768

Closed: hovo1990 closed this issue 1 year ago

hovo1990 commented 2 years ago

What happened:

When running "dask-scheduler" from inside a Docker container, I get the following error:

(pyscience) i-am-curious@worker2:~/data$ dask-scheduler                 
distributed.scheduler - INFO - -----------------------------------------------
Traceback (most recent call last):
  File "/home/i-am-curious/.conda/envs/pyscience/bin/dask-scheduler", line 11, in <module>
    sys.exit(go())
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 217, in go
    main()
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 197, in main
    **kwargs,
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/scheduler.py", line 3958, in __init__
    **kwargs,
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/scheduler.py", line 2064, in __init__
    super().__init__(**kwargs)  # type: ignore
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/core.py", line 169, in __init__
    self.monitor = SystemMonitor()
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/system_monitor.py", line 59, in __init__
    gpu_extra = nvml.one_time()
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/diagnostics/nvml.py", line 140, in one_time
    "name": _get_name(h),
  File "/home/i-am-curious/.conda/envs/pyscience/lib/python3.7/site-packages/distributed/diagnostics/nvml.py", line 123, in _get_name
    return pynvml.nvmlDeviceGetName(h).decode()
AttributeError: 'str' object has no attribute 'decode'
(pyscience) i-am-curious@worker2:~/data$ 

Environment:

name: pyscience
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_llvm
  - _py-xgboost-mutex=2.0=cpu_0
  - _sysroot_linux-64_curr_repodata_hack=3=haa98f57_10
  - abseil-cpp=20210324.2=h2531618_0
  - aiohttp=3.8.1=py37h7f8727e_0
  - aiosignal=1.2.0=pyhd3eb1b0_0
  - alsa-lib=1.2.3=h516909a_0
  - ansi2html=1.5.2=py37h06a4308_0
  - ansiwrap=0.8.4=py_0
  - anyio=3.5.0=py37h89c1867_0
  - aplus=0.11.0=py_1
  - appdirs=1.4.4=pyhd3eb1b0_0
  - argon2-cffi=20.1.0=py37h27cfd23_1
  - arrow-cpp=6.0.1=py37he4d610e_8_cpu
  - arviz=0.11.2=pyhd3eb1b0_0
  - asn1crypto=1.4.0=py_0
  - astropy=4.3.1=py37h09021b7_0
  - astunparse=1.6.3=py_0
  - async-timeout=4.0.1=pyhd3eb1b0_0
  - async_generator=1.10=py37h28b3542_0
  - asynctest=0.13.0=py_0
  - attrs=21.4.0=pyhd3eb1b0_0
  - auto-sklearn=0.14.5=pyhd8ed1ab_0
  - aws-c-cal=0.5.11=h95a6274_0
  - aws-c-common=0.6.2=h27cfd23_0
  - aws-c-event-stream=0.2.7=h3541f99_13
  - aws-c-io=0.10.5=hfb6a706_0
  - aws-checksums=0.1.11=ha31a3da_7
  - aws-sdk-cpp=1.8.186=hb4091e7_3
  - babel=2.9.1=pyhd3eb1b0_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - bcrypt=3.2.0=py37h7b6447c_0
  - beautifulsoup4=4.10.0=pyha770c72_0
  - bhmm=0.6.3=py37ha21ca33_1004
  - binutils=2.35.1=h9e65a1e_9
  - binutils_impl_linux-64=2.35.1=h27ae35d_9
  - binutils_linux-64=2.35.1=h454624a_30
  - biopandas=0.2.9=pyhd8ed1ab_0
  - biopython=1.79=py37h5e8e339_1
  - biotite=0.31.0=py37h5e8e339_0
  - black=22.1.0=pyhd8ed1ab_0
  - blake3=0.2.1=py37hfd0a3e1_0
  - blas=2.113=openblas
  - blas-devel=3.9.0=13_linux64_openblas
  - bleach=4.1.0=pyhd3eb1b0_0
  - blosc=1.21.0=h9c3ff4c_0
  - bokeh=2.4.2=py37h89c1867_0
  - boost=1.74.0=py37h796e4cb_5
  - boost-cpp=1.74.0=h312852a_4
  - boto3=1.20.24=pyhd3eb1b0_0
  - botocore=1.23.24=pyhd3eb1b0_0
  - bqplot=0.12.32=pyhd8ed1ab_0
  - branca=0.4.2=pyhd8ed1ab_0
  - bravado=11.0.3=pyhd8ed1ab_0
  - bravado-core=5.17.0=pyh9f0ad1d_0
  - brotli=1.0.9=he6710b0_2
  - brotli-python=1.0.9=py37heb0550a_2
  - brotlipy=0.7.0=py37h27cfd23_1003
  - brunsli=0.1=h2531618_0
  - bzip2=1.0.8=h7b6447c_0
  - c-ares=1.18.1=h7f8727e_0
  - c-blosc2=2.0.4=h5f21a17_1
  - c-compiler=1.2.0=h7f98852_0
  - ca-certificates=2021.10.26=h06a4308_2
  - cachecontrol=0.12.6=pyhd3eb1b0_0
  - cached-property=1.5.2=py_0
  - cachetools=4.2.2=pyhd3eb1b0_0
  - cairo=1.16.0=hf32fb01_1
  - catalogue=2.0.6=py37h06a4308_0
  - cattrs=1.10.0=pyhd8ed1ab_0
  - certifi=2021.10.8=py37h06a4308_2
  - cffi=1.15.0=py37hd667e15_1
  - cfitsio=3.470=hb418390_7
  - cftime=1.5.1.1=py37hce1f21e_0
  - charls=2.2.0=h2531618_0
  - charset-normalizer=2.0.4=pyhd3eb1b0_0
  - click=8.0.3=pyhd3eb1b0_0
  - cloudpickle=2.0.0=pyhd3eb1b0_0
  - codecov=2.1.11=pyhd3deb0d_0
  - colorama=0.4.4=pyhd3eb1b0_0
  - commonmark=0.9.1=pyhd3eb1b0_0
  - configspace=0.4.19=py37hce1f21e_0
  - coverage=6.2=py37h7f8727e_0
  - cramjam=2.5.0=py37hfd0a3e1_0
  - croniter=0.3.35=py_0
  - cryptography=36.0.0=py37h9ce1e76_0
  - curl=7.80.0=h7f8727e_0
  - cxx-compiler=1.2.0=h4bd325d_0
  - cycler=0.11.0=pyhd3eb1b0_0
  - cymem=2.0.5=py37h2531618_0
  - cython=0.29.25=py37hdbfa776_0
  - cython-blis=0.7.4=py37h27cfd23_1
  - cytoolz=0.11.0=py37h7b6447c_0
  - dash-bio=0.2.0=py37_0
  - dash-renderer=1.1.2=py_0
  - dask=2022.1.1=pyhd8ed1ab_0
  - dask-core=2022.1.1=pyhd8ed1ab_0
  - datatable=0.11.1=py37h6dcda5c_0
  - dbus=1.13.18=hb2f20db_0
  - deap=1.3.1=py37he8f5f7f_3
  - debugpy=1.5.1=py37h295c915_0
  - decorator=5.1.1=pyhd3eb1b0_0
  - deeptime=0.4.0=py37he8f5f7f_0
  - defusedxml=0.7.1=pyhd3eb1b0_0
  - dill=0.3.4=pyhd3eb1b0_0
  - distributed=2022.1.1=py37h89c1867_0
  - distro=1.5.0=pyhd3eb1b0_1
  - docker-py=4.4.1=py37h06a4308_5
  - docker-pycreds=0.4.0=pyhd3eb1b0_0
  - emcee=3.1.1=pyh8a188c0_0
  - entrypoints=0.3=py37_0
  - execnet=1.9.0=pyhd3eb1b0_0
  - expat=2.4.1=h2531618_2
  - fabric=2.6.0=pyhd8ed1ab_1
  - fastapi=0.70.1=pyhd8ed1ab_0
  - fastparquet=0.8.0=py37hb1e94ed_1
  - fastprogress=1.0.0=pyhb85f177_0
  - filelock=3.4.2=pyhd3eb1b0_0
  - flask=2.0.2=pyhd3eb1b0_0
  - flask-caching=1.10.1=pyhd8ed1ab_0
  - flask-compress=1.10.1=pyhd3eb1b0_0
  - fontconfig=2.13.94=ha180cfb_0
  - fonttools=4.25.0=pyhd3eb1b0_0
  - freetype=2.11.0=h70c0345_0
  - frozendict=2.3.0=py37h5e8e339_1
  - frozenlist=1.2.0=py37h7f8727e_0
  - fs=2.4.11=py37h89c1867_3
  - fsspec=2022.1.0=pyhd3eb1b0_0
  - future=0.18.2=py37_1
  - gcc_impl_linux-64=9.3.0=h6df7d76_17
  - gcc_linux-64=9.3.0=h1ee779e_30
  - gensim=4.1.2=py37hcd2ae1e_1
  - geos=3.8.0=he6710b0_0
  - gettext=0.21.0=hf68c758_0
  - gflags=2.2.2=he6710b0_0
  - giflib=5.2.1=h7b6447c_0
  - gitdb=4.0.7=pyhd3eb1b0_0
  - gitpython=3.1.18=pyhd3eb1b0_1
  - glew=2.1.0=h295c915_3
  - glib=2.68.2=h36276a3_0
  - glm=0.9.9.4=hc9558a2_0
  - glog=0.5.0=h2531618_0
  - gmp=6.2.1=h2531618_2
  - gmpy2=2.0.8=py37h10f8cd9_2
  - google-api-core=1.25.1=pyhd3eb1b0_0
  - google-auth=1.33.0=pyhd3eb1b0_0
  - google-cloud-core=1.7.1=pyhd3eb1b0_0
  - google-cloud-storage=1.41.0=pyhd3eb1b0_0
  - google-crc32c=1.1.2=py37h27cfd23_0
  - google-resumable-media=1.3.1=pyhd3eb1b0_1
  - googleapis-common-protos=1.53.0=py37h06a4308_0
  - greenlet=1.1.1=py37h295c915_0
  - griddataformats=0.6.0=pyhd8ed1ab_0
  - grpc-cpp=1.42.0=ha1441d3_1
  - grpcio=1.42.0=py37hce63b2e_0
  - gsd=2.5.1=py37hb1e94ed_0
  - gsl=2.7=he838d99_0
  - gst-plugins-base=1.18.4=hf529b03_2
  - gstreamer=1.18.4=h76c114f_2
  - gunicorn=20.1.0=py37h89c1867_1
  - gxx_impl_linux-64=9.3.0=hbdd7822_17
  - gxx_linux-64=9.3.0=h7e70986_30
  - h5py=3.6.0=nompi_py37hd308b1e_100
  - hdf4=4.2.15=h10796ff_3
  - hdf5=1.12.1=nompi_h2750804_103
  - hdmedians=0.14.2=py37h6323ea4_1
  - heapdict=1.0.1=pyhd3eb1b0_0
  - hypothesis=6.29.3=pyhd3eb1b0_0
  - icu=68.1=h2531618_0
  - idna=3.3=pyhd3eb1b0_0
  - imagecodecs=2021.8.26=py37hfe5a812_1
  - imageio=2.9.0=pyhd3eb1b0_0
  - importlib-metadata=4.8.2=py37h06a4308_0
  - importlib_metadata=4.8.2=hd3eb1b0_0
  - importlib_resources=5.2.0=pyhd3eb1b0_1
  - iniconfig=1.1.1=pyhd3eb1b0_0
  - invoke=1.6.0=pyhd8ed1ab_0
  - ipydatawidgets=4.2.0=pyhd3deb0d_0
  - ipykernel=6.8.0=py37h6531663_0
  - ipyleaflet=0.15.0=pyhd8ed1ab_3
  - ipympl=0.8.7=pyhd8ed1ab_0
  - ipython=7.31.1=py37h06a4308_0
  - ipython_genutils=0.2.0=py37_0
  - ipyvolume=0.6.0a8=pyhd8ed1ab_0
  - ipyvue=1.7.0=pyhd8ed1ab_0
  - ipyvuetify=1.8.1=pyhd8ed1ab_0
  - ipywebrtc=0.6.0=pyhd8ed1ab_0
  - ipywidgets=7.6.5=pyhd3eb1b0_1
  - isort=5.10.1=pyhd8ed1ab_0
  - itsdangerous=2.0.1=pyhd3eb1b0_0
  - jbig=2.1=hdba287a_0
  - jedi=0.18.1=py37h06a4308_0
  - jinja2=3.0.2=pyhd3eb1b0_0
  - jmespath=0.10.0=pyhd3eb1b0_0
  - joblib=1.1.0=pyhd8ed1ab_0
  - jpeg=9d=h7f8727e_0
  - json5=0.9.6=pyhd3eb1b0_0
  - jsonref=0.2=py_0
  - jsonschema=3.2.0=py37_1
  - jupyter-archive=3.2.1=pyhd8ed1ab_0
  - jupyter-dash=0.4.0=pyhd8ed1ab_0
  - jupyter-lsp=1.5.1=pyhd8ed1ab_0
  - jupyter-resource-usage=0.6.1=pyhd8ed1ab_0
  - jupyter-server-mathjax=0.2.3=pyhd8ed1ab_0
  - jupyter-server-proxy=3.2.1=pyhd8ed1ab_0
  - jupyter_client=7.1.2=pyhd3eb1b0_0
  - jupyter_contrib_core=0.3.3=py_2
  - jupyter_contrib_nbextensions=0.5.1=py37hc8dfbb8_1
  - jupyter_core=4.9.1=py37h06a4308_0
  - jupyter_highlight_selected_word=0.2.0=py37h89c1867_1005
  - jupyter_latex_envs=1.4.6=py37h89c1867_1001
  - jupyter_nbextensions_configurator=0.4.1=py37h89c1867_2
  - jupyter_server=1.4.1=py37h06a4308_0
  - jupyterlab=3.2.9=pyhd8ed1ab_0
  - jupyterlab-drawio=0.9.0=pyhd8ed1ab_0
  - jupyterlab-git=0.34.2=pyhd8ed1ab_0
  - jupyterlab-lsp=3.10.0=pyhd8ed1ab_0
  - jupyterlab-plotly-extension=1.0.0=py_0
  - jupyterlab-spellchecker=0.7.2=pyhd8ed1ab_0
  - jupyterlab_code_formatter=1.4.10=pyhd8ed1ab_1
  - jupyterlab_execute_time=2.1.0=pyhd8ed1ab_0
  - jupyterlab_pygments=0.1.2=py_0
  - jupyterlab_server=2.10.2=pyhd3eb1b0_1
  - jupyterlab_widgets=1.0.0=pyhd3eb1b0_1
  - jupyterthemes=0.20.0=py_1
  - jupytext=1.13.6=pyheef035f_0
  - jxrlib=1.1=h7b6447c_2
  - kernel-headers_linux-64=3.10.0=h57e8cba_10
  - kiwisolver=1.3.1=py37h2531618_0
  - kneed=0.7.0=pyh9f0ad1d_0
  - krb5=1.19.2=hac12032_0
  - langcodes=3.3.0=pyhd3eb1b0_0
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.35.1=h7274673_9
  - lerc=3.0=h295c915_0
  - lesscpy=0.15.0=pyhd8ed1ab_0
  - liac-arff=2.5.0=pyhd3eb1b0_1
  - libaec=1.0.6=h9c3ff4c_0
  - libblas=3.9.0=13_linux64_openblas
  - libbrotlicommon=1.0.9=h7f98852_6
  - libbrotlidec=1.0.9=h7f98852_6
  - libbrotlienc=1.0.9=h7f98852_6
  - libcblas=3.9.0=13_linux64_openblas
  - libcrc32c=1.1.1=he6710b0_2
  - libcurl=7.80.0=h0b77cf5_0
  - libdeflate=1.8=h7f8727e_5
  - libedit=3.1.20210910=h7f8727e_0
  - libev=4.33=h7f8727e_1
  - libevent=2.1.10=h9b69904_4
  - libffi=3.3=he6710b0_2
  - libgcc-devel_linux-64=9.3.0=hb95220a_17
  - libgcc-ng=11.2.0=h1d223b6_12
  - libgfortran-ng=11.2.0=h69a702a_12
  - libgfortran5=11.2.0=h5c6108e_12
  - libglib=2.68.2=h3e27bee_0
  - libglu=9.0.0=hf484d3e_1
  - libgomp=11.2.0=h1d223b6_12
  - libiconv=1.16=h516909a_0
  - liblapack=3.9.0=13_linux64_openblas
  - liblapacke=3.9.0=13_linux64_openblas
  - libllvm11=11.1.0=h3826bc1_0
  - libnetcdf=4.8.1=nompi_hb3fd0d9_101
  - libnghttp2=1.46.0=hce63b2e_0
  - libogg=1.3.5=h27cfd23_1
  - libopenblas=0.3.18=pthreads_h8fe5266_0
  - libopus=1.3.1=h7b6447c_0
  - libpng=1.6.37=hbc83047_0
  - libpq=13.5=hd57d9b9_1
  - libprotobuf=3.19.4=h780b84a_0
  - libsodium=1.0.18=h7b6447c_0
  - libssh2=1.9.0=h1ba5d50_1
  - libstdcxx-devel_linux-64=9.3.0=hf0c5c8d_17
  - libstdcxx-ng=11.2.0=he4da1e4_12
  - libthrift=0.15.0=hcc01f38_0
  - libtiff=4.3.0=h6f004c6_2
  - libutf8proc=2.7.0=h7f98852_0
  - libuuid=2.32.1=h7f98852_1000
  - libuv=1.42.0=h7f98852_0
  - libvorbis=1.3.7=h7b6447c_0
  - libwebp=1.2.0=h89dd481_0
  - libwebp-base=1.2.0=h27cfd23_0
  - libxcb=1.14=h7b6447c_0
  - libxgboost=1.5.0=h295c915_1
  - libxkbcommon=1.0.3=he3ba5ed_0
  - libxml2=2.9.12=h72842e0_0
  - libxslt=1.1.33=h15afd5d_2
  - libzip=1.8.0=h4de3113_1
  - libzlib=1.2.11=h36c2ea0_1013
  - libzopfli=1.0.3=he6710b0_0
  - llvm-openmp=12.0.1=h4bd325d_1
  - llvmlite=0.38.0=py37h9d7f4d0_0
  - locket=0.2.1=py37h06a4308_1
  - lockfile=0.12.2=py37h06a4308_0
  - lxml=4.7.1=py37h77fd288_0
  - lz4-c=1.9.3=h295c915_1
  - lzo=2.10=h7b6447c_2
  - markdown=3.3.4=py37h06a4308_0
  - markdown-it-py=1.1.0=pyhd8ed1ab_0
  - markupsafe=2.0.1=py37h27cfd23_0
  - marshmallow=3.12.2=pyhd3eb1b0_0
  - marshmallow-oneofschema=3.0.1=pyhd8ed1ab_0
  - matplotlib=3.5.1=py37h89c1867_0
  - matplotlib-base=3.5.1=py37h1058ff1_0
  - matplotlib-inline=0.1.2=pyhd3eb1b0_2
  - mdanalysis=2.0.0=py37hcd2ae1e_1
  - mdit-py-plugins=0.3.0=pyhd8ed1ab_0
  - mdtraj=1.9.7=py37h527cbdb_1
  - mistune=0.8.4=py37h14c3975_1001
  - mmligner=1.0.2=hc9558a2_0
  - mmtf-python=1.1.2=py_0
  - modin=0.12.1=py37h89c1867_0
  - modin-core=0.12.1=py37h89c1867_0
  - modin-dask=0.12.1=py37h89c1867_0
  - monotonic=1.5=py_0
  - mpc=1.1.0=h10f8cd9_1
  - mpfr=4.0.2=hb69a4c5_1
  - mpi=1.0=openmpi
  - mpi4py=3.0.3=py37hd955b32_1
  - mpmath=1.2.1=py37h06a4308_0
  - msgpack-python=1.0.2=py37hff7bd54_1
  - msmtools=1.2.6=py37hb1e94ed_1
  - multidict=5.2.0=py37h7f8727e_2
  - multiprocess=0.70.12.2=py37h7f8727e_0
  - munkres=1.1.4=py_0
  - murmurhash=1.0.5=py37h2531618_0
  - muscle=3.8.1551=h7d875b9_6
  - mypy_extensions=0.4.3=py37h06a4308_1
  - mysql-common=8.0.28=ha770c72_0
  - mysql-libs=8.0.28=hfa10184_0
  - natsort=7.1.1=pyhd3eb1b0_0
  - nbclassic=0.2.6=pyhd3eb1b0_0
  - nbclient=0.5.3=pyhd3eb1b0_0
  - nbconvert=6.3.0=py37h06a4308_0
  - nbdime=3.1.1=pyhd8ed1ab_0
  - nbformat=5.1.3=pyhd3eb1b0_0
  - nbval=0.9.6=pyh9f0ad1d_0
  - ncurses=6.3=h7f8727e_2
  - nest-asyncio=1.5.1=pyhd3eb1b0_0
  - netcdf4=1.5.7=py37ha0f2276_1
  - networkx=2.6.3=pyhd8ed1ab_1
  - nglview=3.0.3=pyh8a188c0_0
  - nltk=3.6.7=pyhd8ed1ab_0
  - nodejs=16.12.0=h92b4a50_0
  - nomkl=1.0=h5ca1d4c_0
  - notebook=6.4.8=pyha770c72_0
  - nspr=4.33=h295c915_0
  - nss=3.74=hb5efdd6_0
  - numba=0.55.1=py37h2d894fd_0
  - numexpr=2.8.1=py37hecfb737_0
  - numpy=1.21.5=py37hf2998dd_0
  - olefile=0.46=py37_0
  - openbabel=3.1.1=py37h6aa62a1_3
  - openblas=0.3.18=pthreads_h4748800_0
  - openjdk=8.0.312=h7f98852_0
  - openjpeg=2.4.0=h3ad879b_0
  - openmpi=4.0.5=h9b22176_4
  - openssl=1.1.1m=h7f8727e_0
  - orc=1.7.2=h1be678f_0
  - packaging=21.3=pyhd3eb1b0_0
  - pandas=1.3.5=py37he8f5f7f_0
  - pandera=0.6.5=pyhd8ed1ab_0
  - pandocfilters=1.5.0=pyhd3eb1b0_0
  - panel=0.12.6=pyhd8ed1ab_0
  - papermill=2.3.4=pyhd8ed1ab_0
  - param=1.12.0=pyhd3eb1b0_0
  - paramiko=2.9.2=pyhd8ed1ab_0
  - parquet-cpp=1.5.1=h34088ae_4
  - parso=0.8.3=pyhd3eb1b0_0
  - partd=1.2.0=pyhd3eb1b0_0
  - pathlib2=2.3.6=py37h06a4308_2
  - pathos=0.2.8=pyhd8ed1ab_0
  - pathspec=0.9.0=pyhd8ed1ab_0
  - pathy=0.6.0=pyhd3eb1b0_0
  - patsy=0.5.2=py37h06a4308_0
  - pcre=8.45=h295c915_0
  - pendulum=2.1.2=py37hc8dfbb8_0
  - pexpect=4.8.0=py37_1
  - pickleshare=0.7.5=py37_1001
  - pillow=8.4.0=py37h5aabda8_0
  - pip=21.2.2=py37h06a4308_0
  - pixman=0.40.0=h7f8727e_1
  - platformdirs=2.4.0=pyhd3eb1b0_0
  - plip=2.2.2=pyhd8ed1ab_0
  - plotly=5.5.0=pyhd8ed1ab_0
  - pluggy=1.0.0=py37h06a4308_0
  - ply=3.11=py37_0
  - pox=0.3.0=pyhd8ed1ab_0
  - ppft=1.6.6.4=pyhd8ed1ab_0
  - prefect=0.15.13=pyhd8ed1ab_0
  - preshed=3.0.5=py37h2531618_4
  - progressbar2=3.37.1=py37h06a4308_0
  - prometheus_client=0.13.1=pyhd3eb1b0_0
  - prompt-toolkit=3.0.20=pyhd3eb1b0_0
  - protobuf=3.19.4=py37hcd2ae1e_0
  - psutil=5.8.0=py37h27cfd23_1
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - py=1.10.0=pyhd3eb1b0_0
  - py4j=0.10.9.3=pyhd8ed1ab_1
  - pyarrow=6.0.1=py37h20dbb2a_8_cpu
  - pyasn1=0.4.8=pyhd3eb1b0_0
  - pyasn1-modules=0.2.8=py_0
  - pycairo=1.19.1=py37h708ec4a_0
  - pycparser=2.21=pyhd3eb1b0_0
  - pyct=0.4.8=py37_0
  - pycurl=7.44.1=py37h88a64d2_1
  - pydantic=1.8.2=py37h7f8727e_0
  - pyemma=2.5.10=py37he8f5f7f_0
  - pyerfa=2.0.0=py37h27cfd23_0
  - pygments=2.10.0=pyhd3eb1b0_0
  - pymc3=3.11.4=py37h249fa81_1
  - pymol-open-source=2.4.0=py37h84945d2_3
  - pynacl=1.4.0=py37h7b6447c_1
  - pynisher=0.6.4=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd3eb1b0_0
  - pyparsing=3.0.4=pyhd3eb1b0_0
  - pyqt=5.12.3=py37h89c1867_8
  - pyqt-impl=5.12.3=py37hac37412_8
  - pyqt5-sip=4.19.18=py37hcd2ae1e_8
  - pyqtchart=5.12=py37he336c9b_8
  - pyqtwebengine=5.12.1=py37he336c9b_8
  - pyrfr=0.8.2=py37h2527ec5_1
  - pyrsistent=0.18.0=py37heee7806_0
  - pysocks=1.7.1=py37_1
  - pyspark=3.2.1=pyhd8ed1ab_0
  - pytables=3.7.0=py37h5dea08b_0
  - pytest=7.0.0=py37h89c1867_0
  - pytest-cov=3.0.0=pyhd8ed1ab_0
  - pytest-forked=1.3.0=pyhd3eb1b0_0
  - pytest-runner=5.3.1=pyhd3eb1b0_0
  - pytest-xdist=2.5.0=pyhd8ed1ab_0
  - python=3.7.11=h12debd9_0
  - python-box=5.4.1=pyhd8ed1ab_0
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - python-slugify=5.0.2=pyhd3eb1b0_0
  - python-utils=2.5.6=py37h06a4308_0
  - python_abi=3.7=2_cp37m
  - pythreejs=2.3.0=pyhd8ed1ab_0
  - pytz=2021.3=pyhd3eb1b0_0
  - pytzdata=2020.1=pyhd3eb1b0_0
  - pyviz_comms=2.0.2=pyhd3eb1b0_0
  - pywavelets=1.1.1=py37h7b6447c_2
  - pyyaml=6.0=py37h7f8727e_1
  - pyzmq=22.3.0=py37h295c915_2
  - qt=5.12.9=hda022c4_4
  - rdkit=2021.09.4=py37h13c2175_0
  - re2=2021.11.01=h9c3ff4c_0
  - readline=8.1.2=h7f8727e_1
  - redo=2.0.4=pyh9f0ad1d_0
  - regex=2021.11.2=py37h7f8727e_0
  - reportlab=3.5.67=py37hfdd840d_1
  - requests=2.27.1=pyhd8ed1ab_0
  - retrying=1.3.3=py37_2
  - rich=10.16.2=pyhd3eb1b0_0
  - rsa=4.7.2=pyhd3eb1b0_1
  - ruamel.yaml=0.16.12=py37h7b6447c_1
  - ruamel.yaml.clib=0.2.6=py37h7f8727e_0
  - s2n=1.0.10=h9b69904_0
  - s3transfer=0.5.0=pyhd3eb1b0_0
  - schwimmbad=0.3.2=py37h89c1867_1
  - scikit-bio=0.5.6=py37ha21ca33_4
  - scikit-image=0.18.3=py37h51133e4_0
  - scikit-learn=0.24.2=py37hf0f1638_1
  - scipy=1.7.3=py37hf2a6cf1_0
  - seaborn=0.11.2=hd8ed1ab_0
  - seaborn-base=0.11.2=pyhd8ed1ab_0
  - semver=2.13.0=pyhd3eb1b0_0
  - send2trash=1.8.0=pyhd3eb1b0_1
  - setuptools=58.0.4=py37h06a4308_0
  - shapely=1.7.1=py37h1728cc4_0
  - shellingham=1.3.1=pyhd3eb1b0_0
  - shyaml=0.6.2=pyhd3deb0d_0
  - simpervisor=0.4=pyhd8ed1ab_0
  - simplejson=3.17.6=py37h7f8727e_0
  - six=1.16.0=pyhd3eb1b0_0
  - smac=1.1=pyhd8ed1ab_0
  - smart_open=5.1.0=pyhd3eb1b0_0
  - smmap=4.0.0=pyhd3eb1b0_0
  - snappy=1.1.8=he6710b0_0
  - sniffio=1.2.0=py37h06a4308_1
  - sortedcontainers=2.4.0=pyhd3eb1b0_0
  - soupsieve=2.3.1=pyhd3eb1b0_0
  - spacy=3.2.1=py37h796e4cb_1
  - spacy-legacy=3.0.8=pyhd3eb1b0_0
  - spacy-loggers=1.0.1=pyhd3eb1b0_0
  - sqlalchemy=1.4.27=py37h7f8727e_0
  - sqlite=3.37.0=hc218d9a_0
  - srsly=2.4.1=py37h2531618_0
  - sshtunnel=0.4.0=pyhd8ed1ab_1
  - starlette=0.16.0=pyhd8ed1ab_0
  - statsmodels=0.13.1=py37hb1e94ed_0
  - stopit=1.1.2=py_0
  - suds-community=0.8.5=pyhd8ed1ab_0
  - swagger-spec-validator=2.7.4=pyhd8ed1ab_0
  - sympy=1.9=py37h89c1867_1
  - sysroot_linux-64=2.17=h57e8cba_10
  - tabulate=0.8.9=py37h06a4308_0
  - tblib=1.7.0=pyhd3eb1b0_0
  - tenacity=8.0.1=py37h06a4308_0
  - terminado=0.9.4=py37h06a4308_0
  - testpath=0.5.0=pyhd3eb1b0_0
  - text-unidecode=1.3=pyhd3eb1b0_0
  - textwrap3=0.9.2=py_0
  - theano-pymc=1.1.2=py37h51133e4_0
  - theme-darcula=3.1.1=pyh3684270_0
  - theseus=3.3.0=h52bb08c_1
  - thinc=8.0.13=py37hae6d005_0
  - threadpoolctl=2.2.0=pyh0d69192_0
  - tifffile=2021.7.2=pyhd3eb1b0_2
  - tk=8.6.11=h1ccaba5_0
  - toml=0.10.2=pyhd3eb1b0_0
  - tomli=1.2.2=pyhd3eb1b0_0
  - toolz=0.11.2=pyhd3eb1b0_0
  - tornado=6.1=py37h27cfd23_0
  - tpot=0.11.7=pyhd8ed1ab_1
  - tqdm=4.62.3=pyhd8ed1ab_0
  - traitlets=5.1.1=pyhd3eb1b0_0
  - traittypes=0.2.1=pyh9f0ad1d_2
  - typed-ast=1.4.3=py37h7f8727e_1
  - typer=0.4.0=pyhd3eb1b0_0
  - typing-extensions=3.10.0.2=hd3eb1b0_0
  - typing_extensions=3.10.0.2=pyh06a4308_0
  - typing_inspect=0.7.1=pyhd3eb1b0_0
  - unidecode=1.2.0=pyhd3eb1b0_0
  - update_checker=0.18.0=pyh9f0ad1d_0
  - url-normalize=1.4.3=pyhd8ed1ab_0
  - urllib3=1.26.8=pyhd3eb1b0_0
  - vaex=4.7.0=pyhd8ed1ab_0
  - vaex-astro=0.9.0=pyhd8ed1ab_0
  - vaex-core=4.7.0.post1=py37h092ef5d_0
  - vaex-hdf5=0.11.1=pyhd8ed1ab_0
  - vaex-jupyter=0.7.0=pyhd8ed1ab_0
  - vaex-ml=0.16.0=pyhd8ed1ab_0
  - vaex-server=0.8.0=pyhd8ed1ab_0
  - vaex-viz=0.5.1=pyhd8ed1ab_0
  - wasabi=0.8.2=pyhd3eb1b0_0
  - wcwidth=0.2.5=pyhd3eb1b0_0
  - webencodings=0.5.1=py37_1
  - websocket-client=0.58.0=py37h06a4308_4
  - werkzeug=2.0.2=pyhd3eb1b0_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - widgetsnbextension=3.5.1=py37_0
  - wrapt=1.13.3=py37h7f8727e_2
  - xarray=0.20.1=pyhd3eb1b0_1
  - xeus=2.3.1=hab3612f_0
  - xeus-python=0.13.6=py37h4b46df4_1
  - xeus-python-shell=0.2.0=pyhd8ed1ab_0
  - xyzservices=2022.1.1=pyhd8ed1ab_0
  - xz=5.2.5=h7b6447c_0
  - yaml=0.2.5=h7b6447c_0
  - yarl=1.5.1=py37h7b6447c_0
  - yellowbrick=1.3.post1=pyhd8ed1ab_1
  - zeromq=4.3.4=h2531618_0
  - zfp=0.5.5=h295c915_6
  - zict=2.0.0=pyhd3eb1b0_0
  - zipp=3.7.0=pyhd3eb1b0_0
  - zlib=1.2.11=h36c2ea0_1013
  - zstd=1.5.2=ha95c52a_0
  - pip:
    - absl-py==1.0.0
    - adal==1.2.7
    - alembic==1.7.6
    - aquirdturtle-collapsible-headings==3.1.0
    - autopage==0.5.0
    - autopep8==1.6.0
    - azure-core==1.22.0
    - azure-datalake-store==0.0.52
    - azure-storage-blob==12.9.0
    - bioregistry==0.4.58
    - bioversions==0.4.4
    - black-nb==0.7
    - cachier==1.5.4
    - chainer==7.8.1
    - chembl-downloader==0.2.1
    - chembl-webresource-client==0.10.7
    - click-default-group==1.2.2
    - cliff==3.10.0
    - cmaes==0.8.2
    - cmd2==2.3.3
    - colorlog==6.6.0
    - cupy-cuda114==10.1.0
    - dash==2.1.0
    - dash-bootstrap-components==1.0.2
    - dash-core-components==2.0.0
    - dash-cytoscape==0.3.0
    - dash-extensions==0.0.68
    - dash-html-components==2.0.0
    - dash-leaflet==0.1.23
    - dash-pivottable==0.0.2
    - dash-resumable-upload==0.0.3
    - dash-table==5.0.0
    - dash-tabulator==0.4.2
    - dash-trich-components==1.0.0
    - dash-ui==0.4.0
    - dask-labextension==5.2.0
    - datacache==1.1.5
    - dataclasses==0.6
    - dataclasses-json==0.5.6
    - easydict==1.9
    - editorconfig==0.12.3
    - faiss-gpu==1.7.2
    - fastrlock==0.8
    - flatbuffers==2.0
    - ftfy==6.0.3
    - gast==0.5.3
    - gcsfs==2022.1.0
    - geobuf==1.1.1
    - google-auth-oauthlib==0.4.6
    - google-pasta==0.2.0
    - gtfparse==1.2.1
    - html5lib==1.1
    - isodate==0.6.1
    - jsbeautifier==1.14.0
    - jupyter==1.0.0
    - jupyter-console==6.4.0
    - jupyter-tensorboard==0.2.0
    - jupyterlab-fasta==3.2.0
    - jupyterlab-geojson==3.2.0
    - jupyterlab-mathjax3==4.3.0
    - jupyterlab-plugin-playground==0.3.0
    - jupyterlab-spreadsheet-editor==0.6.1
    - jupyterlab-system-monitor==0.8.0
    - jupyterlab-templates==0.3.1
    - jupyterlab-theme-solarized-dark==2.0.1
    - jupyterlab-topbar==0.6.1
    - jupyterlab-vega3==3.2.0
    - jupyternotify==0.1.15
    - keras==2.8.0
    - keras-preprocessing==1.1.2
    - keras-tuner==master
    - kt-legacy==1.0.4
    - lckr-jupyterlab-variableinspector==3.0.9
    - libclang==13.0.0
    - logomaker==0.8
    - mako==1.1.6
    - marshmallow-enum==1.5.1
    - memoized-property==1.0.3
    - mock==4.0.3
    - more-click==0.0.6
    - more-itertools==8.12.0
    - mpire==2.3.3
    - msrest==0.6.21
    - nvidia-ml-py==11.515.0
    - oauthlib==3.2.0
    - opt-einsum==3.3.0
    - optuna==2.10.0
    - orjson==3.6.6
    - pathtools==0.1.2
    - pbr==5.8.0
    - pickle5==0.0.12
    - polyglot==16.7.4
    - portalocker==2.3.2
    - prettytable==3.0.0
    - progressbar33==2.4
    - psycopg2-binary==2.9.3
    - pycodestyle==2.8.0
    - pyensembl==1.9.4
    - pyjwt==2.3.0
    - pypdb==2.0
    - pyperclip==1.8.2
    - pystow==0.3.1
    - python-levenshtein==0.12.2
    - qtconsole==5.2.2
    - qtpy==2.0.1
    - requests-cache==0.7.5
    - requests-ftp==0.3.1
    - requests-oauthlib==1.3.1
    - scalene==1.5.3
    - scrapbook==0.5.0
    - sd-material-ui==4.6.0
    - serializable==0.2.1
    - stevedore==3.5.0
    - tensorboard==2.8.0
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.1
    - tensorflow==2.8.0
    - tensorflow-io-gcs-filesystem==0.24.0
    - termcolor==1.1.0
    - textblob==0.17.1
    - tf-estimator-nightly==2.8.0.dev2021122109
    - thefuzz==0.19.0
    - tinytimer==0.0.0
    - tokenize-rt==4.2.1
    - typechecks==0.1.0
    - visdcc==0.0.40
    - watchdog==2.1.6
    - xgboost==1.5.2
quasiben commented 2 years ago

I haven't seen this kind of error before when CUDA is around. Is CUDA correctly being brought into the docker container (can you run nvidia-smi in the container)?

cc @charlesbluca in case you have thoughts here

charlesbluca commented 2 years ago

Nothing stands out to me here. Looking at nvmlDeviceGetName, it returns the value of a ctypes.c_char_Array_*, which, from my limited understanding of ctypes, should always be bytes (cc @rjzamora in case this isn't always the case).
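A minimal sketch of the ctypes behaviour described above, using only the standard library. The device name and the buffer size of 96 are illustrative assumptions, not values taken from NVML:

```python
import ctypes

# NVML writes the device name into a fixed-size C char buffer. ctypes exposes
# such a buffer as a c_char_Array_<N> instance, and its .value attribute is
# the bytes up to the first NUL byte.
buf = ctypes.create_string_buffer(b"Example GPU", 96)  # illustrative name and size

print(type(buf).__name__)  # c_char_Array_96
print(type(buf.value))     # <class 'bytes'>
print(buf.value.decode())  # Example GPU
```

So a binding that returns `buf.value` unchanged hands back bytes, which is what `.decode()` in distributed expects.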

I'd be interested in seeing what the return value of this query is with a minimal reproducer:

import pynvml

pynvml.nvmlInit()
h = pynvml.nvmlDeviceGetHandleByIndex(0)
name = pynvml.nvmlDeviceGetName(h)
print(type(name), repr(name))  # is this bytes or str?
pynvml.nvmlShutdown()

I also noticed that pynvml isn't listed in your environment. Are you able to import it and check its version?
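One way to answer the version question is to ask the installed package metadata which distribution provides the module. Both the `pynvml` and the `nvidia-ml-py` distributions install a top-level module named `pynvml`, so `import pynvml` alone does not tell you which one you got. This sketch assumes Python 3.8+ for `importlib.metadata` (on Python 3.7, as in the environment above, the `importlib_metadata` backport offers the same calls); the helper name `pynvml_providers` is hypothetical:

```python
from importlib import metadata  # Python 3.8+


def pynvml_providers(dists=("pynvml", "nvidia-ml-py")):
    """Map each distribution that can provide the `pynvml` module to its
    installed version, or None if it is not installed."""
    found = {}
    for dist in dists:
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[dist] = None
    return found


if __name__ == "__main__":
    for dist, version in pynvml_providers().items():
        print(f"{dist}: {version or 'not installed'}")
```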

ianozsvald commented 2 years ago

I got hit by this yesterday whilst preparing to teach one of my Higher Performance Python courses. It looked like the latest Dask was at fault until I realised I had a strange dependency issue via a second library, scalene, which installs nvidia-ml-py, which in turn provides the pynvml module.

Reproducible example:

# use conda to install a fresh environment, Python 3.9, current dask but _no_ scalene
$ ipython
In [1]: import dask.distributed # version 2022.03.0
In [2]: dask.distributed.Client() # runs with success
Out[2]: <Client: 'tcp://127.0.0.1:33537' processes=4 threads=8, memory=31.12 GiB>

# Now install `pynvml` to get the bug...

$ pip install scalene # side note: just `pip install nvidia_ml_py` generates the same bug, see below
...
Successfully installed commonmark-0.9.1 nvidia-ml-py-11.515.0 rich-12.0.1 scalene-1.5.5

$ ipython
In [1]: import dask.distributed
In [2]: dask.distributed.Client()
...
File ~/miniconda3/envs/course3/lib/python3.9/site-packages/distributed/diagnostics/nvml.py:123, in _get_name(h)
    121 def _get_name(h):
    122     try:
--> 123         return pynvml.nvmlDeviceGetName(h).decode()
    124     except pynvml.NVMLError_NotSupported:
    125         return None
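The failing line reduces to calling `.decode()` on a `str`. A minimal sketch with illustrative values: the bytes value stands for what distributed's `_get_name` expects, and the str value stands for what the reports in this thread suggest the installed binding returned:

```python
old_style = b"Example GPU"  # bytes: what the .decode() call expects
new_style = "Example GPU"   # str: what the binding apparently returned here

print(old_style.decode())  # Example GPU

try:
    new_style.decode()  # str has no .decode() in Python 3
except AttributeError as exc:
    message = str(exc)
    print(message)  # 'str' object has no attribute 'decode'
```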

Scalene (https://github.com/plasma-umass/scalene) is a relatively new combined CPU and memory profiler (macOS/Linux only), and a more recent addition is GPU profiling alongside CPU profiling. https://github.com/plasma-umass/scalene/issues/378 notes that the pynvml dependency is not optional but could be made so (and I'd agree).

Whilst the bug is not directly in Dask, the fact that Dask uses pynvml whenever it is installed (I've not dug into how) and then fails feels brittle. I don't know how else one would end up with pynvml installed. I can confirm that if I don't install scalene and only run `$ pip install nvidia_ml_py`, the above bug is easily reproduced.
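A name lookup that tolerates both return types would avoid this class of failure. This is a minimal sketch, not the change distributed actually made: `as_text` and `device_name` are hypothetical helpers, and the name getter and "not supported" exception are injected so the logic can be exercised without a GPU (with a real binding you might pass `pynvml.nvmlDeviceGetName` and `pynvml.NVMLError_NotSupported`, mirroring the `_get_name` shown in the traceback above):

```python
def as_text(value):
    """Return value as str whether the binding returned bytes or str."""
    if isinstance(value, bytes):
        return value.decode()
    return value


def device_name(handle, get_name, not_supported=()):
    """Hypothetical tolerant version of a `.decode()`-based name lookup.

    get_name stands in for the binding's name query; not_supported is the
    exception type(s) that mean "no name available" (an empty tuple
    catches nothing).
    """
    try:
        return as_text(get_name(handle))
    except not_supported:
        return None


# Both binding behaviours now yield the same str.
print(device_name(0, lambda h: b"Example GPU"))  # Example GPU
print(device_name(0, lambda h: "Example GPU"))   # Example GPU
```

Checking the type at the call site keeps the caller independent of which `pynvml` distribution happens to be installed.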

I'm working on Linux (Mint 20.3) via conda with fresh Python 3.9 installation of standard data science tools (Dask, Pandas, numpy etc).

quadrupole commented 2 years ago

+1, I had exactly the same problem as @ianozsvald

emeryberger commented 2 years ago

The Dask + Scalene issue has been fixed by replacing the nvidia-ml-py dependency with pynvml (https://github.com/plasma-umass/scalene/issues/378).

cjnolet commented 1 year ago

I'm currently running into this with the dask-cuda nightly. The environment was created with:

mamba create --name new_env python=3.10
conda activate new_env
mamba install -c conda-forge -c nvidia -c rapidsai-nightly dask-cuda=23.04* cuml=23.04*
2023-02-17 17:18:32,713 - distributed.deploy.spec - WARNING - Cluster closed without starting up
Traceback (most recent call last):
  File "/home/cnolet/miniconda3/envs/cuml_2304_021623/lib/python3.10/site-packages/distributed/deploy/spec.py", line 319, in _start
    self.scheduler = cls(**self.scheduler_spec.get("options", {}))
  File "/home/cnolet/miniconda3/envs/cuml_2304_021623/lib/python3.10/site-packages/distributed/scheduler.py", line 3662, in __init__
    ServerNode.__init__(
  File "/home/cnolet/miniconda3/envs/cuml_2304_021623/lib/python3.10/site-packages/distributed/core.py", line 348, in __init__
    self.monitor = SystemMonitor()
  File "/home/cnolet/miniconda3/envs/cuml_2304_021623/lib/python3.10/site-packages/distributed/system_monitor.py", line 96, in __init__
    gpu_extra = nvml.one_time()
  File "/home/cnolet/miniconda3/envs/cuml_2304_021623/lib/python3.10/site-packages/distributed/diagnostics/nvml.py", line 336, in one_time
    "name": _get_name(h),
  File "/home/cnolet/miniconda3/envs/cuml_2304_021623/lib/python3.10/site-packages/distributed/diagnostics/nvml.py", line 319, in _get_name
    return pynvml.nvmlDeviceGetName(h).decode()
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?
# packages in environment at /home/cnolet/miniconda3/envs/cuml_2304_021623:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
arrow-cpp                 10.0.1           ha770c72_8_cpu    conda-forge
aws-c-auth                0.6.23               h7c1ec98_1    conda-forge
aws-c-cal                 0.5.20               ha1c5a7c_4    conda-forge
aws-c-common              0.8.9                h0b41bf4_0    conda-forge
aws-c-compression         0.2.16               h1afc718_1    conda-forge
aws-c-event-stream        0.2.18               h6620826_2    conda-forge
aws-c-http                0.7.3                h33879ea_1    conda-forge
aws-c-io                  0.13.14              hf82dcb6_3    conda-forge
aws-c-mqtt                0.8.6                hdd1a3fa_1    conda-forge
aws-c-s3                  0.2.3                h5f5417b_3    conda-forge
aws-c-sdkutils            0.1.7                h1afc718_1    conda-forge
aws-checksums             0.1.14               h1afc718_1    conda-forge
aws-crt-cpp               0.18.16             hf9eb7b6_13    conda-forge
aws-sdk-cpp               1.10.57              h063c87b_2    conda-forge
bokeh                     2.4.3              pyhd8ed1ab_3    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7b6447c_0  
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2023.01.10           h06a4308_0  
cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
certifi                   2022.12.7       py310h06a4308_0  
cffi                      1.15.1          py310h255011f_3    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
cloudpickle               2.2.1              pyhd8ed1ab_0    conda-forge
cryptography              39.0.1          py310h34c0648_0    conda-forge
cubinlinker               0.2.0           py310hf09951c_1    rapidsai-nightly
cuda-profiler-api         11.8.86                       0    nvidia
cuda-python               11.8.1          py310h01a121a_2    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudf                      23.04.00a       cuda_11_py310_230216_g4e32bfe3aa_99    rapidsai-nightly
cuml                      23.04.00a       cuda11_py310_230216_g3bab4d1f7_71    rapidsai-nightly
cupy                      11.5.0          py310h9216885_0    conda-forge
cytoolz                   0.12.0          py310h5764c6d_1    conda-forge
dask                      2023.2.0           pyhd8ed1ab_0    conda-forge
dask-core                 2023.2.0           pyhd8ed1ab_0    conda-forge
dask-cuda                 23.04.00a       py310_230215_g8134e6b_25    rapidsai-nightly
dask-cudf                 23.04.00a       cuda_11_py310_230216_g4e32bfe3aa_99    rapidsai-nightly
distributed               2023.2.0           pyhd8ed1ab_0    conda-forge
dlpack                    0.5                  h9c3ff4c_0    conda-forge
faiss-proc                1.0.0                      cuda    conda-forge
fastavro                  1.7.1           py310h1fa729e_0    conda-forge
fastrlock                 0.8             py310hd8f1fbe_3    conda-forge
fmt                       9.1.0                h924138e_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
fsspec                    2023.1.0           pyhd8ed1ab_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h0b41bf4_3    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.14                 hfd0df8a_1    conda-forge
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libarrow                  10.0.1           h2c3b227_8_cpu    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcublas                 11.11.3.6                     0    nvidia
libcublas-dev             11.11.3.6                     0    nvidia
libcudf                   23.04.00a       cuda11_230216_g4e32bfe3aa_99    rapidsai-nightly
libcufft                  10.9.0.58                     0    nvidia
libcuml                   23.04.00a       cuda11_230216_g3bab4d1f7_71    rapidsai-nightly
libcumlprims              23.04.00a       cuda11_230208_gc3bf2c8_4    rapidsai-nightly
libcurand                 10.3.0.86                     0    nvidia
libcurand-dev             10.3.0.86                     0    nvidia
libcurl                   7.88.0               hdc1c0ab_0    conda-forge
libcusolver               11.4.1.48                     0    nvidia
libcusolver-dev           11.4.1.48                     0    nvidia
libcusparse               11.7.5.86                     0    nvidia
libcusparse-dev           11.7.5.86                     0    nvidia
libdeflate                1.17                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libfaiss                  1.7.2           cuda112hb18a002_3_cuda    conda-forge
libffi                    3.4.2                h6a678d5_6  
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libgoogle-cloud           2.7.0                h21dfe5b_1    conda-forge
libgrpc                   1.51.1               h4fad500_1    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libllvm11                 11.1.0               he0ac6c6_5    conda-forge
libnghttp2                1.51.0               hff17c54_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libraft-distance          23.04.00a       cuda11_230216_ge14ec63a_64    rapidsai-nightly
libraft-headers           23.04.00a       cuda11_230216_ge14ec63a_64    rapidsai-nightly
libraft-nn                23.04.00a       cuda11_230216_ge14ec63a_64    rapidsai-nightly
librmm                    23.04.00a       cuda11_230216_g82e184fe_16    rapidsai-nightly
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libthrift                 0.16.0               he500d00_2    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvmlite                  0.39.1          py310h58363a5_1    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lz4                       4.3.2           py310h0cfdcf0_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markupsafe                2.1.2           py310h1fa729e_0    conda-forge
msgpack-python            1.0.4           py310hbf28c38_1    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.4                  h6a678d5_0  
numba                     0.56.4          py310ha5257ce_0    conda-forge
numpy                     1.23.5          py310h53a5b5f_0    conda-forge
nvtx                      0.2.3           py310h5764c6d_2    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.0.8                h0b41bf4_0    conda-forge
orc                       1.8.2                hfdbbad2_2    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3           py310h9b08913_0    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
partd                     1.3.0              pyhd8ed1ab_0    conda-forge
pillow                    9.4.0           py310h023d228_1    conda-forge
pip                       22.3.1          py310h06a4308_0  
pooch                     1.6.0              pyhd8ed1ab_0    conda-forge
protobuf                  4.21.12         py310heca2aa9_0    conda-forge
psutil                    5.9.4           py310h5764c6d_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptxcompiler               0.7.0           py310h01a121a_3    conda-forge
pyarrow                   10.0.1          py310h633f555_8_cpu    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pylibraft                 23.04.00a       cuda11_py310_230216_ge14ec63a_64    rapidsai-nightly
pynvml                    11.5.0             pyhd8ed1ab_0    conda-forge
pyopenssl                 23.0.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.9          he550d4f_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2022.7.1           pyhd8ed1ab_0    conda-forge
pyyaml                    6.0             py310h5764c6d_5    conda-forge
raft-dask                 23.04.00a       cuda11_py310_230216_ge14ec63a_64    rapidsai-nightly
re2                       2023.02.01           hcb278e6_0    conda-forge
readline                  8.2                  h5eee18b_0  
requests                  2.28.2             pyhd8ed1ab_0    conda-forge
rmm                       23.04.00a       cuda11_py310_230216_g82e184fe_16    rapidsai-nightly
s2n                       1.3.35               h3358134_0    conda-forge
scipy                     1.10.0          py310h8deb116_2    conda-forge
setuptools                65.6.3          py310h06a4308_0  
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
spdlog                    1.11.0               h9b3ece8_1    conda-forge
sqlite                    3.40.1               h5082296_0  
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h1ccaba5_0  
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
tornado                   6.2             py310h5764c6d_1    conda-forge
treelite                  3.1.0           py310h168469b_0    conda-forge
treelite-runtime          3.1.0                    pypi_0    pypi
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022g                h04d1e81_0  
ucx                       1.13.1               h538f049_1    conda-forge
ucx-proc                  1.0.0                       gpu    conda-forge
ucx-py                    0.31.00a230203  py310_g3806c64_4    rapidsai-nightly
urllib3                   1.26.14            pyhd8ed1ab_0    conda-forge
wheel                     0.37.1             pyhd3eb1b0_0  
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.10               h5eee18b_1  
yaml                      0.2.5                h7f98852_2    conda-forge
zict                      2.2.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

I get this error when trying to start a cluster with LocalCUDACluster.

cjnolet commented 1 year ago

Following up on my previous reply, it looks like downgrading to pynvml 11.4.1 works. I'll do that for now.

wence- commented 1 year ago

This was fixed in https://github.com/dask/distributed/pull/7544, but I think you also need a distributed nightly for that.

So something like:

mamba install -c conda-forge -c nvidia -c rapidsai-nightly -c dask/label/dev "dask-cuda=23.04*"

(the quotes keep the shell from expanding the *). I am not sufficiently au fait with conda to know how to specify this directly as a dependency in dask-cuda.
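
The failure in this issue comes from calling .decode() unconditionally on pynvml.nvmlDeviceGetName(h), which returns str under pynvml 11.5.0 but bytes in older releases such as 11.4.1. A minimal sketch of a version-agnostic workaround is shown below, assuming all you need is the device name as a str. It is not necessarily how PR 7544 implements the fix, and the helper names nvml_name_to_str and device_name are hypothetical.

```python
def nvml_name_to_str(name):
    """Return an NVML device name as str regardless of pynvml version.

    Older pynvml releases (e.g. 11.4.1) return bytes, while 11.5.0
    returns str, which is why an unconditional .decode() breaks.
    """
    if isinstance(name, bytes):
        return name.decode()
    return name


def device_name(handle):
    # Imported lazily so this module still loads on machines without pynvml.
    import pynvml

    return nvml_name_to_str(pynvml.nvmlDeviceGetName(handle))
```

Checking the type instead of the pynvml version means the same code keeps working if the return type changes again, without maintaining a version table.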

jrbourbeau commented 1 year ago

Thanks for following up here @cjnolet @wence-. Going to close this issue via https://github.com/dask/distributed/pull/7544. Let me know if it should be reopened.