Closed MaximilianHoffmann closed 3 years ago
This looks like you are running an older version of dask/distributed. Can you try upgrading to the latest release, 2020.12.0?
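To confirm which versions are actually installed before and after upgrading, something like the following could be used. It is only a sketch: `installed_version` is a hypothetical helper name, and it relies on the standard-library `importlib.metadata` (Python 3.8+), not on any dask API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# dask and distributed are released together, so their versions should normally match.
for name in ("dask", "distributed", "numpy"):
    print(name, installed_version(name) or "not installed")
```

If dask and distributed report different versions, that mismatch is worth fixing first.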
I'm curious, have you set the OMP_NUM_THREADS=1 environment variable? You may be oversubscribing threads. I think that OpenMP seg faults if it finds that you're using more than twice as many threads as cores.
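A rough way to test the oversubscription idea is sketched below. It assumes the native libraries read `OMP_NUM_THREADS` when they are first loaded, so the variable has to be set before NumPy is imported; `total_threads` is a hypothetical helper, and the figure of 8 worker threads is taken from the example in this issue.

```python
import os

# Assumption: OpenMP/OpenBLAS pick this up when they are first loaded,
# so set it before importing numpy or dask in the same process.
os.environ["OMP_NUM_THREADS"] = "1"


def total_threads(threads_per_worker, native_threads):
    """Worst case: every dask worker thread starts its own native thread pool."""
    return threads_per_worker * native_threads


cores = os.cpu_count() or 1
native = int(os.environ["OMP_NUM_THREADS"])
demand = total_threads(8, native)  # 8 = threads_per_worker in the example
print(f"{demand} threads requested on {cores} cores")
if demand > 2 * cores:
    print("possible oversubscription")
```

With the variable unset, the native pool usually defaults to one thread per core, so 8 worker threads could request up to 8 times the core count.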
On Fri, Dec 18, 2020 at 3:13 PM Maximilian Hoffmann < notifications@github.com> wrote:
What happened: The code segfaults when I run it like this, but works if I switch the scheduler to a single thread. I isolated the tensordot function after trying to run some code from scikit-image.
I am running this code on an HPC cluster with the Slurm scheduler, but the dask cluster is a local one on a single machine.
It's possibly related to dask/dask-ml#629 https://github.com/dask/dask-ml/issues/629
Minimal Complete Verifiable Example:
import numpy as np
import os
import dask.array as da
import dask
from dask.distributed import Client  # , LocalCluster
# cluster = LocalCluster(,)
dask.config.set({'distributed.worker.memory.target': False, 'distributed.worker.memory.spill': False})
client = Client(n_workers=1, memory_limit=8e9, threads_per_worker=8, processes=False)
dftu = lambda arr: np.tensordot(arr, arr)[:, :, None]
arr = da.from_array(np.random.randn(128, 128, 128), chunks=(64, 64, 64))
arr2 = da.map_blocks(dftu, arr, chunks=[128, 128, 1], dtype=np.float)
arr2.compute()
Anything else we need to know?:
Environment:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_llvm conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
antspyx 0.2.5 pypi_0 pypi
arrow 0.13.1 py37_0
attrs 19.3.0 py_0 conda-forge
av 8.0.0 py37h82f89c2_0 conda-forge
backcall 0.1.0 py_0 conda-forge
bleach 3.1.4 pyh9f0ad1d_0 conda-forge
bokeh 2.0.2 py37_0
bzip2 1.0.8 h516909a_2 conda-forge
ca-certificates 2020.12.5 ha878542_0 conda-forge
certifi 2020.12.5 py37h89c1867_0 conda-forge
cffi 1.14.0 py37hd463f26_0 conda-forge
chardet 3.0.4 pypi_0 pypi
chart-studio 1.1.0 pypi_0 pypi
click 7.1.1 py_0
cloudpickle 1.4.1 py_0
cycler 0.10.0 py_2 conda-forge
cytoolz 0.10.1 py37h7b6447c_0
dask 2.17.2 py_0 conda-forge
dask-core 2.17.2 py_0 conda-forge
dask-image 0.4.0 pyh9f0ad1d_0 conda-forge
dask-jobqueue 0.7.0 py_0
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2.17.0 py37hc8dfbb8_0 conda-forge
entrypoints 0.3 py37hc8dfbb8_1001 conda-forge
ffmpeg 4.2 h167e202_0 conda-forge
freetype 2.10.1 he06d7ca_0 conda-forge
fsspec 0.7.1 py_0
future 0.18.2 py37_0
gettext 0.19.8.1 hc5be6a0_1002 conda-forge
gmp 6.2.0 he1b5a44_2 conda-forge
gnutls 3.6.5 hd3a4fd2_1002 conda-forge
h5py 2.10.0 nompi_py37h513d04c_102 conda-forge
hdf5 1.10.5 nompi_h3c11f04_1104 conda-forge
heapdict 1.0.1 py_0
icu 64.2 he1b5a44_1 conda-forge
idna 2.9 pypi_0 pypi
imagecodecs 2020.2.18 pypi_0 pypi
imagecodecs-lite 2019.12.3 py37h8f50634_0 conda-forge
imageio 2.8.0 py_0 conda-forge
importlib-metadata 1.6.0 py37hc8dfbb8_0 conda-forge
importlib_metadata 1.6.0 0 conda-forge
interpolation 2.1.6 py_0 conda-forge
ipykernel 5.2.1 py37h43977f1_0 conda-forge
ipympl 0.5.6 pyh9f0ad1d_1 conda-forge
ipython 7.13.0 py37hc8dfbb8_2 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.5.1 py_0 conda-forge
jedi 0.17.0 py37hc8dfbb8_0 conda-forge
jinja2 2.11.2 pyh9f0ad1d_0 conda-forge
joblib 0.14.1 py_0 conda-forge
jpeg 9c h14c3975_1001 conda-forge
jsonschema 3.2.0 py37hc8dfbb8_1 conda-forge
jupyter_client 6.1.3 py_0 conda-forge
jupyter_core 4.6.3 py37hc8dfbb8_1 conda-forge
kiwisolver 1.2.0 py37h99015e2_0 conda-forge
lame 3.100 h14c3975_1001 conda-forge
ld_impl_linux-64 2.34 h53a641e_0 conda-forge
libblas 3.8.0 16_openblas conda-forge
libcblas 3.8.0 16_openblas conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libflac 1.3.3 he1b5a44_0 conda-forge
libgcc-ng 9.3.0 h5dbcf3e_17 conda-forge
libgfortran-ng 7.3.0 hdf63c60_5 conda-forge
libiconv 1.15 h516909a_1006 conda-forge
liblapack 3.8.0 16_openblas conda-forge
libllvm8 8.0.1 hc9558a2_0 conda-forge
libogg 1.3.2 h516909a_1002 conda-forge
libopenblas 0.3.9 h5ec1e0e_0 conda-forge
libpng 1.6.37 hed695b0_1 conda-forge
libsndfile 1.0.28 he1b5a44_1000 conda-forge
libsodium 1.0.17 h516909a_0 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
libtiff 4.1.0 hc7e4089_6 conda-forge
libvorbis 1.3.6 he1b5a44_2 conda-forge
libwebp-base 1.1.0 h516909a_3 conda-forge
llvm-openmp 10.0.0 hc9558a2_0 conda-forge
llvmlite 0.31.0 py37h5202443_1 conda-forge
locket 0.2.0 py37_1
lz4-c 1.9.2 he1b5a44_0 conda-forge
markupsafe 1.1.1 py37h8f50634_1 conda-forge
matplotlib-base 3.2.1 py37h30547a4_0 conda-forge
mistune 0.8.4 py37h8f50634_1001 conda-forge
msgpack-python 1.0.0 py37hfd86e86_1
nbconvert 5.6.1 py37hc8dfbb8_1 conda-forge
nbformat 5.0.6 py_0 conda-forge
ncurses 6.1 hf484d3e_1002 conda-forge
nd2reader 3.2.3 py_2 conda-forge
nettle 3.4.1 h1bed415_1002 conda-forge
networkx 2.4 py_1 conda-forge
nibabel 3.1.0 pypi_0 pypi
notebook 6.0.3 py37_0 conda-forge
numba 0.49.1 py37h0da4684_0 conda-forge
numexpr 2.7.1 pypi_0 pypi
numpy 1.17.5 py37h95a1406_0 conda-forge
numpy-stl 2.11.2 pypi_0 pypi
olefile 0.46 py_0 conda-forge
opencv-python 4.2.0.34 pypi_0 pypi
openh264 1.8.0 hdbcaa40_1000 conda-forge
openssl 1.1.1i h7f98852_0 conda-forge
packaging 20.3 py_0
pandas 1.0.3 py37h0da4684_1 conda-forge
pandoc 2.9.2.1 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parso 0.7.0 pyh9f0ad1d_0 conda-forge
partd 1.1.0 py_0
patsy 0.5.1 pypi_0 pypi
pexpect 4.8.0 py37hc8dfbb8_1 conda-forge
pickleshare 0.7.5 py37hc8dfbb8_1001 conda-forge
pillow 7.1.2 py37h718be6c_0 conda-forge
pims 0.4.1 py_1 conda-forge
pip 20.0.2 py_2 conda-forge
plotly 4.6.0 pypi_0 pypi
portaudio 19.6.0 h1398938_3 conda-forge
prometheus_client 0.7.1 py_0 conda-forge
prompt-toolkit 3.0.5 py_0 conda-forge
psutil 5.7.0 py37h7b6447c_0
ptyprocess 0.6.0 py_1001 conda-forge
pycparser 2.20 py_0 conda-forge
pygments 2.6.1 py_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.16.0 py37h8f50634_0 conda-forge
pysoundfile 0.10.2 py_1001 conda-forge
python 3.7.6 h8356626_5_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-sounddevice 0.4.0 pyh9f0ad1d_0 conda-forge
python-utils 2.4.0 pypi_0 pypi
python_abi 3.7 1_cp37m conda-forge
pytz 2019.3 py_0 conda-forge
pywavelets 1.1.1 py37h03ebfcd_1 conda-forge
pyyaml 5.3.1 pypi_0 pypi
pyzmq 19.0.0 py37hac76be4_1 conda-forge
readline 8.0 hf8c457e_0 conda-forge
requests 2.23.0 pypi_0 pypi
retrying 1.3.3 pypi_0 pypi
scanimage-tiff-reader 1.4.1 pypi_0 pypi
scikit-image 0.18.0 pypi_0 pypi
scikit-learn 0.22.2.post1 py37hcdab131_0 conda-forge
scipy 1.4.1 py37ha3d9a3c_3 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 46.1.3 py37hc8dfbb8_0 conda-forge
six 1.14.0 py_1 conda-forge
sklearn 0.0 pypi_0 pypi
slicerator 1.0.0 py_0 conda-forge
slurm-magic 0.0.5 pypi_0 pypi
sortedcontainers 2.1.0 py37_0
sparse 0.10.0 py_0 conda-forge
sqlite 3.30.1 hcee41ef_0 conda-forge
statsmodels 0.11.1 pypi_0 pypi
tables 3.6.1 pypi_0 pypi
tblib 1.6.0 py_0
tempita 0.5.3dev py37_1001 conda-forge
terminado 0.8.3 py37hc8dfbb8_1 conda-forge
testpath 0.4.4 py_0 conda-forge
tifffile 2020.2.16 pypi_0 pypi
tk 8.6.10 hed695b0_0 conda-forge
toolz 0.10.0 py_0
tornado 6.0.4 py37h8f50634_1 conda-forge
tqdm 4.45.0 pyh9f0ad1d_0 conda-forge
traitlets 4.3.3 py37hc8dfbb8_1 conda-forge
typing_extensions 3.7.4.1 py37_0
urllib3 1.25.9 pypi_0 pypi
wcwidth 0.1.9 pyh9f0ad1d_0 conda-forge
webcolors 1.11.1 pypi_0 pypi
webencodings 0.5.1 py_1 conda-forge
wheel 0.34.2 py_1 conda-forge
widgetsnbextension 3.5.1 py37_0 conda-forge
x264 1!152.20180806 h14c3975_0 conda-forge
xmltodict 0.12.0 py_0 conda-forge
xz 5.2.5 h516909a_0 conda-forge
yaml 0.1.7 had09818_2
zeromq 4.3.2 he1b5a44_2 conda-forge
zict 2.0.0 py_0
zipp 3.1.0 py_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.4 h6597ccf_3 conda-forge
Thank you both for your suggestions. @quasiben, after the update to 2020.12.0 the problem disappeared. @mrocklin, thank you for your suggestion; I didn't manually set OMP_NUM_THREADS anywhere.