dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Seg Fault of np.tensordot on Slurm HPC #4381

Closed. MaximilianHoffmann closed this issue 3 years ago.

MaximilianHoffmann commented 3 years ago

What happened: The code below generates a segfault when I run it as shown, but works if I switch the scheduler to a single thread. I isolated the tensordot function after trying to run some code from scikit-image.

I am running this code on an HPC cluster with slurm scheduler, but the dask cluster is a local one on a single node.

It's possibly related to https://github.com/dask/dask-ml/issues/629

Minimal Complete Verifiable Example:

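import numpy as np
import os

import dask.array as da
import dask

from dask.distributed import Client

# Turn off the worker memory target and spill thresholds (no spilling to disk)
dask.config.set({'distributed.worker.memory.target': False, 'distributed.worker.memory.spill': False})

# One in-process worker (processes=False) running 8 threads
client = Client(n_workers=1, memory_limit=8e9, threads_per_worker=8, processes=False)

# Contract each (64, 64, 64) block with itself over its last two axes,
# giving a (64, 64) array, then append a length-1 axis
dftu = lambda arr: np.tensordot(arr, arr)[:, :, None]

arr = da.from_array(np.random.randn(128, 128, 128), chunks=(64, 64, 64))
# chunks= is the shape of each output block; np.float is deprecated, use float
arr2 = da.map_blocks(dftu, arr, chunks=(64, 64, 1), dtype=float)
arr2.compute()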
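Since `map_blocks` expects `chunks=` to describe the shape of each output block, it helps to check what `dftu` returns for a single (64, 64, 64) input block. The NumPy-only sketch below (no Dask or cluster involved) assumes that block size and cross-checks the contraction against an equivalent `np.einsum` expression.

```python
import numpy as np

# One input block, matching chunks=(64, 64, 64) in the example
block = np.random.randn(64, 64, 64)

# With the default axes=2, np.tensordot sums over the last two axes of the
# first argument and the first two axes of the second, leaving shape (64, 64)
contracted = np.tensordot(block, block)
print(contracted.shape)

# The [:, :, None] in dftu appends a length-1 axis
out_block = contracted[:, :, None]
print(out_block.shape)

# Equivalent einsum formulation: sum over j and k of a[i, j, k] * b[j, k, l]
reference = np.einsum("ijk,jkl->il", block, block)
print(np.allclose(contracted, reference))
```

Each output block is therefore (64, 64, 1), which is the value `chunks=` should carry for this function.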

Anything else we need to know?:

Environment:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
alsa-lib                  1.2.3                h516909a_0    conda-forge
antspyx                   0.2.5                    pypi_0    pypi
arrow                     0.13.1                   py37_0  
attrs                     19.3.0                     py_0    conda-forge
av                        8.0.0            py37h82f89c2_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
bleach                    3.1.4              pyh9f0ad1d_0    conda-forge
bokeh                     2.0.2                    py37_0  
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
certifi                   2020.12.5        py37h89c1867_0    conda-forge
cffi                      1.14.0           py37hd463f26_0    conda-forge
chardet                   3.0.4                    pypi_0    pypi
chart-studio              1.1.0                    pypi_0    pypi
click                     7.1.1                      py_0  
cloudpickle               1.4.1                      py_0  
cycler                    0.10.0                     py_2    conda-forge
cytoolz                   0.10.1           py37h7b6447c_0  
dask                      2.17.2                     py_0    conda-forge
dask-core                 2.17.2                     py_0    conda-forge
dask-image                0.4.0              pyh9f0ad1d_0    conda-forge
dask-jobqueue             0.7.0                      py_0  
decorator                 4.4.2                      py_0    conda-forge
defusedxml                0.6.0                      py_0    conda-forge
distributed               2.17.0           py37hc8dfbb8_0    conda-forge
entrypoints               0.3             py37hc8dfbb8_1001    conda-forge
ffmpeg                    4.2                  h167e202_0    conda-forge
freetype                  2.10.1               he06d7ca_0    conda-forge
fsspec                    0.7.1                      py_0  
future                    0.18.2                   py37_0  
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
gmp                       6.2.0                he1b5a44_2    conda-forge
gnutls                    3.6.5             hd3a4fd2_1002    conda-forge
h5py                      2.10.0          nompi_py37h513d04c_102    conda-forge
hdf5                      1.10.5          nompi_h3c11f04_1104    conda-forge
heapdict                  1.0.1                      py_0  
icu                       64.2                 he1b5a44_1    conda-forge
idna                      2.9                      pypi_0    pypi
imagecodecs               2020.2.18                pypi_0    pypi
imagecodecs-lite          2019.12.3        py37h8f50634_0    conda-forge
imageio                   2.8.0                      py_0    conda-forge
importlib-metadata        1.6.0            py37hc8dfbb8_0    conda-forge
importlib_metadata        1.6.0                         0    conda-forge
interpolation             2.1.6                      py_0    conda-forge
ipykernel                 5.2.1            py37h43977f1_0    conda-forge
ipympl                    0.5.6              pyh9f0ad1d_1    conda-forge
ipython                   7.13.0           py37hc8dfbb8_2    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.5.1                      py_0    conda-forge
jedi                      0.17.0           py37hc8dfbb8_0    conda-forge
jinja2                    2.11.2             pyh9f0ad1d_0    conda-forge
joblib                    0.14.1                     py_0    conda-forge
jpeg                      9c                h14c3975_1001    conda-forge
jsonschema                3.2.0            py37hc8dfbb8_1    conda-forge
jupyter_client            6.1.3                      py_0    conda-forge
jupyter_core              4.6.3            py37hc8dfbb8_1    conda-forge
kiwisolver                1.2.0            py37h99015e2_0    conda-forge
lame                      3.100             h14c3975_1001    conda-forge
ld_impl_linux-64          2.34                 h53a641e_0    conda-forge
libblas                   3.8.0               16_openblas    conda-forge
libcblas                  3.8.0               16_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libflac                   1.3.3                he1b5a44_0    conda-forge
libgcc-ng                 9.3.0               h5dbcf3e_17    conda-forge
libgfortran-ng            7.3.0                hdf63c60_5    conda-forge
libiconv                  1.15              h516909a_1006    conda-forge
liblapack                 3.8.0               16_openblas    conda-forge
libllvm8                  8.0.1                hc9558a2_0    conda-forge
libogg                    1.3.2             h516909a_1002    conda-forge
libopenblas               0.3.9                h5ec1e0e_0    conda-forge
libpng                    1.6.37               hed695b0_1    conda-forge
libsndfile                1.0.28            he1b5a44_1000    conda-forge
libsodium                 1.0.17               h516909a_0    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libtiff                   4.1.0                hc7e4089_6    conda-forge
libvorbis                 1.3.6                he1b5a44_2    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
llvm-openmp               10.0.0               hc9558a2_0    conda-forge
llvmlite                  0.31.0           py37h5202443_1    conda-forge
locket                    0.2.0                    py37_1  
lz4-c                     1.9.2                he1b5a44_0    conda-forge
markupsafe                1.1.1            py37h8f50634_1    conda-forge
matplotlib-base           3.2.1            py37h30547a4_0    conda-forge
mistune                   0.8.4           py37h8f50634_1001    conda-forge
msgpack-python            1.0.0            py37hfd86e86_1  
nbconvert                 5.6.1            py37hc8dfbb8_1    conda-forge
nbformat                  5.0.6                      py_0    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
nd2reader                 3.2.3                      py_2    conda-forge
nettle                    3.4.1             h1bed415_1002    conda-forge
networkx                  2.4                        py_1    conda-forge
nibabel                   3.1.0                    pypi_0    pypi
notebook                  6.0.3                    py37_0    conda-forge
numba                     0.49.1           py37h0da4684_0    conda-forge
numexpr                   2.7.1                    pypi_0    pypi
numpy                     1.17.5           py37h95a1406_0    conda-forge
numpy-stl                 2.11.2                   pypi_0    pypi
olefile                   0.46                       py_0    conda-forge
opencv-python             4.2.0.34                 pypi_0    pypi
openh264                  1.8.0             hdbcaa40_1000    conda-forge
openssl                   1.1.1i               h7f98852_0    conda-forge
packaging                 20.3                       py_0  
pandas                    1.0.3            py37h0da4684_1    conda-forge
pandoc                    2.9.2.1                       0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parso                     0.7.0              pyh9f0ad1d_0    conda-forge
partd                     1.1.0                      py_0  
patsy                     0.5.1                    pypi_0    pypi
pexpect                   4.8.0            py37hc8dfbb8_1    conda-forge
pickleshare               0.7.5           py37hc8dfbb8_1001    conda-forge
pillow                    7.1.2            py37h718be6c_0    conda-forge
pims                      0.4.1                      py_1    conda-forge
pip                       20.0.2                     py_2    conda-forge
plotly                    4.6.0                    pypi_0    pypi
portaudio                 19.6.0               h1398938_3    conda-forge
prometheus_client         0.7.1                      py_0    conda-forge
prompt-toolkit            3.0.5                      py_0    conda-forge
psutil                    5.7.0            py37h7b6447c_0  
ptyprocess                0.6.0                   py_1001    conda-forge
pycparser                 2.20                       py_0    conda-forge
pygments                  2.6.1                      py_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyrsistent                0.16.0           py37h8f50634_0    conda-forge
pysoundfile               0.10.2                  py_1001    conda-forge
python                    3.7.6           h8356626_5_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-sounddevice        0.4.0              pyh9f0ad1d_0    conda-forge
python-utils              2.4.0                    pypi_0    pypi
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2019.3                     py_0    conda-forge
pywavelets                1.1.1            py37h03ebfcd_1    conda-forge
pyyaml                    5.3.1                    pypi_0    pypi
pyzmq                     19.0.0           py37hac76be4_1    conda-forge
readline                  8.0                  hf8c457e_0    conda-forge
requests                  2.23.0                   pypi_0    pypi
retrying                  1.3.3                    pypi_0    pypi
scanimage-tiff-reader     1.4.1                    pypi_0    pypi
scikit-image              0.18.0                   pypi_0    pypi
scikit-learn              0.22.2.post1     py37hcdab131_0    conda-forge
scipy                     1.4.1            py37ha3d9a3c_3    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                46.1.3           py37hc8dfbb8_0    conda-forge
six                       1.14.0                     py_1    conda-forge
sklearn                   0.0                      pypi_0    pypi
slicerator                1.0.0                      py_0    conda-forge
slurm-magic               0.0.5                    pypi_0    pypi
sortedcontainers          2.1.0                    py37_0  
sparse                    0.10.0                     py_0    conda-forge
sqlite                    3.30.1               hcee41ef_0    conda-forge
statsmodels               0.11.1                   pypi_0    pypi
tables                    3.6.1                    pypi_0    pypi
tblib                     1.6.0                      py_0  
tempita                   0.5.3dev              py37_1001    conda-forge
terminado                 0.8.3            py37hc8dfbb8_1    conda-forge
testpath                  0.4.4                      py_0    conda-forge
tifffile                  2020.2.16                pypi_0    pypi
tk                        8.6.10               hed695b0_0    conda-forge
toolz                     0.10.0                     py_0  
tornado                   6.0.4            py37h8f50634_1    conda-forge
tqdm                      4.45.0             pyh9f0ad1d_0    conda-forge
traitlets                 4.3.3            py37hc8dfbb8_1    conda-forge
typing_extensions         3.7.4.1                  py37_0  
urllib3                   1.25.9                   pypi_0    pypi
wcwidth                   0.1.9              pyh9f0ad1d_0    conda-forge
webcolors                 1.11.1                   pypi_0    pypi
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.34.2                     py_1    conda-forge
widgetsnbextension        3.5.1                    py37_0    conda-forge
x264                      1!152.20180806       h14c3975_0    conda-forge
xmltodict                 0.12.0                     py_0    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
yaml                      0.1.7                had09818_2  
zeromq                    4.3.2                he1b5a44_2    conda-forge
zict                      2.0.0                      py_0  
zipp                      3.1.0                      py_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
zstd                      1.4.4                h6597ccf_3    conda-forge
quasiben commented 3 years ago

This looks like you are running an older version of dask/distributed. Can you try upgrading to the latest release, 2020.12.0?
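A minimal sketch for checking whether an environment is older than the suggested release. `numeric_version` and `needs_upgrade` are hypothetical helpers, and the 2020.12.0 threshold is taken from the comment above. Since releases such as 2.17.2 and 2020.12.0 are plain dot-separated numbers, comparing them as integer tuples orders them correctly. `importlib.metadata` needs Python 3.8+; on Python 3.7 the `importlib_metadata` backport (listed in the environment above) offers the same `version` and `PackageNotFoundError`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def numeric_version(text):
    """Turn a release string such as '2.17.2' into a tuple of ints.

    Simplified sketch: only the leading dot-separated numeric part is used,
    so local suffixes like '+12.gabc' are ignored. For full PEP 440 handling
    use packaging.version.Version instead.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a numeric version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(installed, minimum="2020.12.0"):
    """Return True if the installed release is older than the minimum."""
    return numeric_version(installed) < numeric_version(minimum)


for package in ("dask", "distributed"):
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    print(f"{package} {installed}: upgrade needed = {needs_upgrade(installed)}")
```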

mrocklin commented 3 years ago

I'm curious: have you set the OMP_NUM_THREADS=1 environment variable? You may be oversubscribing threads. I think OpenMP segfaults if it finds that you're using more than twice as many threads as cores.
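A minimal sketch of applying that suggestion from inside Python, assuming the BLAS/OpenMP runtime reads these variables once when it is loaded (so they must be set before NumPy is imported). Exporting `OMP_NUM_THREADS=1` in the Slurm job script before starting Python has the same effect. `total_threads` is a hypothetical helper giving a rough upper bound on busy threads, assuming each Dask worker thread may start its own BLAS thread team; how a given BLAS build actually shares threads can differ.

```python
import os

# Typically read once, when the BLAS/OpenMP runtime is loaded, so set these
# before numpy (or any library that imports it) is imported.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

# import numpy as np  # safe to import from this point on


def total_threads(dask_threads, blas_threads):
    """Rough upper bound on busy threads if every Dask worker thread runs a
    BLAS call that starts its own team of blas_threads threads."""
    return dask_threads * blas_threads


cores = os.cpu_count() or 1
dask_threads = 8  # threads_per_worker=8 from the Client call in the report
print("cores:", cores)
print("with BLAS limited to 1 thread:", total_threads(dask_threads, 1))
print("with BLAS using every core:", total_threads(dask_threads, cores))
```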


MaximilianHoffmann commented 3 years ago

Thank you both for your suggestions. @quasiben, after the update to 2020.12.0 the problem disappeared. @mrocklin, thank you for your suggestion; I didn't manually set OMP_NUM_THREADS anywhere.