Open ll-portes opened 5 years ago
I encountered the very same bug.
A simple import of numpy is often sufficient to trigger this bug.
@schmidtchristoph I installed the PyViz environment and got no bug. Unfortunately, the problem returned after installing scikit-learn in this env. Hence, my pragmatic solution for now is to keep a clone of the PyViz env for my work. If I need to install something, I clone the env again and install the package there. If the installation breaks my setup, I return to the original env. Maybe this approach could help you.
@oleksandr-pavlyk do you have any ideas on this one?
I will try to locate the machine to triage the issue.
If the reporter is willing to try a few things, please try whether the following helps:
MKL_ENABLE_INSTRUCTIONS=AVX2 python script.py # MKL will avoid AVX512 instructions
or
MKL_THREADING_LAYER=SEQUENTIAL python script.py # MKL will use one core only
Both commands help to avoid the system freeze and subsequent reboot.
I used the dot product example of @ll-portes as "script.py". This example reproduces the bug if no MKL_* variables are prepended.
Thank you very much.
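For anyone who would rather apply the same two variables from Python than on the shell command line, here is a minimal sketch. The helper name `run_with_mkl_settings` is hypothetical (it is not part of NumPy, MKL, or conda); it only copies the current environment, adds the `MKL_*` variables quoted above, and runs the script in a child process.

```python
import os
import subprocess
import sys


def run_with_mkl_settings(script_args, instructions=None, threading_layer=None, **run_kwargs):
    """Run a Python script in a child process with MKL_* variables set.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)  # copy, so the parent process is left unchanged
    if instructions is not None:
        # e.g. "AVX2", so that MKL avoids AVX-512 code paths
        env["MKL_ENABLE_INSTRUCTIONS"] = instructions
    if threading_layer is not None:
        # e.g. "SEQUENTIAL", so that MKL runs single-threaded
        env["MKL_THREADING_LAYER"] = threading_layer
    return subprocess.run([sys.executable, *script_args], env=env, **run_kwargs)


# Example (equivalent to the first shell command above):
# run_with_mkl_settings(["script.py"], instructions="AVX2")
```

Running the script in a child process keeps the settings out of the current interpreter, so each variable can be tried in isolation.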
I will try to locate the machine to triage the issue.
If the reporter is willing to try a few things, please try whether the following helps:
MKL_ENABLE_INSTRUCTIONS=AVX2 python script.py # MKL will avoid AVX512 instructions
or
MKL_THREADING_LAYER=SEQUENTIAL python script.py # MKL will use one core only
None of the commands above worked. The machine froze/rebooted again.
I don't know if this information helps, but under the PyViz environment, everything works fine. This environment was created using:
conda update conda
conda create -n pyviz-tutorial python=3.6
After activating it, I ran conda update --all.
Note: after that, I cloned this same env under the different name "PyVizEnv", which appears below. On this PyVizEnv environment, the outputs for conda are:
conda info
conda list --show-channel-urls
@ll-portes Please note that the pyviz-tutorial environment has
blas 1.0 mkl defaults
whereas the (I assume, problematic) environment in your original post has
blas 1.0 openblas defaults
If the numpy in the problematic environment has been linked against mkl rather than openblas, that could be a problem. You can check that by activating the problematic environment and running (suitably adjusted for your Python version; I am using Python 3.6):
ldd $CONDA_PREFIX/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so
Alternatively, you can inspect the content of $CONDA_PREFIX/lib/python3.6/site-packages/numpy/distutils/site.cfg.
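The ldd check above can also be scripted. The sketch below is only an illustration: the function names are hypothetical, and it assumes that an MKL-linked build pulls in shared libraries whose names start with `libmkl`, and an OpenBLAS-linked build ones starting with `libopenblas`.

```python
import subprocess


def blas_backends_in(ldd_output):
    """Return the BLAS backends whose shared libraries appear in ldd output.

    Assumption: MKL libraries are named libmkl*, OpenBLAS libraries libopenblas*.
    """
    markers = {"mkl": "libmkl", "openblas": "libopenblas"}
    text = ldd_output.lower()
    return {name for name, marker in markers.items() if marker in text}


def linked_blas(extension_path):
    """Run ldd on a compiled extension module and classify the result (Linux only)."""
    result = subprocess.run(
        ["ldd", extension_path], capture_output=True, text=True, check=True
    )
    return blas_backends_in(result.stdout)


# Example: pass the path of the _multiarray_umath shared object shown above.
# print(linked_blas("/path/to/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so"))
```

If both `mkl` and `openblas` come back for the same environment, that would point to the kind of mixed linkage described above. `numpy.show_config()` prints the build-time BLAS/LAPACK configuration and can serve as a cross-check.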
Please try to see if the crash persists in a simpler environment:
conda create -n t_i10832 -c defaults --override-channels numpy
conda activate t_i10832
python script.py
Thank you @oleksandr-pavlyk The whole story is a little bit confusing, but I'll try my best to explain. First, let me show the results using the t_i10832 env. I always test two scripts:
script_dot.py is:
import numpy as np
print(np.__version__)
A = np.matrix([[1.], [3.]]); B = np.matrix([[2., 3.]])
C=np.dot(A, B)
print(C.shape)
script_svd.py is:
import numpy as np
print(np.__version__)
nr=1000;nc=10000
X=np.random.rand(nr,nc)
print(X.shape)
u,s,vt=np.linalg.svd(X)
print(u.shape)
The results with t_i10832 env are:
numpy.dot passed the test! numpy.linalg.svd doesn't, so the problem persists. These are the outputs for the t_i10832 env:
conda info
conda list --show-channel-urls
Now, the slightly confusing story. Originally, I found this problem with numpy.linalg.svd. Then I tried combinations of:
a. updating/downgrading conda, anaconda, mkl;
b. non-MKL versions of numpy (scipy etc.).
c. Remark: even Anaconda 2019.03 had this problem.
The "solution" for me was to use the same approach as in this (link), where the example code used numpy.dot (so, just for consistency, I started to report the problem with np.dot instead of np.linalg.svd, since both crashed my computer; I always test the solutions with both scripts). Specifically, this partial solution was:
i. install nonmkl versions of numpy (scipy etc).
ii. use os.environ['OPENBLAS_CORETYPE']='Haswell'
Now, using the t_i10832 env, this was the first time that one script worked (dot) and the other didn't (svd).
So, since the beginning, the problem was with NumPy + blas + mkl (original Anaconda installation), and the problem persisted using openblas (but with it, at least the os.environ setting allowed me to use the machine). I found on my PC the following information from my first, unsuccessful attempts to solve this by updating things (these are outputs of conda list, but I saved only the info regarding blas and mkl):
blas 1.0 mkl
mkl 2019.1 144
mkl-service 1.1.2 py37he904b0f_5
mkl_fft 1.0.6 py37hd81dba3_0
mkl_random 1.0.2 py37hd81dba3_0
blas 1.0 mkl
mkl 2019.3 199
mkl-service 1.1.2 py37he904b0f_5
mkl_fft 1.0.10 py37ha843d7b_0
mkl_random 1.0.2 py37hd81dba3_0
Sorry, I clicked the "close and comment" instead of "Comment" by mistake.
@ll-portes Thank you for trying this. So we established that the environment is consistent, but a call to SVD is causing trouble.
Unfortunately, I have not yet been able to get hold of a machine with the processor you are using, so I have to ask you to try different things in the hope of triaging the problem further.
So, first question: does the script_svd.py script work for smaller matrix sizes?
If you further install scipy from the defaults channel into your environment with conda install -n t_i10832 -c defaults --override-channels scipy, and try a different LAPACK driver to solve the SVD, specified via scipy.linalg.svd:
u3, s3, vt3 = scipy.linalg.svd(X, lapack_driver='gesvd')
does the problem go away?
In your experiments, please fix the random seed to ensure reproducibility on our side:
np.random.seed(42)
X = np.random.rand(nr, nc)
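The two requests above (smaller matrix sizes and a fixed seed) could be combined in a sweep like the following sketch. The function name `svd_size_sweep` and the list of sizes are illustrative assumptions; every message is flushed immediately so that the last size attempted is still visible if the machine freezes.

```python
import numpy as np


def svd_size_sweep(sizes, seed=42):
    """Run np.linalg.svd on seeded random matrices of increasing size.

    Hypothetical helper: prints before and after each call so that a
    freeze can be attributed to a specific matrix size.
    """
    rng = np.random.RandomState(seed)  # fixed seed for reproducibility
    shapes = []
    for nr, nc in sizes:
        print(f"starting SVD for {nr} x {nc}", flush=True)
        X = rng.rand(nr, nc)
        u, s, vt = np.linalg.svd(X)
        print(f"finished SVD for {nr} x {nc}", flush=True)
        shapes.append((u.shape, s.shape, vt.shape))
    return shapes


# Example: work up towards the 1000 x 10000 case from script_svd.py.
# svd_size_sweep([(10, 100), (100, 1000), (500, 5000), (1000, 10000)])
```

Starting small and growing the matrix is meant to show whether the freeze depends on problem size, which in turn hints at whether threading or a particular code path is involved.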
@ll-portes I was finally able to secure access to the hardware, but I am unable to reproduce any problems:
(numpy) C:\Users\user>ipython
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numexpr, numpy as np, mkl_random, mkl
In [2]: from numexpr.cpuinfo import cpu
In [3]: len(cpu.info)
Out[3]: 36
In [4]: cpu.info[0]['ProcessorNameString']
Out[4]: 'Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz'
In [5]: mkl.get_version_string()
Out[5]: 'Intel(R) Math Kernel Library Version 2019.0.3 Product Build 20190125 for Intel(R) 64 architecture applications'
In [6]: np.__version__
Out[6]: '1.16.3'
In [7]: nr, nc = 1000, 10**4
In [8]: XX = mkl_random.randn(nr, nc)
In [9]: U, S, Vt = np.linalg.svd(XX)
In [10]: (U.shape, S.shape, Vt.shape)
Out[10]: ((1000, 1000), (1000,), (10000, 10000))
In [11]: X = mkl_random.randn(nc, nr)
In [12]: U, S, Vt = np.linalg.svd(X)
In [13]: (U.shape, S.shape, Vt.shape)
Out[13]: ((10000, 10000), (1000,), (1000, 1000))
In [14]: quit
```
     active environment : numpy
    active env location : C:\Users\user\Miniconda3\envs\numpy
            shell level : 2
       user config file : C:\Users\user\.condarc
 populated config files :
          conda version : 4.6.14
    conda-build version : not installed
         python version : 3.7.3.final.0
       base environment : C:\Users\user\Miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\Users\user\Miniconda3\pkgs
                          C:\Users\user\.conda\pkgs
                          C:\Users\user\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\user\Miniconda3\envs
                          C:\Users\user\.conda\envs
                          C:\Users\user\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/4.6.14 requests/2.21.0 CPython/3.7.3 Windows/10 Windows/10.0.17134
          administrator : True
             netrc file : None
           offline mode : False
```
```text
# This file may be used to create an environment using:
# $ conda create --name
```
Hence it does not seem to be a problem in the build of NumPy or in the Intel(R) MKL itself.
The machine I used ran Windows 10, had 1 socket, 18 cores and hyperthreading on with 2 threads per core.
My realm of expertise ends here, but if you happened to overclock the processor, please try to run the workload in the normal mode.
Actual Behavior
Hi! I reported this issue to the Numpy developers (link), and they asked me to report here as well.
Several simple pieces of code (e.g., SVD, np.dot) using the NumPy package built by Anaconda make one of our computers freeze completely for 5-10 seconds and then reboot. This happens only on a machine with an Intel i9-7980XE 18-core CPU. The same Conda/Ubuntu environment on an i7-7700 4-core CPU has no problems. This happens both on the Python command line and in a Jupyter notebook.
Remarks: 1) This issue doesn’t happen if we use Python+numpy from pip (test suggested by the NumPy team). 2) After reading a report of a similar issue with a VM (link), the solution that worked for us was:
And then using the following code before importing Numpy:
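The snippet that followed this sentence did not survive in this copy of the thread. Based on the later comment that mentions `os.environ['OPENBLAS_CORETYPE']='Haswell'`, it was presumably along the lines of the sketch below; treat it as a reconstruction, not the reporter's exact code.

```python
import os

# Assumption: reconstructed from a later comment in this thread, not the
# original snippet. The variable must be set before NumPy (and therefore
# OpenBLAS) is loaded; it has no effect on an MKL-linked NumPy.
os.environ["OPENBLAS_CORETYPE"] = "Haswell"

import numpy as np  # imported only after the variable is set

print(np.__version__)
```

The ordering matters because OpenBLAS selects its kernels when the library is first loaded, so setting the variable after `import numpy` would be too late.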
3) Please note that I’d like to emphasize that ours is not a VM. 4) Last week we tried to run a computation with Igraph on this machine (for the first time on this specific machine), and we got the freeze/reboot again (probably because we didn’t install the nomkl version of Igraph).
5) I can run any test you suggest on the aforementioned machine. But I'm only able to "copy/paste" the commands, because I don't have the deeper knowledge that you have (I had no idea about the difference between BLAS, OpenBLAS, MKL etc. before this problem).
Expected Behavior
The computation completes, with no freeze/reboot, when using the desktop with the Intel i9-7980XE 18-core CPU.
Steps to Reproduce
Only on our machine with the Intel i9-7980XE 18-core CPU:
Remark: the same problem happens with SVD.
Anaconda or Miniconda version:
Anaconda3-2019.03-Linux-x86_64.sh
In the first week of the issue, we also tried other versions from last year (2018), even ones for Python 2, but none of them worked as expected.
Operating System:
Ubuntu 18.04.2 LTS
conda info
conda list --show-channel-urls