conda-forge / openblas-feedstock

A conda-smithy repository for openblas.
BSD 3-Clause "New" or "Revised" License

Illegal instruction when importing numpy #60

Closed ChristopherHogan closed 4 years ago

ChristopherHogan commented 5 years ago

Issue: Creating an environment with the latest numpy leads to a core dump on import:

$ conda create -n np -c conda-forge numpy
$ conda activate np
$ python -c 'import numpy'
Illegal instruction (core dumped)

Running it through gdb shows

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007ffff5d16fd8 in sdot_k_SKYLAKEX ()
   from /home/chris/miniconda3/envs/np/lib/python3.7/site-packages/numpy/core/../../../../libopenblas.so.0

I'm running Ubuntu 16.04 in a VirtualBox VM on a Windows 10 host, with an Intel i7-7820X.
Environment (conda list):

```
$ conda list
# packages in environment at /home/chris/miniconda3/envs/np:
#
# Name                    Version                   Build  Channel
blas                      1.1                    openblas    conda-forge
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2018.11.29           ha4d7672_0    conda-forge
certifi                   2018.11.29            py37_1000    conda-forge
libffi                    3.2.1             hf484d3e_1005    conda-forge
libgcc-ng                 7.3.0                hdf63c60_0    conda-forge
libgfortran-ng            7.2.0                hdf63c60_3    conda-forge
libstdcxx-ng              7.3.0                hdf63c60_0    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
numpy                     1.16.1          py37_blas_openblash1522bff_0  [blas_openblas]  conda-forge
openblas                  0.3.3             h9ac9557_1001    conda-forge
openssl                   1.0.2p            h14c3975_1002    conda-forge
pip                       19.0.2                   py37_0    conda-forge
python                    3.7.1             hd21baee_1000    conda-forge
readline                  7.0               hf8c457e_1001    conda-forge
setuptools                40.8.0                   py37_0    conda-forge
sqlite                    3.26.0            h67949de_1000    conda-forge
tk                        8.6.9             h84994c4_1000    conda-forge
wheel                     0.33.0                   py37_0    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
zlib                      1.2.11            h14c3975_1004    conda-forge
```


Details about conda and system ( conda info ):

```
$ conda info

     active environment : np
    active env location : /home/chris/miniconda3/envs/np
            shell level : 1
       user config file : /home/chris/.condarc
 populated config files :
          conda version : 4.6.1
    conda-build version : 3.17.8
         python version : 3.6.8.final.0
       base environment : /home/chris/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/chris/miniconda3/pkgs
                          /home/chris/.conda/pkgs
       envs directories : /home/chris/miniconda3/envs
                          /home/chris/.conda/envs
               platform : linux-64
             user-agent : conda/4.6.1 requests/2.18.4 CPython/3.6.8 Linux/4.15.0-45-generic ubuntu/16.04.5 glibc/2.23
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
```

ChristopherHogan commented 5 years ago

Still an issue after the BLAS migration, although now the illegal instruction is raised in libcblas.so.3 instead of libopenblas.so.0.

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007ffff5cf25d8 in sdot_k_SKYLAKEX ()
   from /home/chris/miniconda3/envs/np/lib/python3.7/site-packages/numpy/core/../../../../libcblas.so.3

conda list

# Name                    Version                   Build  Channel
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2019.3.9             hecc5488_0    conda-forge
certifi                   2019.3.9                 py37_0    conda-forge
libblas                   3.8.0                4_openblas    conda-forge
libcblas                  3.8.0                4_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc-ng                 7.3.0                hdf63c60_0    conda-forge
libgfortran-ng            7.2.0                hdf63c60_3    conda-forge
liblapack                 3.8.0                4_openblas    conda-forge
libstdcxx-ng              7.3.0                hdf63c60_0    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
numpy                     1.16.2           py37h8b7e671_1    conda-forge
openblas                  0.3.5             h9ac9557_1001    conda-forge
openssl                   1.1.1b               h14c3975_1    conda-forge
pip                       19.0.3                   py37_0    conda-forge
python                    3.7.2                h381d211_0    conda-forge
readline                  7.0               hf8c457e_1001    conda-forge
setuptools                40.8.0                   py37_0    conda-forge
sqlite                    3.26.0            h67949de_1001    conda-forge
tk                        8.6.9             h84994c4_1000    conda-forge
wheel                     0.33.1                   py37_0    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
zlib                      1.2.11            h14c3975_1004    conda-forge
jschueller commented 5 years ago

runs fine here on ubuntu, I have a much older cpu than skylake though (something around 2011)

ChristopherHogan commented 5 years ago

Yes, it seems specific to skylake.

ChristopherHogan commented 5 years ago

It works if I force openblas 0.3.4:

$ conda create -n np -c conda-forge openblas=0.3.4 numpy
$ conda activate np
$ python -c 'import numpy'
jschueller commented 5 years ago

maybe we should update openblas pin then https://github.com/conda-forge/conda-forge-pinning-feedstock/issues/201

isuruf commented 5 years ago

The new blas packages depend on openblas 0.3.5

jschueller commented 5 years ago

oh i thought the issue was with 0.3.3 but the log shows 0.3.5 as you say, so that's maybe a regression in 0.3.5

grlee77 commented 5 years ago

This is probably the same issue seen in xianyi/OpenBLAS#2067. The cause there seems to be that OpenBLAS did not account for the fact that VMs can disable some features of the underlying CPU. If this is the same issue, it has already been fixed in OpenBLAS master.

In the meantime, you can work around it by setting an environment variable as described in that thread.

jschueller commented 5 years ago

it may be https://github.com/xianyi/OpenBLAS/pull/1949, one could try to backport it here

ChristopherHogan commented 5 years ago

Setting the environment variable works. The issue seems to be that VirtualBox incorrectly detects my CPU as an Intel i7-6700K (which has no AVX512), while OpenBLAS correctly detects an i7-7820X (which has AVX512).

prusswan commented 5 years ago

Setting the environment variable works. The issue seems to be that VirtualBox incorrectly detects my CPU as an Intel i7-6700K (which has no AVX512), while OpenBLAS correctly detects an i7-7820X (which has AVX512).

How do you check this? Mine is a 7900X, so almost certainly the same issue

ChristopherHogan commented 5 years ago

Go to Machine -> Show Log, and filter for "CPUM".
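
A complementary check can be made from inside the Linux guest. The following is a minimal sketch, assuming `/proc/cpuinfo` reflects the CPU features the VM actually exposes; the function names `cpu_flags` and `guest_has_avx512` are illustrative and do not come from this thread. If `avx512f` is missing in the guest while the host CPU supports it, the VM is masking AVX512.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 or empty input)


def guest_has_avx512(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag (avx512f) is visible to this OS."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; assumption of this sketch
    if path.exists():
        print("avx512f visible:", guest_has_avx512(path.read_text()))
```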

prusswan commented 5 years ago

Confirmed. Just for the record, this solution (overriding the environment variable) allows numpy to be loaded despite the wrong CPU detection:

>>> import os
>>> os.environ["OPENBLAS_CORETYPE"] = "nehalem"
>>> import numpy as np
>>>

I suppose the other solution is to downgrade to openblas < 0.3.5

conda install openblas=0.3.4

I am going with the second solution since it is cleaner and I don't need the latest openblas (I was on a much older version anyway, before something triggered the upgrade).

1kastner commented 5 years ago

I added the environment variable to my Dockerfile to keep the code clean, but in the end I guess one has to decide case by case how to deal with it.
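
The same idea of keeping the override out of application code can be sketched in Python by presetting the variable for a child interpreter. This is only an illustration under stated assumptions: `run_with_coretype` is a hypothetical helper, not something used in this thread, and `"nehalem"` is simply the value shown in the earlier comment.

```python
import os
import subprocess
import sys


def run_with_coretype(code: str, coretype: str = "nehalem") -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter with OPENBLAS_CORETYPE preset."""
    env = dict(os.environ)               # copy, so the parent process is untouched
    env["OPENBLAS_CORETYPE"] = coretype  # must be set before OpenBLAS is loaded
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Hypothetical usage: import numpy in the child with the override in place.
    result = run_with_coretype("import numpy; print(numpy.__version__)")
    print(result.returncode, result.stdout or result.stderr)
```

Because the variable is set in the child's environment rather than via `os.environ` inside the script, the application code itself needs no change, which is the same trade-off as putting it in a Dockerfile.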

isuruf commented 4 years ago

Looks like we can't do anything here. Please open an issue upstream if the problem is still there.