conda-forge / openblas-feedstock

A conda-smithy repository for openblas.
BSD 3-Clause "New" or "Revised" License

OpenBLAS is suspiciously slow (wrt. BLIS/MKL on AMD) #125

Closed: gdonval closed this issue 3 years ago.

gdonval commented 3 years ago

Issue

OpenBLAS is suspiciously slow in NumPy: an order of magnitude slower than both BLIS and MKL on an AMD Ryzen 9 3950X!

Steps

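```python
# Run in IPython/Jupyter: `%timeit` is an IPython magic, not plain Python.
import numpy as np

sizes = (1, 2, 3, 4, 32, 64, 127, 128, 129, 1023, 1024, 1025, 4096, 4096*2-1, 4096*2, 4096*2+1)
best_times = np.zeros(len(sizes))
for i, s in enumerate(sizes):
    # Two independent random s x s matrices (arrT is not the transpose of arr)
    arr = np.random.rand(s, s)
    arrT = np.random.rand(s, s)
    # -o returns a TimeitResult; .best is the fastest per-loop time in seconds
    t = %timeit -o arr @ arrT
    best_times[i] = t.best
```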

I checked in `top` that CPU usage never exceeded 100.0% in any of the cases, throughout the full benchmark, right up to the very end.
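To reproduce the benchmark outside IPython, `%timeit -o ...` followed by `t.best` can be approximated with the standard-library `timeit` module. This is a sketch rather than the reporter's script: `best_time` and `benchmark_matmul` are hypothetical helper names, and the defaults (7 runs of 1 call each) are an assumption meant to resemble what `%timeit` chooses for slow statements.

```python
import timeit


def best_time(func, repeat=7, number=1):
    """Return the best per-call time of `func` in seconds.

    Mirrors the `.best` attribute of IPython's `%timeit -o` result:
    `repeat` runs of `number` calls each, keeping the fastest run.
    """
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


def benchmark_matmul(sizes, repeat=3):
    """Hypothetical helper: best time of `a @ b` for each square size in `sizes`."""
    import numpy as np  # imported here so best_time works without NumPy

    rng = np.random.default_rng()
    best_times = np.zeros(len(sizes))
    for i, s in enumerate(sizes):
        a = rng.random((s, s))
        b = rng.random((s, s))
        # The lambda is timed immediately, so it always sees this iteration's a, b.
        best_times[i] = best_time(lambda: a @ b, repeat=repeat)
    return best_times
```

For example, `benchmark_matmul((128, 1024))` times two of the sizes from the original list. The largest sizes took from tens of seconds to minutes per call in this report, so a small `repeat` keeps such runs manageable.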

Result


The last point (s = 4096*2+1 = 8193) takes around 25 s with both MKL and BLIS, but 3 min 30 s with OpenBLAS. The last time I did something similar, OpenBLAS was on par with MKL. Again, I insist: CPU usage was capped at 100% in all cases, so there is no underlying multithreading here.
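To put these timings in perspective, the usual 2n³ floating-point operation count for an n × n matrix product converts them into throughput. This is a back-of-the-envelope sketch: `matmul_gflops` is a hypothetical helper, the 2n³ count slightly overstates the number of additions, and the inputs are the approximate times quoted above.

```python
def matmul_gflops(n, seconds):
    """Throughput in GFLOP/s of an (n x n) @ (n x n) product, using the
    standard 2 * n**3 estimate: n**2 output elements, each needing
    n multiplications and roughly n additions."""
    return 2 * n**3 / seconds / 1e9


n = 4096 * 2 + 1                   # last size in the benchmark (8193)
fast = matmul_gflops(n, 25.0)      # MKL / BLIS: about 25 s
slow = matmul_gflops(n, 3.5 * 60)  # slow run: about 3 min 30 s
print(f"{fast:.1f} GFLOP/s vs {slow:.1f} GFLOP/s ({fast / slow:.1f}x)")
```

This prints `44.0 GFLOP/s vs 5.2 GFLOP/s (8.4x)`, consistent with the roughly order-of-magnitude gap described above.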

Conda environment


Environment (conda list):

```
$ conda list
[...]
openblas                  0.3.17          pthreads_h4748800_0    conda-forge
[...]
```

Full list here:

```
$ conda list
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
anyio 3.3.0 py39hf3d152e_0 conda-forge
argon2-cffi 20.1.0 py39h3811e60_2 conda-forge
async_generator 1.10 py_0 conda-forge
atk-1.0 2.36.0 h3371d22_4 conda-forge
attrs 21.2.0 pyhd8ed1ab_0 conda-forge
babel 2.9.1 pyh44b312d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
bleach 3.3.1 pyhd8ed1ab_0 conda-forge
brotlipy 0.7.0 py39h3811e60_1001 conda-forge
ca-certificates 2021.5.30 ha878542_0 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
certifi 2021.5.30 py39hf3d152e_0 conda-forge
cffi 1.14.6 py39he32792d_0 conda-forge
chardet 4.0.0 py39hf3d152e_1 conda-forge
charset-normalizer 2.0.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 3.4.7 py39hbca0aa6_0 conda-forge
cycler 0.10.0 py_2 conda-forge
dbus 1.13.6 h48d8840_2 conda-forge
debugpy 1.4.1 py39he80948d_0 conda-forge
decorator 5.0.9 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
expat 2.4.1 h9c3ff4c_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
gdk-pixbuf 2.42.6 h04a7f16_0 conda-forge
gettext 0.19.8.1 h0b5b191_1005 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
glib 2.68.3 h9c3ff4c_0 conda-forge
glib-tools 2.68.3 h9c3ff4c_0 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
graphviz 2.48.0 h85b4f2f_0 conda-forge
gst-plugins-base 1.18.4 hf529b03_2 conda-forge
gstreamer 1.18.4 h76c114f_2 conda-forge
gtk2 2.24.33 h539f30e_1 conda-forge
gts 0.7.6 h64030ff_2 conda-forge
harfbuzz 2.8.2 h83ec7ef_0 conda-forge
icc_rt 2020.2 intel_254 numba
icu 68.1 h58526e2_0 conda-forge
idna 3.1 pyhd3deb0d_0 conda-forge
importlib-metadata 4.6.1 py39hf3d152e_0 conda-forge
ipykernel 6.0.3 py39hef51801_0 conda-forge
ipython 7.25.0 py39hef51801_1 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.18.0 py39hf3d152e_2 conda-forge
jinja2 3.0.1 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 pyhd8ed1ab_3 conda-forge
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_core 4.7.1 py39hf3d152e_0 conda-forge
jupyter_server 1.10.1 pyhd8ed1ab_0 conda-forge
jupyterlab 3.0.16 pyhd8ed1ab_0 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 2.6.1 pyhd8ed1ab_0 conda-forge
kiwisolver 1.3.1 py39h1a9c180_1 conda-forge
krb5 1.19.1 hcc1bbae_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_1 conda-forge
lerc 2.2.1 h9c3ff4c_0 conda-forge
libblas 3.9.0 5_h92ddd45_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libclang 11.1.0 default_ha53f305_1 conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libevent 2.1.10 hcdb4288_3 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 11.1.0 hc902ee8_2 conda-forge
libgd 2.3.2 h78a0170_0 conda-forge
libgfortran-ng 11.1.0 h69a702a_0 conda-forge
libgfortran5 11.1.0 h6c583b3_0 conda-forge
libglib 2.68.3 h3e27bee_0 conda-forge
libgomp 11.1.0 hc902ee8_2 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libogg 1.3.4 h7f98852_1 conda-forge
libopenblas 0.3.17 pthreads_h8fe5266_0 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.3 hd57d9b9_0 conda-forge
librsvg 2.50.7 hc3c00ef_0 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libstdcxx-ng 11.1.0 h56837e0_2 conda-forge
libtiff 4.3.0 hf544144_1 conda-forge
libtool 2.4.6 h58526e2_1007 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp 1.2.0 h3452ae3_0 conda-forge
libwebp-base 1.2.0 h7f98852_2 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
llvmlite 0.37.0rc2 py39hf484d3e_0 numba
lz4-c 1.9.3 h9c3ff4c_0 conda-forge
markupsafe 2.0.1 py39h3811e60_0 conda-forge
matplotlib 3.4.2 py39hf3d152e_0 conda-forge
matplotlib-base 3.4.2 py39h2fa2bec_0 conda-forge
matplotlib-inline 0.1.2 pyhd8ed1ab_2 conda-forge
mistune 0.8.4 py39h3811e60_1004 conda-forge
mysql-common 8.0.25 ha770c72_2 conda-forge
mysql-libs 8.0.25 hfa10184_2 conda-forge
nbclassic 0.3.1 pyhd8ed1ab_1 conda-forge
nbclient 0.5.3 pyhd8ed1ab_0 conda-forge
nbconvert 6.1.0 py39hf3d152e_0 conda-forge
nbformat 5.1.3 pyhd8ed1ab_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.5.1 pyhd8ed1ab_0 conda-forge
nomkl 1.0 h5ca1d4c_0 conda-forge
notebook 6.4.0 pyha770c72_0 conda-forge
nspr 4.30 h9c3ff4c_0 conda-forge
nss 3.67 hb5efdd6_0 conda-forge
numba 0.54.0rc1 np1.16py3.9hc547734_g9bed2ebb2_0 numba
numpy 1.21.1 py39hdbf815f_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openblas 0.3.17 pthreads_h4748800_0 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1k h7f98852_0 conda-forge
packaging 21.0 pyhd8ed1ab_0 conda-forge
pandoc 2.14.1 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
pango 1.48.7 hb8ff022_0 conda-forge
parso 0.8.2 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.3.1 py39ha612740_0 conda-forge
pip 21.2.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
prometheus_client 0.11.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.19 pyha770c72_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygments 2.9.0 pyhd8ed1ab_0 conda-forge
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.12.3 py39hf3d152e_7 conda-forge
pyqt-impl 5.12.3 py39h0fcd23e_7 conda-forge
pyqt5-sip 4.19.18 py39he80948d_7 conda-forge
pyqtchart 5.12 py39h0fcd23e_7 conda-forge
pyqtwebengine 5.12.1 py39h0fcd23e_7 conda-forge
pyrsistent 0.17.3 py39h3811e60_2 conda-forge
pysocks 1.7.1 py39hf3d152e_3 conda-forge
python 3.9.6 h49503c6_1_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2021.1 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py39h3811e60_0 conda-forge
pyzmq 22.1.0 py39h37b5a0c_0 conda-forge
qt 5.12.9 hda022c4_4 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
requests 2.26.0 pyhd8ed1ab_0 conda-forge
requests-unixsocket 0.2.0 py_0 conda-forge
roctools 0.0.0 hf484d3e_1 numba
scipy 1.7.0 py39hee8e79c_1 conda-forge
send2trash 1.7.1 pyhd8ed1ab_0 conda-forge
setuptools 49.6.0 py39hf3d152e_3 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sniffio 1.2.0 py39hf3d152e_1 conda-forge
sqlite 3.36.0 h9cd32fc_0 conda-forge
tbb 2021.1.1 intel_119 numba
terminado 0.10.1 py39hf3d152e_0 conda-forge
testpath 0.5.0 pyhd8ed1ab_0 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
tornado 6.1 py39h3811e60_1 conda-forge
traitlets 5.0.5 py_0 conda-forge
tzdata 2021a he74cb21_1 conda-forge
urllib3 1.26.6 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
websocket-client 0.57.0 py39hf3d152e_4 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zeromq 4.3.4 h9c3ff4c_0 conda-forge
zipp 3.5.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge
```


Details about conda and the system (`conda info`):

```
$ conda info

    active environment : base
   active env location : /home/user/Documents/Programming/Toolchains/miniconda3
           shell level : 1
      user config file : /home/user/.condarc
populated config files :
         conda version : 4.10.3
   conda-build version : not installed
        python version : 3.8.10.final.0
      virtual packages : __linux=5.13.4=0
                         __glibc=2.33=0
                         __unix=0=0
                         __archspec=1=x86_64
      base environment : /home/user/Documents/Programming/Toolchains/miniconda3 (writable)
     conda av data dir : /home/user/Documents/Programming/Toolchains/miniconda3/etc/conda
 conda av metadata url : None
          channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                         https://repo.anaconda.com/pkgs/main/noarch
                         https://repo.anaconda.com/pkgs/r/linux-64
                         https://repo.anaconda.com/pkgs/r/noarch
         package cache : /home/user/Documents/Programming/Toolchains/miniconda3/pkgs
                         /home/user/.conda/pkgs
      envs directories : /mnt/scratch/user/Programming/Conda/envs
                         /home/user/Documents/Programming/Toolchains/miniconda3/envs
                         /home/user/.conda/envs
              platform : linux-64
            user-agent : conda/4.10.3 requests/2.25.1 CPython/3.8.10 Linux/5.13.4-arch2-1 arch/ glibc/2.33
               UID:GID : 1000:1000
            netrc file : None
          offline mode : False
```
isuruf commented 3 years ago

> Create an MKL environment: `conda create -n mkl numpy mkl`
>
> Create a BLIS environment: `conda create -n blis numpy blis nomkl`
>
> Create an OpenBLAS environment: `conda create -n openblas numpy openblas nomkl`

This is not the correct way. Please see our docs on how to switch the BLAS implementation.
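For reference, the approach described in the conda-forge documentation selects the implementation through the build string of the `libblas` metapackage (for example `conda install "libblas=*=*openblas"`, and likewise for `mkl`, `blis` or `netlib`) rather than by installing `openblas` or `blis` next to `numpy`. A minimal Python sketch that builds the corresponding `conda create` commands; `blas_env_command` is a hypothetical helper, and the list of implementations is an assumption to be checked against the current docs.

```python
import shlex

# BLAS implementations selectable through the libblas build string.
# Assumption: this mirrors the conda-forge docs; check them for the current set.
BLAS_IMPLS = ("openblas", "mkl", "blis", "netlib")


def blas_env_command(env_name, impl):
    """Build (but do not run) a `conda create` command that pins the BLAS
    implementation via `libblas=*=*<impl>`, instead of installing the
    implementation package next to numpy and hoping libblas follows."""
    if impl not in BLAS_IMPLS:
        raise ValueError(f"unknown BLAS implementation: {impl!r}")
    return [
        "conda", "create", "-n", env_name, "-c", "conda-forge",
        "numpy", f"libblas=*=*{impl}",
    ]


for impl in ("mkl", "blis", "openblas"):
    # shlex.join quotes the `*` wildcards so the line is safe to paste into a shell
    print(shlex.join(blas_env_command(impl, impl)))
```

Each printed line can be pasted into a shell, or the returned list can be passed directly to `subprocess.run(..., check=True)`.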

gdonval commented 3 years ago

What are you talking about?

The point is not how to switch implementations in the most comfortable way (feel free to use whichever method you prefer to switch).

The point is that this OpenBLAS is much slower than BLIS, which is not how things used to be.

isuruf commented 3 years ago

> The point is not how to switch implementations in the most comfortable way

I didn't say whether it was comfortable or not. I said it's not correct, which means it's wrong. The `conda list` output you showed has the following:

```
libblas                   3.9.0           5_h92ddd45_netlib    conda-forge
libcblas                  3.9.0           5_h92ddd45_netlib    conda-forge
```

which means that you are not using OpenBLAS, but netlib's reference LAPACK, which is slow. You have both netlib and OpenBLAS installed, but NumPy is using the netlib one.

Please use the recommended way to switch the BLAS implementation, and you'll get an environment where NumPy actually uses OpenBLAS.
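The diagnosis above can be automated from saved `conda list` output. A minimal sketch, assuming the conda-forge convention that the implementation name ends the `libblas` build string (as in `5_h92ddd45_netlib` above); `blas_backend` and `installed` are hypothetical helpers, not part of conda.

```python
# Assumption: conda-forge's libblas build string ends with the name of the
# implementation it points to, as in "5_h92ddd45_netlib".
KNOWN_IMPLS = ("openblas", "mkl", "blis", "netlib")


def blas_backend(conda_list_text):
    """Return the implementation libblas points to, parsed from `conda list`
    output, or None if libblas is not listed."""
    for line in conda_list_text.splitlines():
        fields = line.split()  # name, version, build string, channel
        if len(fields) >= 3 and fields[0] == "libblas":
            build = fields[2]
            for impl in KNOWN_IMPLS:
                if build.endswith("_" + impl):
                    return impl
            return build  # unrecognised suffix: return the raw build string
    return None


def installed(conda_list_text, name):
    """True if a package called `name` appears in the `conda list` output."""
    return any(line.split()[:1] == [name] for line in conda_list_text.splitlines())


# Excerpt of the conda list output reported in this issue.
sample = """\
libblas                   3.9.0           5_h92ddd45_netlib    conda-forge
libcblas                  3.9.0           5_h92ddd45_netlib    conda-forge
openblas                  0.3.17          pthreads_h4748800_0    conda-forge
"""

backend = blas_backend(sample)
if installed(sample, "openblas") and backend != "openblas":
    print(f"openblas is installed, but libblas points to {backend}")
```

On this excerpt the script prints `openblas is installed, but libblas points to netlib`, which is the mismatch described above.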

gdonval commented 3 years ago

Why can't `openblas` require (or pull in) the correct `libblas`?

gdonval commented 3 years ago

Well, at least I suppose this resolves this specific bug report, though it sounds like mismatched `libblas` builds should be made to conflict with the installed BLAS implementation.