gdonval closed this issue 3 years ago
Create an MKL environment: conda create -n mkl numpy mkl
Create a BLIS environment: conda create -n blis numpy blis nomkl
Create an OpenBLAS environment: conda create -n openblas numpy openblas nomkl
This is not the correct way. Please see our docs on how to switch the BLAS implementation.
What are you talking about?
The point is not how to switch implementations in the most comfortable way (feel free to use whichever method you prefer to switch).
The point is about this OpenBLAS being much slower than BLIS, which is not how things used to be.
The point is not how to switch implementations in the most comfortable way
I didn't say whether it was comfortable or not. I said it's not correct, which means it's wrong. The conda list output you showed has the following:
libblas 3.9.0 5_h92ddd45_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
which means that you are not using OpenBLAS but netlib's reference LAPACK, which is slow. You have both netlib and OpenBLAS installed, but numpy is using the netlib one.
Please use the recommended way to switch the BLAS implementation and you'll get an environment where numpy uses OpenBLAS.
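To make the check above repeatable, here is a minimal Python sketch that reads conda list output and reports which BLAS variant the libblas/libcblas/liblapack packages were built for. It assumes the variant name is the last underscore-separated part of the build string, as in the two lines quoted above. The function name blas_variant and the KNOWN_VARIANTS tuple are hypothetical helpers, not part of conda.

```python
# Assumed set of variant names, used only to recognise the build-string suffix.
KNOWN_VARIANTS = ("openblas", "mkl", "blis", "netlib", "accelerate")


def blas_variant(conda_list_output: str) -> dict:
    """Map each BLAS metapackage to the variant named in its build string."""
    found = {}
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Expected columns: name, version, build, channel.
        if len(fields) < 3 or fields[0] not in ("libblas", "libcblas", "liblapack"):
            continue
        suffix = fields[2].rsplit("_", 1)[-1]
        found[fields[0]] = suffix if suffix in KNOWN_VARIANTS else "unknown"
    return found


# The two lines quoted from the reporter's environment.
sample = """\
libblas 3.9.0 5_h92ddd45_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
"""
print(blas_variant(sample))
```

If any entry reports netlib in an environment that is supposed to use OpenBLAS, numpy is going through the reference implementation. Running numpy.show_config() inside that environment is another way to cross-check.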
Why can't openblas require/pull the correct libblas?
Well, at least I suppose this solves this specific bug request, though it sounds like improper libblas versions should be made to conflict with mismatching BLAS implementations.
Issue
OpenBLAS is suspiciously slow in numpy (order of magnitude slower than both BLIS and MKL, on an AMD 3950x!).
Steps
conda create -n mkl numpy mkl
conda create -n blis numpy blis nomkl
conda create -n openblas numpy openblas nomkl
$ OMP_NUM_THREADS=1 BLIS_NUM_THREADS=1 MKL_NUM_THREADS=1 jupyter lab
I checked that CPU usage never exceeded 100.0 in top in all cases, throughout the full benchmark, until the very end.
Result
Last point is around 25s in both MKL and BLIS; it is 3min30s in OpenBLAS. Last time I did something similar, OpenBLAS was on par with MKL. Again I insist: CPU usage was capped at 100% in all cases, there is no underlying multithreading here.
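For anyone wanting to reproduce a comparison like this outside the notebook, here is a minimal single-threaded timing sketch. It is not the benchmark used above (that one is not shown). The matrix size, the helper names best_of and time_matmul, and the extra OPENBLAS_NUM_THREADS variable are assumptions made for illustration.

```python
import os

# Cap every backend at one thread before NumPy is imported.
# OPENBLAS_NUM_THREADS is an addition to the three variables used above.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "BLIS_NUM_THREADS"):
    os.environ[var] = "1"

import time


def best_of(fn, repeats=3):
    """Return the shortest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def time_matmul(n=2000, repeats=3):
    """Time an n x n float64 matrix product; requires NumPy."""
    import numpy as np  # imported late so the thread caps above apply

    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    return best_of(lambda: a @ b, repeats)
```

Calling time_matmul() once in each of the mkl, blis and openblas environments gives one number per backend. A large gap for one environment is a hint to check which libblas build numpy is actually linked against.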
Conda environment
Environment (conda list):
Full list here:
Details about conda and system (conda info):