Closed · ollie-bell closed this issue 2 years ago
Conda-forge can only really depend on things it can distribute itself (with some exceedingly rare exceptions that are both rock-stable and also difficult/dangerous to replace; e.g. glibc). Since the accelerate sources are not available, and I don't think the binaries are redistributable (or ABI-stable enough without further metadata), I'm doubtful whether this is feasible.
@isuruf might have a more definite answer.
That's not the issue. The issue is that Accelerate's lapack version is 3.2.1 which is ancient. Therefore Accelerate can only provide blas which means only numpy can support it. scipy and many other packages cannot.
That's interesting. I guess that reflects Apple's lack of support for Fortran. But yes, now that you mention it, the SciPy docs also say the same.
Isuru found a way around this: https://github.com/conda-forge/blas-feedstock/pull/82
In 2-3 hours, you should be able to try `conda install numpy "libblas=*=*accelerate"`. :)
I'll start testing this in #252 once it becomes available as well...
Seems to be broken at the moment. I created a fresh environment with Python 3.9 and `conda install numpy "libblas=*=*accelerate"`:
```
Python 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:06)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
zsh: killed python
```
@ollie-bell would suggest sharing the full list of packages from the environment in case there is a clue in there. Also would include `conda info`. Is this using Intel or ARM?
`conda info` and `conda list` are below. I'm on an M1 MacBook, so ARM.
```
     active environment : test
    active env location : /Users/oliver/miniforge3/envs/test
            shell level : 3
       user config file : /Users/oliver/.condarc
 populated config files : /Users/oliver/miniforge3/.condarc
                          /Users/oliver/.condarc
          conda version : 4.11.0
    conda-build version : not installed
         python version : 3.9.9.final.0
       virtual packages : __osx=12.1=0
                          __unix=0=0
                          __archspec=1=arm64
       base environment : /Users/oliver/miniforge3 (writable)
      conda av data dir : /Users/oliver/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-arm64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /Users/oliver/miniforge3/pkgs
                          /Users/oliver/.conda/pkgs
       envs directories : /Users/oliver/miniforge3/envs
                          /Users/oliver/.conda/envs
               platform : osx-arm64
             user-agent : conda/4.11.0 requests/2.27.1 CPython/3.9.9 Darwin/21.2.0 OSX/12.1
                UID:GID : 501:20
             netrc file : None
           offline mode : False
```
```
# Name            Version       Build                   Channel
bzip2             1.0.8         h3422bc3_4              conda-forge
ca-certificates   2021.10.8     h4653dfc_0              conda-forge
libblas           3.9.0         12_osxarm64_accelerate  conda-forge
libcblas          3.9.0         12_osxarm64_accelerate  conda-forge
libcxx            12.0.1        h168391b_1              conda-forge
libffi            3.4.2         h3422bc3_5              conda-forge
libgfortran       5.0.0.dev0    11_0_1_hf114ba7_23      conda-forge
libgfortran5      11.0.1.dev0   hf114ba7_23             conda-forge
liblapack         3.9.0         12_osxarm64_accelerate  conda-forge
libzlib           1.2.11        hee7b306_1013           conda-forge
llvm-openmp       12.0.1        hf3c4609_1              conda-forge
ncurses           6.2           h9aa5885_4              conda-forge
numpy             1.22.0        py39h61a45d2_0          conda-forge
openssl           3.0.0         h3422bc3_2              conda-forge
pip               21.3.1        pyhd8ed1ab_0            conda-forge
python            3.9.9         h43b31ca_0_cpython      conda-forge
python_abi        3.9           2_cp39                  conda-forge
readline          8.1           hedafd6a_0              conda-forge
setuptools        60.5.0        py39h2804cbe_0          conda-forge
sqlite            3.37.0        h72a2b83_0              conda-forge
tk                8.6.11        he1e0b03_1              conda-forge
tzdata            2021e         he74cb21_0              conda-forge
wheel             0.37.1        pyhd8ed1ab_0            conda-forge
xz                5.2.5         h642e427_1              conda-forge
zlib              1.2.11        hee7b306_1013           conda-forge
```
On osx-x86, the test suite in #252 runs through without failures. However, osx-arm64 (as for @ollie-bell) isn't being tested (only cross-compiled) because we don't have the hardware for it in CI.
Does this issue show if you use `netlib` instead of `accelerate`?
@jakirkham the issue only shows with `accelerate`. Note that I can successfully install numpy with Accelerate via pip, and it does show a ~2x speed-up in the numpy linalg benchmark over the default conda-forge numpy.
Should be fixed now
@ollie-bell could you report back on the performance please? Do you still see any performance improvement?
@ngam yes indeed. When I run the numpy linalg benchmarks I typically see 2-4x speed up with the new accelerate installation. I assume those benchmarks are robust and representative of real use!
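For reference, the kind of linalg micro-benchmark being discussed can be sketched as below; the matrix size, repeat count, and choice of SVD are illustrative assumptions, not the exact benchmark run above. Running it once in an OpenBLAS environment and once in an Accelerate environment gives a rough speed comparison.

```python
# Micro-benchmark sketch: time an SVD against whatever BLAS/LAPACK the
# installed NumPy is linked to. Size and repeat count are illustrative choices.
import time

import numpy as np


def bench_svd(n=500, repeats=3):
    """Return the best-of-`repeats` wall time in seconds for an n x n SVD."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.linalg.svd(a)  # dominated by the underlying LAPACK implementation
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    print(f"best SVD wall time for 500x500: {bench_svd():.3f} s")
```

Taking the best of several repeats reduces noise from warm-up and background load, which matters when comparing two BLAS backends.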
Hi! I landed on this issue after reading this post (see also a similar SE post). There, a simple SVD benchmark demonstrates more than 4x speedup when vecLib is used, but the suggested answer involves compiling from sources.
I'm hoping for a cleaner conda-based solution which would work in a fresh miniforge3 environment on Apple Silicon. Has the work on this issue officially provided such a solution, namely `conda install numpy "libblas=*=*accelerate"`?
Do you regard this approach as stable in the long term? And will running `conda install numpy` (without specifying which BLAS) eventually lead to this behaviour by default?
Finally, what about related libraries such as `scipy` and `scikit-learn`?
> And will running `conda install numpy` (without specifying BLAS) eventually lead to this behaviour by default?
Currently, OpenBLAS is the default for this type of switching, unless the maintainers decide to make Accelerate (vecLib) the default for macOS: https://conda-forge.org/docs/maintainer/knowledge_base.html#blas
> Finally, what about related libraries such as scipy and scikit-learn?
I believe SciPy dropped support for Accelerate (vecLib) completely: https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate
Just note that enforcing `"libblas=*=*accelerate"` will act on the entire environment and will likely break some libraries with hardcoded BLAS dependencies, e.g. PyTorch (link), but we are working on fixing that. For PyTorch that happens because it is hardcoded to OpenBLAS.
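To make the pinning concrete: the switching described in the knowledge-base link can be written into an environment file. A sketch (the environment name is illustrative; OpenBLAS remains the default when no such pin is present):

```yaml
# environment.yml sketch: pin the conda-forge BLAS flavour to Accelerate.
name: np-accel            # illustrative name
channels:
  - conda-forge
dependencies:
  - numpy
  - libblas=*=*accelerate  # this pin applies to the whole environment
```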
I think this type of comment should push us to think of a global solution, where all maintainers are encouraged to link against netlib (cc @isuruf) instead of any other BLAS. And perhaps we could make it such that on osx-arm64 we default to Accelerate instead of OpenBLAS. But that is beyond me personally...
Thanks @ngam for the great response!
> Just note that enforcing `"libblas=*=*accelerate"` will act on the entire environment and will likely kill some hardcoded libraries, e.g. PyTorch (link), but we are working on fixing that. For PyTorch that happens because it is hardcoded to OpenBLAS.
This is particularly useful (if somewhat painful) to know. Does what you say about SciPy imply that similar pains can be expected when trying to add SciPy to the same environment? Sorry for the dumb questions, this is all way above my pay grade.
SciPy seems to be fine, I just tested it, because they don't hard-code to OpenBLAS. I have to admit, I am also still new to this and trying to better understand it.
@isuruf, is it correct that in the case of scipy, even though they don't support Accelerate, when enforcing `"libblas=*=*accelerate"` it will just use OpenBLAS? Or are we essentially overriding their decision and linking to Accelerate anyway?
No, it will use Accelerate. That's why I said it's experimental and is not recommended.
> I think this type of comment should push us to think of a global solution, where all maintainers are encouraged to link against netlib (cc @isuruf) instead of any other BLAS.
I don't know where you got this idea that this is not encouraged. In fact, almost all packages link against netlib. Only exceptions I know are pytorch and tensorflow.
> I don't know where you got this idea that this is not encouraged. In fact, almost all packages link against netlib. Only exceptions I know are pytorch and tensorflow.
I wasn't aware; I saw a number of packages link explicitly against openblas (e.g. julia, but perhaps that's an upstream decision?)
Yes, julia is another exception because upstream requires ILP64 variants which are not available in other implementations.
Let me see if we/I can run the test suite of scipy with accelerate.
I'll save you the trouble. There are segfaults.
Thanks @isuruf for your clarifications! Are there issues/PRs relevant to the SciPy front that I and other readers can follow? And, for the time being, is there a respectable solution in which SciPy et al are linked to OpenBLAS while NumPy is linked to accelerate?
You can try building numpy from source.
> You can try building numpy from source.
Indeed. And I already have, successfully. I guess I am curious to know about the PyData ecosystem's roadmap concerning these accelerations. The SciPy reasons for dropping support seem pretty damning, so I wonder if there are any reasons to be optimistic about benefitting from out-of-the-box accelerations using conda-forge in the future.
SciPy specifically has faced some issues with M1 (at least their PyPI wheels: https://mail.python.org/archives/list/scipy-dev@python.org/thread/LLN2O4G2XI2MPILRW2XRRVCUK336WGKF/). There might be a more comprehensive solution soon... so yes, be optimistic!
I would generally say, the whole M1 thing (I have two machines myself) is experimental and so patience is needed as developers refine stuff. If scientific computing performance is truly critical, I wouldn't expect it to be done on personal M1 machines anyway (more like HPC). I think OpenBLAS (at least through conda-forge) is definitely good enough for now; yes, we can get better performance, but we will have to wait a little longer :)
@ngam SciPy's problems with Apple's Accelerate greatly pre-date M1: https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate
Yes, @dopplershift. Sorry if I made it sound like it was an M1-only issue. But the point is, on Intel Macs, one could use MKL BLAS which outperforms Accelerate BLAS. However, on M1 machines, Accelerate BLAS outperforms OpenBLAS.
rgommers pointed that same exact link to me 😄
(I am not sure if SciPy actually supports MKL or not; I've been only focusing on the Accelerate issue on M1 Macs.)
Should add it is probably not too surprising that Accelerate outperforms OpenBLAS on M1, given that OpenBLAS hasn't been tuned for that architecture (https://github.com/xianyi/OpenBLAS/issues/2814). That could change once that work happens.
SciPy does support MKL. And ATLAS, and BLIS.
Yeah we do have support for BLIS. Though it doesn't appear to be migrated yet. Added here ( https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/2444 ).
Seems like there was some work upstream for M1, but I don't know to what extent that has been included in releases.
Is there documentation on how to achieve this?
According to https://gist.github.com/MarkDana/a9481b8134cf38a556cf23e1e815dafb it seems the support for Accelerate was dropped.
I did, I set my yaml as:

```yaml
channels:
  - conda-forge
dependencies:
  - python=3.11
  - blas*=*accelerate
  - libblas=*=*accelerate
  - numpy
  - scipy
  - pandas
  - scikit-learn
```
I ran the tests of https://gist.github.com/MarkDana/a9481b8134cf38a556cf23e1e815dafb on my M1 Max. I got results which match the `np_default` results and not the accelerated ones.
> Deprecated since version 1.20: The native libraries on macOS, provided by Accelerate, are not fit for use in NumPy since they have bugs that cause wrong output under easily reproducible conditions. If the vendor fixes those bugs, the library could be reinstated, but until then users compiling for themselves should use another linear algebra library or use the built-in (but slower) default, see the next section.
From https://numpy.org/doc/stable/user/building.html?highlight=blas#accelerated-blas-lapack-libraries. Hence I wonder what is actually being installed.
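One way to see what actually got installed is to ask NumPy at runtime. A sketch (assumes a reasonably recent NumPy; `show_config(mode="dicts")` exists only on newer releases, roughly 1.26+, and older versions raise `TypeError` for the argument):

```python
# Sketch: report the BLAS NumPy was built against, using the dict form of
# np.show_config() where available; returns None on older NumPy versions.
import numpy as np


def linked_blas_name():
    try:
        cfg = np.show_config(mode="dicts")
    except TypeError:
        return None  # older NumPy: show_config() only prints, takes no mode
    return cfg.get("Build Dependencies", {}).get("blas", {}).get("name")


print("linked BLAS:", linked_blas_name())
```

Inspecting the installed dylibs with `otool -L` is the complementary link-level check.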
Have created a NumPy env with Accelerate locally. Here's what I see (using `$CONDA_PREFIX` for succinctness):
```
$ otool -L ~/miniforge/envs/np_accel/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-darwin.so
$CONDA_PREFIX/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-darwin.so:
	@rpath/liblapack.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libblas.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)
$ otool -L ~/miniforge/envs/np_accel/lib/libblas.3.dylib
$CONDA_PREFIX/lib/libblas.3.dylib:
	@rpath/libvecLibFort-ng.dylib (compatibility version 0.0.0, current version 0.0.0)
	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0, reexport)
	@rpath/liblapack-netlib.3.9.0.dylib (compatibility version 0.0.0, current version 0.0.0, reexport)
	@rpath/liblapacke-netlib.3.9.0.dylib (compatibility version 0.0.0, current version 0.0.0, reexport)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)
```
That said, the second point raised is that NumPy is deprecating support for this option. So one can use Accelerate; however, if one runs into issues, there likely won't be any support from NumPy.
> That's not the issue. The issue is that Accelerate's lapack version is 3.2.1 which is ancient. Therefore Accelerate can only provide blas which means only numpy can support it. scipy and many other packages cannot.
I would like to point out that the Apple developer docs state support for "LAPACK 3.9.1", but "[t]o use the new interfaces, define ACCELERATE_NEW_LAPACK before including the Accelerate or vecLib headers."
Yes, we're aware - NumPy can use Accelerate already, and SciPy is highly likely to re-add support once macOS 13.3 is released.
FWIW macOS 13.3 was released yesterday.
> And will running `conda install numpy` (without specifying BLAS) eventually lead to this behaviour by default?
>
> Currently, OpenBLAS is the default for this type of switching, unless the maintainers decide otherwise to make Accelerate (vecLib) the default for OSX: https://conda-forge.org/docs/maintainer/knowledge_base.html#blas
>
> Finally, what about related libraries such as scipy and scikit-learn?
>
> I believe SciPy dropped support for Accelerate (vecLib) completely: https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate
>
> Just note that enforcing `"libblas=*=*accelerate"` will act on the entire environment and will likely kill some hardcoded libraries, e.g. PyTorch (link), but we are working on fixing that. For PyTorch that happens because it is hardcoded to OpenBLAS.
@ngam Hi, will installing pytorch with Accelerate be added in the future? Currently, running `conda install -c conda-forge pytorch "libblas=*=*accelerate"` still doesn't work. The installed pytorch still links to openblas.
@qdwang not yet, but maybe soon. Let's move the bit about PyTorch to the pytorch-cpu-feedstock (https://github.com/conda-forge/pytorch-cpu-feedstock)
Is there any hope that we can use Accelerate on macOS 14.0?
NumPy already supports the new Accelerate on macOS >= 13.3, and should have wheels built against Accelerate for macOS >= 14.0. So it's possible, though I think it requires updating the conda-forge machinery that does runtime switching of BLAS/LAPACK.
It's a bit tricky; we need an entirely new blas flavour, or some smart switching based on the `__osx` version. We only recently gained the ability to even pull in the 13.3 SDK. It's going to be possible, but no promises on the timeline.
This issue has lots of different related issues jumbled together. Please open a new issue if you feel like the current `blas=*=*accelerate` is not enough.
After updating to macOS 15, it seems I cannot use numpy with Accelerate (numpy installed via pip does use Accelerate); I might open this as a separate issue.
```
conda create -n voxel-bayes-3.12 -c conda-forge numpy "libblas=*=*accelerate"
```
And then checking with numpy:
```
Python 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:07:06) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.show_config()
/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
  warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
  "Compilers": {
    "c": {
      "name": "clang",
      "linker": "ld64",
      "version": "17.0.6",
      "commands": "arm64-apple-darwin20.0.0-clang",
      "args": "-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0",
      "linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.11",
      "commands": "cython"
    },
    "c++": {
      "name": "clang",
      "linker": "ld64",
      "version": "17.0.6",
      "commands": "arm64-apple-darwin20.0.0-clang++",
      "args": "-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0",
      "linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "arm64",
      "family": "aarch64",
      "endian": "little",
      "system": "darwin"
    },
    "build": {
      "cpu": "aarch64",
      "family": "aarch64",
      "endian": "little",
      "system": "darwin"
    },
    "cross-compiled": true
  },
  "Build Dependencies": {
    "blas": {
      "name": "blas",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include",
      "lib directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig"
    },
    "lapack": {
      "name": "lapack",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include",
      "lib directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig"
    }
  },
  "Python Information": {
    "path": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/bin/python",
    "version": "3.12"
  },
  "SIMD Extensions": {
    "baseline": [
      "NEON",
      "NEON_FP16",
      "NEON_VFPV4",
      "ASIMD"
    ],
    "found": [
      "ASIMDHP"
    ],
    "not found": [
      "ASIMDFHM"
    ]
  }
}
```
More of a feature request. Is there a plan to enable installation of numpy built against Apple's Accelerate BLAS implementation when on osx-arm64? E.g. something similar to `conda install -c conda-forge numpy "libblas=*=*accelerate"` (based on the instructions here). This can be achieved by building numpy from source and installing via pip (see these instructions), but it would be great to have a clean conda installation to achieve the same thing.