getspams / spams-python

Python interface for SPAMS (SPArse Modeling Software)
https://thoth.inrialpes.fr/people/mairal/spams/
GNU General Public License v3.0
16 stars 5 forks source link

Using multiple threads/cores on OSX #18

Open daducci opened 2 years ago

daducci commented 2 years ago

Is it possible to use multiple threads/cores under OSX? It works on Linux, but I'm not able to use this feature on the Mac.

gdurif commented 2 years ago

It should be possible and it should be working.

Could try to run the function check_openmp() defined in the setup.py file here.

It returns 0 if linking against OpenMP works and 1 if not.

samuelstjean commented 2 years ago

It is, but you need to compile with not the default mac toolchain since that does not work (I think they have a non standard implementation or something similar). I am personally using another module to check for it automatically here https://github.com/samuelstjean/spams-python/blob/master/setup.py#L114 Seems like it was also added to a special package to do just that if it's cleaner to use https://github.com/astropy/extension-helpers

gdurif commented 2 years ago

Thanks @samuelstjean, I'll look into what you coded, we should reuse it.

extension-helpers seems great, the github repository seems to be still active but there has not been any release on PyPI since december 2019, and that worries me a little.

samuelstjean commented 2 years ago

I am personally using the older version, directly stolen from astropy, so just including this one file could be an idea. If stuff breaks we would have to fix it though, or copy an updated one, so it should not require too much work if it works as is and nobody touches it.

gdurif commented 2 years ago

I asked them about releasing on PyPI more often (c.f. https://github.com/astropy/extension-helpers/issues/38).

Depending on their answer, we will either directly integrate their file as you did or just use extension-helpers as a requirement (which would be less maintenance on our side if possible).

gdurif commented 2 years ago

Should be fixed by #20

@daducci could you try with the corresponding version (c.f. below) and tell me if the mutli-threading works?

pip install git+https://github.com/getspams/spams-python.git@fix_openmp_req_macos
daducci commented 2 years ago

Sorry for the delay. I just tested this version, but it does not work, only 1 core is used. I found that I could fix the issue with OpenMP on OSX if I change the following in the version on this repository:

gdurif commented 2 years ago

@daducci Thanks for the report. I'll do some tests (and eventually report to https://github.com/astropy/extension-helpers if needed).

samuelstjean commented 2 years ago

Seems like someone also found it https://github.com/astropy/extension-helpers/issues/40, my guess is that those flags are for gcc, and apple is using clang as the default compiler. For a very long time, their version had issues or something else, and people had to install a third party gcc to get stuff working (like by using homebrew for example).

Seems like the easy way here is to let extension helper do the job and put the correct flag for the correct compiler, or hack it in somehow, but that sounds like trouble in the future to make it work properly, and on old versions also at the same time

gdurif commented 2 years ago

I am digging a bit on this. As long as extension-helpers does not support this (no answer on the issue), it will be tricky on our part.

setuptools.command.build_ext.new_compiler does not discriminate^[1] between gcc and clang (even less between Apple clang and LLVM clang I would guess). So we will have to do the check manually, and edit the flags accordingly (after calling extension-helpers) for the moment.

[1] See

$ python setup.py build_ext --help-compiler
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c test_openmp.c -o objects/test_openmp.o -fopenmp
x86_64-linux-gnu-gcc -pthread objects/test_openmp.o -o test_openmp -fopenmp
Compiling Cython/C/C++ extension with OpenMP support
List of available compilers:
  --compiler=bcpp     Borland C++ Compiler
  --compiler=cygwin   Cygwin port of GNU C Compiler for Win32
  --compiler=mingw32  Mingw32 port of GNU C Compiler for Win32
  --compiler=msvc     Microsoft Visual C++
  --compiler=unix     standard UNIX-style compiler
gdurif commented 2 years ago

I am trying to fix extension-helpers regarding this matter, c.f. https://github.com/gdurif/extension-helpers/tree/support_apple_clang

I'll keep you updated (I will propose a PR when it is working).

Update: PR on the way https://github.com/astropy/extension-helpers/pull/42

gdurif commented 2 years ago

@daducci could you tell me if installing the modified version of extension-helpers before installing spams solves the problem?

pip install git+https://github.com/gdurif/extension-helpers.git@support_apple_clang
pip install git+https://github.com/getspams/spams-python.git@fix_openmp_req_macos
daducci commented 2 years ago

No, in my case only 1 core is used. :-( Multiple cores are used only when installing the version with the manual patch I suggested few days ago. Dunno why this one does not work, as it seems all the modifications in there match mine.

gdurif commented 2 years ago

@daducci ok thanks, I'll continue digging. Since the manual patch works, I hope we are not foo far from finding the solution!

gdurif commented 2 years ago

@daducci I think the issue was found (https://github.com/astropy/extension-helpers/pull/42#issuecomment-1090845149). You are using the new Apple CPUs if I recall correctly ? (hence not Intel-based, hence a different location for libomp)

daducci commented 2 years ago

Yes, I am! Sorry, I didn't think Apple would change paths/etc... based on the vendor of its CPUs!

gdurif commented 2 years ago

No problem, we will find a solution, the person suggested one in his response.

gdurif commented 2 years ago

@daducci Could you tell me the output of brew --prefix libomp on your machine please?

Edit: and also python -c "import sys; print(sys.platform)" ? (just to be sure) Thanks

daducci commented 2 years ago

brew --prefix libomp --> /usr/local/opt/libomp

python -c "import sys; print(sys.platform)" --> darwin

gdurif commented 2 years ago

@daducci could you retry ? (maybe uninstall every thing before or try in a clean environment)

pip install git+https://github.com/gdurif/extension-helpers.git@support_apple_clang
pip install git+https://github.com/getspams/spams-python.git@fix_openmp_req_macos

On my MacOS VM, multi-threading seems to work.

daducci commented 2 years ago

Hi @gdurif ,

sorry for the long delay, I have been off for Easter. Unfortunately, it doesn't work; I can make it work only when using the "manual trick" from a previous message. Here is the code snippet I use to benchmark:

import time
import spams
import numpy as np
from tqdm import tqdm

# data generation
np.random.seed(0)
X = np.asfortranarray(np.random.normal(size=(100,200)))
X = np.asfortranarray(X / np.tile(np.sqrt((X*X).sum(axis=0)),(X.shape[0],1)))
D = np.asfortranarray(np.random.normal(size=(100,1000)))
D = np.asfortranarray(D / np.tile(np.sqrt((D*D).sum(axis=0)),(D.shape[0],1)))

tic = time.time()
for i in tqdm(range(500)):
    alpha = spams.lasso( X, D=D, lambda1=0.15, numThreads=-1 )
print( f"{time.time() - tic:.2f}s")

If needed, we can chat one of these days and make some test in real time. Just let me know if this can help!

gdurif commented 2 years ago

Hi @daducci No problem ;-) Thanks for the code snippet. I will use it to improve my tests. And also we could definitely chat about it, I'll send you an email. Best

daducci commented 2 years ago

Hi guys, what about resuming this discussion to try getting to the bottom of it?

samuelstjean commented 2 years ago

Looks like it will be complicated for now, since building on arm is done by cross-compiling with the thing I am using, and that means homebrew pulls stuff for x64 instead. It does build without putting in the libs, so people might be able to work around it by installing openblas themselves. If you want to try it (or even the old x64 version), that should be it https://we.tl/t-zeELwQpqmC And of course it could not be tested for the same reasons, so it might not even start.

samuelstjean commented 2 years ago

Well if we wait a bit normally there will be some buildbots directly on arm64 for macs. As of now, this means it will be impossible to install a premade blas, unless we want to compile our own each time, but I would not do that because it means we would need to check the test and fix ourselves every little bugs since we build from source. Hopefully this will fix it https://github.com/pypa/cibuildwheel/issues/1204

As of now, I am using the conda version on windows, the cent os version on linux and the homebrew version for mac to avoid that, but that means we won't have a blas version for mac M1 until someone else makes it for me. With a bit of luck maybe the M1 version can use the libs from the x64 version as it apparently emulates stuff behind the scene for you, so if those builds work we could do that (and normally it should be multithreaded, but I don't have a mac so...).

samuelstjean commented 1 year ago

Well I tried some new arm mac build, but I don't have access to one to check if everything works as supposed. They are built with some emulation layer, so I can not test them either on the build machines. It should be fine if you want to give it a try before putting them back here https://github.com/samuelstjean/spams-python/releases