alex-rakowski closed this issue 11 months ago
I dug into this a little bit, and M-series Macs only support 64-bit precision. So replacing `np.float128` and `np.complex256` with `np.longdouble` and `np.clongdouble` is equivalent to using `np.float64` and `np.complex128` there. So, while it allows the test to run, it does change the calculations on Apple Silicon. However, it doesn't alter the behavior on x86 systems, where `np.longdouble` keeps the extended precision.
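A quick way to confirm this on any platform is to ask `np.finfo` what `np.longdouble` actually resolves to (a minimal check, independent of the pylops tests):

```python
import numpy as np

# On Apple Silicon, np.longdouble is just an alias for the 64-bit double,
# while on x86 it is typically 80-bit extended precision stored in 16 bytes.
# np.finfo reports the real precision behind the alias.
info = np.finfo(np.longdouble)
print(np.dtype(np.longdouble).itemsize, info.precision)
```

On arm64 macOS this prints an itemsize of 8 and ~15 decimal digits (same as `float64`); on a typical x86-64 Linux build it prints 16 and ~18 decimal digits.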
Hi @alex-rakowski, thanks for raising this issue.
I also have an M-series Mac (Apple M1 Pro) but I have never experienced this. Could you please paste the same screenshot as below for your Mac?
In general, I think we could remove `np.float128` and `np.complex256` from `test_fft.py`; most other tests do not use them. I think it was @cako who inserted those, do you remember why?
The only thing I am not sure about is why you also face issues with `Radon2D` and `Radon3D`, as they do not use any FFT. Can you explain?
I'm not sure how to bring up that screenshot, but the details are:
This issue on the numpy repo seems to explain the behavior differences we are seeing. There's some discrepancy in whether platforms have `float128` and `complex256` available, but they all seem to cap out at the same precision.
I checked this on my M1 Mac and my friend's M2 Pro (he didn't have `float128` or `complex256` either). I've tried a couple of different NumPy install methods (pip, conda, building locally) and a couple of versions, and I can't get `float128` or `complex256`.
On M-series (arm64)
np.float64(1.5).itemsize = 8
np.double(1.5).itemsize = 8
np.longdouble(1.5).itemsize = 8
==============complex==============
np.complex128(1.5).itemsize = 16
np.cdouble(1.5).itemsize = 16
np.clongdouble(1.5).itemsize = 16
On x86
np.float64(1.5).itemsize = 8
np.double(1.5).itemsize = 8
np.float128(1.5).itemsize = 16
np.longdouble(1.5).itemsize = 16
==============complex==============
np.complex128(1.5).itemsize = 16
np.cdouble(1.5).itemsize = 16
np.complex256(1.5).itemsize = 32
np.clongdouble(1.5).itemsize = 32
I think it would be safe to switch to `np.longdouble` and `np.clongdouble`, as the worst case is a redundant check of 64-bit precision on Apple Silicon, while on x86 (and probably POWER and maybe other architectures) it would check genuine extended precision.
Python code to run the above (note: the `np.float128` and `np.complex256` lines raise `AttributeError` on arm64):
import numpy as np

print(f"{np.float64(1.5).itemsize = }")
print(f"{np.double(1.5).itemsize = }")
print(f"{np.float128(1.5).itemsize = }")
print(f"{np.longdouble(1.5).itemsize = }")
print(f"{'complex':=^35}")
print(f"{np.complex128(1.5).itemsize = }")
print(f"{np.cdouble(1.5).itemsize = }")
print(f"{np.complex256(1.5).itemsize = }")
print(f"{np.clongdouble(1.5).itemsize = }")
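A portable variant of that script (a sketch: it looks the names up with `getattr` instead of crashing with `AttributeError` on platforms where `np.float128`/`np.complex256` are not defined):

```python
import numpy as np

# np.float128 and np.complex256 only exist when the platform provides a
# >64-bit long double, so look them up with getattr rather than failing.
names = ["float64", "double", "float128", "longdouble",
         "complex128", "cdouble", "complex256", "clongdouble"]
sizes = {}
for name in names:
    dt = getattr(np, name, None)
    sizes[name] = np.dtype(dt).itemsize if dt is not None else None
    print(f"np.{name}: itemsize = {sizes[name]}")
```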
My NumPy config from the pylops dev install environment (`np.show_config()`):
openblas64__info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
found = ASIMDHP,ASIMDDP
not found = ASIMDFHM
I'm not familiar enough with the code to dig into it too much currently, but looking through it a bit, it seems like NaNs are generated, which causes the issues.
Fails:
assert dottest(Rop, par["nhx"] * par["nt"], par["npx"] * par["nt"], rtol=1e-3)
AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
# and
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
x and y nan location mismatch:
x: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,...
y: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,...
Fails:
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals
Mismatched elements: 1 / 231 (0.433%)
Max absolute difference: 0.54404683
Max relative difference: 1.19320768
x: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,...
y: array([-0.0e+00, -0.0e+00, -0.0e+00, -0.0e+00, -1.8e-03, 3.2e-02,
5.7e-03, 2.2e-02, -3.2e-03, 1.6e-02, -5.6e-04, -0.0e+00,
-0.0e+00, -0.0e+00, -6.7e-03, -1.3e-02, 2.2e-02, 6.8e-02,...
Fails:
assert dottest(Rop, par["nhx"] * par["nt"], par["npx"] * par["nt"], rtol=1e-3)
AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
# and
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
AssertionError:
Arrays are not almost equal to 1 decimals
x and y nan location mismatch:
x: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,...
y: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,...
Fails:
assert dottest(
Rop,
par["nhy"] * par["nhx"] * par["nt"],
par["npy"] * par["npx"] * par["nt"],
rtol=1e-3,
)
Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
# and
xinv, _, _ = fista(Rop, y, niter=200, eps=3e0)
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals
x and y nan location mismatch:
x: array([0., 0., 0., ..., 0., 0., 0.])
y: array([nan, nan, nan, ..., nan, nan, nan])
Fails:
xinv, _, _ = fista(Rop, y, niter=200, eps=3e0)
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals
Mismatched elements: 1 / 3927 (0.0255%)
Max absolute difference: 0.62957715
Max relative difference: 1.69961745
x: array([0., 0., 0., ..., 0., 0., 0.])
y: array([-0., -0., -0., ..., -0., -0., -0.])
Fails:
assert dottest(
Rop,
par["nhy"] * par["nhx"] * par["nt"],
par["npy"] * par["npx"] * par["nt"],
rtol=1e-3,
)
Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
Fails:
xinv, _, _ = fista(Rop, y, niter=200, eps=3e0)
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals
Mismatched elements: 1 / 3927 (0.0255%)
Max absolute difference: 0.28258355
Max relative difference: 1.
x: array([0., 0., 0., ..., 0., 0., 0.])
y: array([ 0., 0., 0., ..., -0., -0., 0.])
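For context, the `dottest` failures above report the adjoint "dot test": for a linear operator `Op` and random vectors `u`, `v`, the identity `v^H (Op u) = u^H (Op^H v)` must hold. A minimal sketch of the principle with a plain NumPy matrix standing in for the operator (not the actual Radon operator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy operator as an explicit matrix; pylops' dottest applies the same
# identity to its forward/adjoint implementations.
A = rng.standard_normal((5, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(5)

lhs = v @ (A @ u)        # v^H (Op u)
rhs = u @ (A.T @ v)      # u^H (Op^H v)
assert np.isclose(lhs, rhs)

# A single NaN anywhere in the operator poisons both sides, which is
# exactly the "v^H(Opu)=nan - u^H(Op^Hv)=nan" reported above.
A[0, 0] = np.nan
assert np.isnan(v @ (A @ u))
```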
Alright, I tried to install a brand-new environment myself with `make dev-install_conda` to get as close as possible to your versions of the Python libraries. Now my numpy matches yours (and I have a slightly newer ipython), and all my tests are passing.
For the `Radon2D` and `Radon3D` failing tests, I am also not so sure. What I know for sure is that these are numba codes, and numba sometimes falls behind compared to numpy. In most cases in my experience you either get errors at library installation or import, but I never saw this behavior (especially referring to the `nan`s). Can you tell me what version of `numba` you have? At least I can try to see if I can reproduce those failed tests and look at why; so far it is impossible for me, as I can't locally reproduce your behavior... and the macOS CIs do not seem to have your issue either.
Now that I understand a bit more from your digging (thanks!), I think I agree with your suggestion of moving to `np.longdouble` and `np.clongdouble` in the fft tests. At least for now we get tests on higher-precision formats on x86, and Mac falls back to the closest lower precision available. Would you mind making a small separate PR to fix this?
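A minimal, hypothetical sketch of what that switch could look like (the dtype list and round-trip check here are illustrative, not the actual contents of `test_ffts.py`):

```python
import numpy as np

# Use np.longdouble/np.clongdouble in the dtype list instead of
# np.float128/np.complex256. On x86 these resolve to the same extended
# types; on Apple Silicon they fall back to plain 64-bit, so the module
# imports and runs everywhere instead of raising AttributeError.
dtypes = [
    (np.float32, np.complex64),
    (np.float64, np.complex128),
    (np.longdouble, np.clongdouble),   # was (np.float128, np.complex256)
]

for rdt, cdt in dtypes:
    x = np.arange(16, dtype=rdt)
    X = np.fft.fft(x.astype(cdt))
    assert np.allclose(np.fft.ifft(X).real.astype(rdt), x)
```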
> Alright, I tried to install a brand new environment myself with `make dev-install_conda` to get as close as possible to your versions of python libraries. Now my numpy matches yours (and I have a slightly newer ipython). And all my tests are passing.
I tried using `make dev-install_conda`, but it gives an error about `icc_rt`, which I guess makes sense. I commented out `icc_rt` in `environment-dev.yaml`, which allowed `make dev-install_conda` to run. But I'm getting the same Radon test errors.
> For the `Radon2D` and `Radon3D` failing tests, I am also not so sure. What I know for sure is that these are numba codes, and numba sometimes falls behind compared to numpy. In most cases in my experience you either get errors at library installation or import, but I never saw this behavior (especially referring to the `nan`). Can you tell me what version of `numba` you have? At least I can try to see if I can reproduce those failed tests and look at why, so far it is impossible for me as I can't locally reproduce your behavior... and MacOS CIs do not seem to have your issue either
I can check this on another M1 Mac later; maybe there's something misconfigured on my MacBook.
The numba version and the output of `numba -s` are below; it looks like it's all but the latest version of numba.
numba 0.58.0 py311h7aedaa7_0
libllvm14 14.0.6 h7ec7a93_3
llvm-openmp 14.0.6 hc6e5704_0
llvmlite 0.41.0 py311h514c7bf_0
numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : 2023-11-25 22:47:37.407110
UTC start time : 2023-11-26 06:47:37.407119
Running time (s) : 2.568088
__Hardware Information__
Machine : arm64
CPU Name : cyclone
CPU Count : 8
Number of accessible CPUs : ?
List of accessible CPUs cores : ?
CFS Restrictions (CPUs worth of runtime) : None
CPU Features :
Memory Total (MB) : 16384
Memory Available (MB) : 4655
__OS Information__
Platform Name : macOS-14.1.1-arm64-arm-64bit
Platform Release : 23.1.0
OS Name : Darwin
OS Version : Darwin Kernel Version 23.1.0: Mon Oct 9 21:28:12 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T8103
OS Specific Version : 14.1.1 arm64
Libc Version : ?
__Python Information__
Python Compiler : Clang 15.0.7
Python Implementation : CPython
Python Version : 3.11.6
Python Locale : en_US.UTF-8
__Numba Toolchain Versions__
Numba Version : 0.58.0
llvmlite Version : 0.41.0
__LLVM Information__
LLVM Version : 14.0.6
__CUDA Information__
CUDA Device Initialized : False
< deleted info >
__NumPy Information__
NumPy Version : 1.25.2
NumPy Supported SIMD features : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD', 'FPHP', 'ASIMDHP', 'ASIMDDP')
NumPy Supported SIMD dispatch : ('ASIMDHP', 'ASIMDDP', 'ASIMDFHM')
NumPy Supported SIMD baseline : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD')
NumPy AVX512_SKX support detected : False
__SVML Information__
SVML State, config.USING_SVML : False
SVML Library Loaded : False
llvmlite Using SVML Patched LLVM : False
SVML Operational : False
__Threading Layer Information__
TBB Threading Layer Available : True
+-->TBB imported successfully.
OpenMP Threading Layer Available : True
+-->Vendor: Intel
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
__Numba Environment Variable Information__
None found.
__Conda Information__
Conda Build : not installed
Conda Env : 23.10.0
Conda Platform : osx-arm64
Conda Python Version : 3.8.16.final.0
Conda Root Writable : True
> Now that I understand a bit more from your digging (thanks!), I think I agree with your suggestion of moving to `np.longdouble` and `np.clongdouble` in the fft tests. At least for now we get both tests on higher precision formats on x86, and Mac to fall back to the closest lower precision available. Would you mind making a small separate PR to fix this?
PR'd #552
tldr:
pylops tests fail to run on M1 Mac due to `np.float128` and `np.complex256` not existing. Replacing them with `np.longdouble` and `np.clongdouble` in `test_fft.py` allows the tests to run. This results in 7 failed tests on M1 Mac, and all passing on a Linux workstation. Should `tests_fft.py` be changed to this?
setup:
Using a clone of dev: `conda install pyfftw -c conda-forge && make dev-install`; `pyfftw` cannot be pip installed currently on ARM Macs (or at least I haven't had luck doing so).
M1 mac issues:
`make tests` fails to run on M1 Macs, due to `np.float128` and `np.complex256` not existing. This is a truncated output:
Replacing `np.float128` and `np.complex256` with `np.longdouble` and `np.clongdouble`:
Replacing all `np.float128` and `np.complex256` with `np.longdouble` and `np.clongdouble` respectively in `test_ffts.py` will now yield `7 failed, 2148 passed, 3608 warnings`. `Radon2D` and `Radon3D` are both taking `"float64"` as the dtype in the args.
Radon2D fails on:
Radon3D fails on:
Running on Linux workstation with `np.longdouble` and `np.clongdouble`:
Running these tests on the Linux workstation (specs below) passes with either `np.longdouble` and `np.clongdouble` or `np.float128` and `np.complex256`; both yield `2155 passed, 3965 warnings`.
Specs