PyLops / pylops

PyLops – A Linear-Operator Library for Python
https://pylops.readthedocs.io
GNU Lesser General Public License v3.0

make tests fails on ARM Mac #550

Closed alex-rakowski closed 11 months ago

alex-rakowski commented 11 months ago

tldr:

pylops tests fail to run on an M1 Mac because np.float128 and np.complex256 do not exist there. Replacing them with np.longdouble and np.clongdouble in test_ffts.py allows the tests to run. This results in 7 failed tests on the M1 Mac, while all tests pass on a Linux workstation.

setup

Using a clone of the dev branch, installed with conda install pyfftw -c conda-forge && make dev-install. pyfftw cannot currently be pip-installed on ARM Macs (or at least I haven't had luck doing so).

M1 mac issues:

make tests fails to run on M1 macs, due to np.float128 and np.complex256 not existing. This is a truncated output:

pylops/utils/wavelets.py:21
  /Users/arakowski/Documents/git_repos/pylops/pylops/utils/wavelets.py:21: UserWarning: one sample removed from time axis...
    warnings.warn("one sample removed from time axis...")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================== short test summary info ======================================
ERROR pytests/test_ffts.py - AttributeError: module 'numpy' has no attribute 'float128'. Did you mean: 'float16'?

Replacing np.float128 and np.complex256 with np.longdouble and np.clongdouble

Replacing all np.float128 and np.complex256 with np.longdouble and np.clongdouble respectively in test_ffts.py now yields 7 failed, 2148 passed, 3608 warnings:

FAILED pytests/test_radon.py::test_Radon2D[par2] - AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
FAILED pytests/test_radon.py::test_Radon2D[par3] - AssertionError:
FAILED pytests/test_radon.py::test_Radon2D[par5] - AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
FAILED pytests/test_radon.py::test_Radon3D[par2] - AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
FAILED pytests/test_radon.py::test_Radon3D[par3] - AssertionError:
FAILED pytests/test_radon.py::test_Radon3D[par5] - AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
FAILED pytests/test_radon.py::test_Radon3D[par7] - AssertionError:

Radon2D and Radon3D are both taking "float64" as the dtype in the args.

Radon2D fails on:

Radon3D fails on:


Running on Linux workstation with np.longdouble and np.clongdouble

Running these tests on a Linux workstation (specs below) passes with either np.longdouble and np.clongdouble or np.float128 and np.complex256; both yield 2155 passed, 3965 warnings.


Specs

OS: Ubuntu 20.04.6 LTS
Kernel: 5.4.0-146-generic
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          48
Model name:                      AMD Ryzen Threadripper 3960X 24-Core Processor

alex-rakowski commented 11 months ago

I dug into this a little bit, and M-series Macs only support 64-bit precision. So on those machines, replacing np.float128 and np.complex256 with np.longdouble and np.clongdouble is equivalent to using np.float64 and np.complex128. So, while it allows the tests to run, it does change the calculations. However, it doesn't alter the behavior on x86 systems.
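One way to check this claim on a given machine is to compare what NumPy reports for np.longdouble against np.float64. The following is a minimal sketch; the helper names describe_float and longdouble_adds_precision are made up for illustration and are not part of pylops or NumPy.

```python
import numpy as np


def describe_float(dtype):
    """Summarize storage size and precision of a floating-point dtype."""
    info = np.finfo(dtype)
    return {
        "itemsize": np.dtype(dtype).itemsize,  # bytes per element
        "mantissa_bits": int(info.nmant),      # explicit mantissa bits
        "eps": float(info.eps),                # spacing between 1.0 and the next float
    }


def longdouble_adds_precision():
    """True only if np.longdouble carries more mantissa bits than np.float64."""
    return bool(np.finfo(np.longdouble).nmant > np.finfo(np.float64).nmant)


if __name__ == "__main__":
    print("float64   :", describe_float(np.float64))
    print("longdouble:", describe_float(np.longdouble))
    print("extra precision:", longdouble_adds_precision())
```

If np.longdouble has the same itemsize and mantissa width as np.float64, as the M1 results in this thread suggest, tests parametrized on np.longdouble would simply repeat the 64-bit case on that platform.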

mrava87 commented 11 months ago

Hi @alex-rakowski, thanks for raising this issue.

I also have an M-series Mac (Apple M1 Pro), but I have never experienced this. Could you please paste the same screenshot as below for your Mac?

[Image: Screen Shot 2023-11-25 at 9 20 04 PM]

In general, I think we could remove np.float128 and np.complex256 from test_ffts.py; in most other tests we do not have them. I think it was @cako who inserted those. Do you remember why?

The only thing I am not sure about is why you also face issues with Radon2D and Radon3D, as they do not use any fft. Can you explain?

alex-rakowski commented 11 months ago

Dtypes

I'm not sure how to bring up that screenshot, but the details are:

This issue on the NumPy repo seems to explain the behavior differences we have. There's some discrepancy in whether they have float128 and complex256 available, but they all seem to cap out at the same precision.

I checked this on my M1 Mac and my friend's M2 Pro (he didn't have float128 or complex256 either). I've tried a couple of different NumPy install methods (pip, conda, building locally) and a couple of versions, and I can't get float128 or complex256.

np.float64(1.5).itemsize = 8
np.double(1.5).itemsize = 8
np.longdouble(1.5).itemsize = 8
==============complex==============
np.complex128(1.5).itemsize = 16
np.cdouble(1.5).itemsize = 16
np.clongdouble(1.5).itemsize = 16

On x86

np.float64(1.5).itemsize = 8
np.double(1.5).itemsize = 8
np.float128(1.5).itemsize = 16
np.longdouble(1.5).itemsize = 16
==============complex==============
np.complex128(1.5).itemsize = 16
np.cdouble(1.5).itemsize = 16
np.complex256(1.5).itemsize = 32
np.clongdouble(1.5).itemsize = 32

I think it would be safe to switch to np.longdouble and np.clongdouble: in the worst case it's a redundant check of 64-bit precision, but on x86, probably POWER, and maybe other architectures, it would check genuinely higher precision.

Python code to produce the itemsize output above

import numpy as np

print(f"{np.float64(1.5).itemsize = }")
print(f"{np.double(1.5).itemsize = }")
if hasattr(np, "float128"):  # alias is missing on the M1 Mac above
    print(f"{np.float128(1.5).itemsize = }")
print(f"{np.longdouble(1.5).itemsize = }")
print(f"{'complex':=^35}")
print(f"{np.complex128(1.5).itemsize = }")
print(f"{np.cdouble(1.5).itemsize = }")
if hasattr(np, "complex256"):  # alias is missing on the M1 Mac above
    print(f"{np.complex256(1.5).itemsize = }")
print(f"{np.clongdouble(1.5).itemsize = }")
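Following the suggestion to switch to np.longdouble and np.clongdouble, a test suite could list its real/complex dtype pairs using only names that exist on every platform. This is a rough sketch under that assumption; PRECISION_PAIRS and check_pairs are hypothetical names and do not reflect how test_ffts.py is actually organized.

```python
import numpy as np

# Real/complex dtype pairs that are defined on every NumPy build.
# The last pair is extended precision where the platform provides it
# and plain 64-bit precision where it does not.
PRECISION_PAIRS = [
    (np.float32, np.complex64),
    (np.float64, np.complex128),
    (np.longdouble, np.clongdouble),
]


def check_pairs(pairs=PRECISION_PAIRS):
    """Check that the real part of each complex dtype has the paired real dtype."""
    results = []
    for real_dtype, complex_dtype in pairs:
        z = np.ones(4, dtype=complex_dtype)
        results.append(z.real.dtype == np.dtype(real_dtype))
    return results


if __name__ == "__main__":
    print(check_pairs())
```

Because no fixed-width alias such as np.float128 appears in the list, importing the module should not raise AttributeError on machines that lack that alias.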

My NumPy config from the pylops dev-install environment

np.show_config()
openblas64__info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP,ASIMDDP
    not found = ASIMDFHM

Test Errors

I'm not familiar enough with the code to dig into it much at the moment, but from a quick look it seems that NaNs are generated, which causes the failures.
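For context on the "Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan" messages: a dot test checks that an operator and its adjoint are consistent by comparing two inner products on random vectors. The sketch below illustrates the idea for a dense matrix with plain NumPy; it is not the pylops dottest implementation, and the function name dot_test is chosen only for this example. It also shows why a single NaN is enough to make the check fail.

```python
import numpy as np


def dot_test(A, rtol=1e-3, seed=0):
    """Return True if v^H (A u) and (A^H v)^H u agree for random u, v."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    u = rng.standard_normal(n_cols)
    v = rng.standard_normal(n_rows)
    lhs = np.vdot(v, A @ u)            # v^H (A u); vdot conjugates its first argument
    rhs = np.vdot(A.conj().T @ v, u)   # (A^H v)^H u
    # np.isclose treats NaN as unequal to everything, including NaN,
    # so any NaN produced by the operator makes the test fail.
    return bool(np.isclose(lhs, rhs, rtol=rtol))


if __name__ == "__main__":
    A = np.random.default_rng(1).standard_normal((5, 3))
    print("clean matrix passes:", dot_test(A))
    A_nan = A.copy()
    A_nan[0, 0] = np.nan
    print("matrix with NaN passes:", dot_test(A_nan))
```

A failure reporting nan on both sides therefore says less about the adjoint being wrong than about the forward or adjoint computation producing NaNs in the first place.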

Radon2D - test 2 par 3:

Fails:

assert dottest(Rop, par["nhx"] * par["nt"], par["npx"] * par["nt"], rtol=1e-3)
AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
# and
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
x and y nan location mismatch:
 x: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,...
 y: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,...

Radon2D - test 3 par 4:

Fails:

assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals

Mismatched elements: 1 / 231 (0.433%)
Max absolute difference: 0.54404683
Max relative difference: 1.19320768
 x: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,...
 y: array([-0.0e+00, -0.0e+00, -0.0e+00, -0.0e+00, -1.8e-03,  3.2e-02,
        5.7e-03,  2.2e-02, -3.2e-03,  1.6e-02, -5.6e-04, -0.0e+00,
       -0.0e+00, -0.0e+00, -6.7e-03, -1.3e-02,  2.2e-02,  6.8e-02,...

Radon2D - test 5 par 6:

Fails:

assert dottest(Rop, par["nhx"] * par["nt"], par["npx"] * par["nt"], rtol=1e-3)
AssertionError: Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan
# and
assert_array_almost_equal(x.ravel(), xinv, decimal=1)
AssertionError: 
Arrays are not almost equal to 1 decimals

x and y nan location mismatch:
 x: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,...
 y: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,...

Radon3D - test 2 par 3:

Fails:

        assert dottest(
            Rop,
            par["nhy"] * par["nhx"] * par["nt"],
            par["npy"] * par["npx"] * par["nt"],
            rtol=1e-3,
        )
Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan

# and
xinv, _, _ = fista(Rop, y, niter=200, eps=3e0)
            assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals

x and y nan location mismatch:
 x: array([0., 0., 0., ..., 0., 0., 0.])
 y: array([nan, nan, nan, ..., nan, nan, nan])

Radon3D - test 3 par 4:

Fails:

xinv, _, _ = fista(Rop, y, niter=200, eps=3e0)
            assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals

Mismatched elements: 1 / 3927 (0.0255%)
Max absolute difference: 0.62957715
Max relative difference: 1.69961745
 x: array([0., 0., 0., ..., 0., 0., 0.])
 y: array([-0., -0., -0., ..., -0., -0., -0.])

Radon3D - test 5 par 6:

Fails:

        assert dottest(
            Rop,
            par["nhy"] * par["nhx"] * par["nt"],
            par["npy"] * par["npx"] * par["nt"],
            rtol=1e-3,
        )
Dot test failed, v^H(Opu)=nan - u^H(Op^Hv)=nan

Radon3D - test 7 par 8:

Fails:

xinv, _, _ = fista(Rop, y, niter=200, eps=3e0)
            assert_array_almost_equal(x.ravel(), xinv, decimal=1)
Arrays are not almost equal to 1 decimals

Mismatched elements: 1 / 3927 (0.0255%)
Max absolute difference: 0.28258355
Max relative difference: 1.
 x: array([0., 0., 0., ..., 0., 0., 0.])
 y: array([ 0.,  0.,  0., ..., -0., -0.,  0.])

mrava87 commented 11 months ago

Alright, I tried to install a brand new environment myself with make dev-install_conda to get as close as possible to your versions of python libraries. Now my numpy matches yours (and I have a slightly newer ipython). And all my tests are passing.

For the failing Radon2D and Radon3D tests, I am also not so sure. What I do know is that these operators use numba, and numba sometimes lags behind numpy. In my experience that usually shows up as errors at library installation or import; I have never seen this behavior (especially the NaNs). Can you tell me what version of numba you have? Then I can at least try to reproduce the failed tests and look into why they fail. So far that has been impossible, since I can't reproduce your behavior locally, and the macOS CIs do not seem to have your issue either.
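A small script can collect the versions relevant to this kind of report in one go. This is only a convenience sketch using the standard library; the function name report_versions and the default package list are assumptions made for illustration.

```python
from importlib import metadata


def report_versions(packages=("numpy", "numba", "llvmlite")):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

The numba -s command gives a much fuller picture (hardware, LLVM, threading layers), so this is mainly useful for a quick comparison between two environments.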

Now that I understand a bit more from your digging (thanks!), I agree with your suggestion of moving to np.longdouble and np.clongdouble in the fft tests. At least for now, the tests will run on the higher-precision formats on x86 and fall back to the closest lower precision available on Mac. Would you mind making a small separate PR to fix this?

alex-rakowski commented 11 months ago

> Alright, I tried to install a brand new environment myself with make dev-install_conda to get as close as possible to your versions of python libraries. Now my numpy matches yours (and I have a slightly newer ipython). And all my tests are passing.

I tried using make dev-install_conda, but it gives an error about icc_rt, which I guess makes sense. I commented out icc_rt in environment-dev.yaml, which allowed make dev-install_conda to run. But I'm getting the same radon test errors.

> For the Radon2D and Radon3D failing tests, I am also not so sure. What I know for sure is that these are numba codes, and numba sometimes falls behind compared to numpy. In most cases in my experience you either get errors at library installation or import, but I never saw this behavior (especially referring to the nan). Can you tell me what version of numba you have? At least I can try to see if I can reproduce those failed tests and look at why, so far it is impossible for me as I can't locally reproduce your behavior... and MacOS CIs do not seem to have your issue either

I can check this on another M1-Mac later; maybe there's something misconfigured on my MacBook.

The numba version and the output of numba -s are below; it looks like it's nearly the latest version of numba.

numba                     0.58.0          py311h7aedaa7_0
libllvm14                 14.0.6               h7ec7a93_3
llvm-openmp               14.0.6               hc6e5704_0
llvmlite                  0.41.0          py311h514c7bf_0
numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2023-11-25 22:47:37.407110
UTC start time                                : 2023-11-26 06:47:37.407119
Running time (s)                              : 2.568088

__Hardware Information__
Machine                                       : arm64
CPU Name                                      : cyclone
CPU Count                                     : 8
Number of accessible CPUs                     : ?
List of accessible CPUs cores                 : ?
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  :

Memory Total (MB)                             : 16384
Memory Available (MB)                         : 4655

__OS Information__
Platform Name                                 : macOS-14.1.1-arm64-arm-64bit
Platform Release                              : 23.1.0
OS Name                                       : Darwin
OS Version                                    : Darwin Kernel Version 23.1.0: Mon Oct  9 21:28:12 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T8103
OS Specific Version                           : 14.1.1   arm64
Libc Version                                  : ?

__Python Information__
Python Compiler                               : Clang 15.0.7
Python Implementation                         : CPython
Python Version                                : 3.11.6
Python Locale                                 : en_US.UTF-8

__Numba Toolchain Versions__
Numba Version                                 : 0.58.0
llvmlite Version                              : 0.41.0

__LLVM Information__
LLVM Version                                  : 14.0.6

__CUDA Information__
CUDA Device Initialized                       : False
< deleted info > 

__NumPy Information__
NumPy Version                                 : 1.25.2
NumPy Supported SIMD features                 : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD', 'FPHP', 'ASIMDHP', 'ASIMDDP')
NumPy Supported SIMD dispatch                 : ('ASIMDHP', 'ASIMDDP', 'ASIMDFHM')
NumPy Supported SIMD baseline                 : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD')
NumPy AVX512_SKX support detected             : False

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : False
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: Intel
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build                                   : not installed
Conda Env                                     : 23.10.0
Conda Platform                                : osx-arm64
Conda Python Version                          : 3.8.16.final.0
Conda Root Writable                           : True

> Now that I understand a bit more from your digging (thanks!), I think I agree with your suggestion of moving to np.longdouble and np.clongdouble in the fft tests. At least for now we get both tests on higher precision formats on x86, and Mac to fall back to the closest lower precision available. Would you mind making a small separate PR to fix this?

PR'd #552