FEniCS / basix

FEniCSx finite element basis evaluation library
https://fenicsproject.org
MIT License
90 stars 34 forks source link

Tabulation of simple elements fail on an aarch64 machine #852

Open IgorBaratta opened 1 month ago

IgorBaratta commented 1 month ago

Reproducer:

import numpy as np

import basix
from basix import CellType, ElementFamily, LagrangeVariant

def round(x : np.ndarray):
    # round less than 1e-10 to 0
    x[np.abs(x) < 1e-10] = 0
    return x

for i in range(1, 5):
    lagrange = basix.create_element(
        ElementFamily.P, CellType.interval, i, LagrangeVariant.equispaced
    )
    points = lagrange.points
    print(f"Degree {i}:  \n{points}")

    phi = lagrange.tabulate(0, points)
    phi = round(phi)

    print(f"Phi:  \n{phi.reshape((points.size, -1))}")

It should be the identity.

Using openblas@0.3.27:

Degree 1:  
[[0.]
 [1.]]
Phi:  
[[1. 0.]
 [0. 1.]]

Degree 2:  
[[0. ]
 [1. ]
 [0.5]]
Phi:  
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Degree 3:  
[[0.        ]
 [1.        ]
 [0.33333333]
 [0.66666667]]
Phi:  
[[ 1.          0.          0.          0.        ]
 [ 0.          1.          0.          0.        ]
 [ 0.          0.          1.          0.        ]
 [-0.03804685 -0.03804685 -0.11414056  0.88585944]]

Degree 4:  
[[0.  ]
 [1.  ]
 [0.25]
 [0.5 ]
 [0.75]]
Phi:  
[[ 1.          0.          0.          0.          0.        ]
 [ 0.          1.          0.          0.          0.        ]
 [ 0.          0.          1.          0.          0.        ]
 [-0.01150543  0.01150543 -0.02629812  1.          0.02629812]
 [ 0.02229177 -0.02229177  0.05095261  0.          0.94904739]]

Tested with gcc@11.4.0, gcc@12.4.0, gcc@14.2.0 Since the matrices are so small, 1D elements I wouldn't expect it to use blas, but this could be the issue. I haven't investigated it further.

jhale commented 1 month ago

Could you run the test suite locally? The reason I ask is that I did some wheel builds on aarch64 recently against OpenBLAS, and all the tests ran green.

IgorBaratta commented 1 month ago

Could you run the test suite locally? The reason I ask is that I did some wheel builds on aarch64 recently against OpenBLAS, and all the tests ran green.

It actually works on my mac, but failed on two different neoverse-v2 machines I tested using standard spack isntallation. Just tested locally. All test pass if I build with netlib-lapack@3.11.0 and blis@1.0. But most tests fail with openblas@0.3.27.

jhale commented 1 month ago

At least on my Mac it picks up Apple's native BLAS though.

So, I think we can say this is a BLAS issue, either in OpenBLAS itself, or in the way we interface with it.

jhale commented 1 month ago

I took a quick look at this - neoverse-v2 is a very new microarchitecture, so I would guess OpenBLAS has a bug. Have you considered injecting the ARM Performance Library into Spack instead?

https://community.arm.com/arm-community-blogs/b/high-performance-computing-blog/posts/arm-compiler-for-linux-and-arm-pl-now-available-in-spack

IgorBaratta commented 1 month ago

I'll give it a try.