OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Cannot build OpenBLAS 0.3.24 on A64FX #4257

Closed by antoine-morvan 1 year ago

antoine-morvan commented 1 year ago

Hello,

I just tried building OpenBLAS 0.3.24 on A64FX (Fujitsu FX 700), but the build fails because a source file is missing.

With the make build process, it fails with:

make[1]: Entering directory '/home_nfs/bmorvana/test/openblastest/OpenBLAS-0.3.24/kernel'
make[1]: *** No rule to make target '../kernel/arm64/sgemm_ncopy_sve_v1.c', needed by 'sgemm_incopy.o'.  Stop.
make[1]: Leaving directory '/home_nfs/bmorvana/test/openblastest/OpenBLAS-0.3.24/kernel'

With the still-experimental CMake build process, I get:

[ 78%] Building C object kernel/CMakeFiles/kernel.dir/CMakeFiles/sgemm_incopy.c.o
cd /home_nfs/bmorvana/test/openblastest/build/kernel && /home_nfs/bmorvana/software/Linux/aarch64/default/gcc-13.2.0/bin/gcc  -I/home_nfs/bmorvana/test/openblastest/OpenBLAS-0.3.24 -I/home_nfs/bmorvana/test/openblastest/build -O3 -ffast-math -mcpu=native  -DHAVE_C11 -fopenmp -DUSE_OPENMP -Wall -march=armv8.2-a+sve -mtune=a64fx -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=48 -DMAX_PARALLEL_NUMBER=1 -DMAX_STACK_ALLOC=2048 -DNO_AFFINITY -DVERSION="\"0.3.24\"" -DBUILD_SINGLE -DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16 -O3 -DNDEBUG -MD -MT kernel/CMakeFiles/kernel.dir/CMakeFiles/sgemm_incopy.c.o -MF CMakeFiles/kernel.dir/CMakeFiles/sgemm_incopy.c.o.d -o CMakeFiles/kernel.dir/CMakeFiles/sgemm_incopy.c.o -c /home_nfs/bmorvana/test/openblastest/build/kernel/CMakeFiles/sgemm_incopy.c
/home_nfs/bmorvana/test/openblastest/build/kernel/CMakeFiles/sgemm_incopy.c:8:10: fatal error: /home_nfs/bmorvana/test/openblastest/OpenBLAS-0.3.24/kernel/arm64/sgemm_ncopy_sve_v1.c: No such file or directory
    8 | #include "/home_nfs/bmorvana/test/openblastest/OpenBLAS-0.3.24/kernel/arm64/sgemm_ncopy_sve_v1.c"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Please note that:

My system runs

Here is a sample script that reproduces the build steps (the compiler environment setup will need adjusting): https://gist.github.com/antoine-morvan/bf6168f51f7d3b0058f69a664a9d7217

Best.

martin-frbg commented 1 year ago

Oops, this got rearranged by #4009 without updating the definitions in KERNEL.A64FX, so that file still references sources that no longer exist (which also explains why it compiles on Graviton3). I'll do a PR in a minute, but the trivial fix is to copy kernel/arm64/KERNEL.ARMV8SVE to kernel/arm64/KERNEL.A64FX. I think @mousius's intention was for all the CPU-specific SVE KERNEL definition files to go away in favor of KERNEL.ARMV8SVE, since the individual BLAS kernels for SVE are the same, at least currently.
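
For anyone who wants to script the workaround, here is a minimal sketch, not an official OpenBLAS tool. It assumes the KERNEL files consist mostly of simple `NAME = source` make assignments whose values are paths relative to kernel/arm64; anything that needs make variable expansion is skipped. The function names and the `KERNEL.A64FX.orig` backup name are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches simple make assignments such as "SGEMMINCOPY = sgemm_ncopy_sve_v1.c".
ASSIGNMENT = re.compile(r"^\s*(\w+)\s*[:?]?=\s*(\S+)\s*$")


def missing_kernel_sources(kernel_file, kernel_dir):
    """Return (variable, filename) pairs whose source file does not exist.

    Assumption: values are plain paths relative to kernel_dir.
    """
    kernel_dir = Path(kernel_dir)
    missing = []
    for line in Path(kernel_file).read_text().splitlines():
        line = line.split("#", 1)[0]  # drop make comments
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name, value = match.groups()
        # Skip values that need make variable expansion or are not sources.
        if "$(" in value or not value.endswith((".c", ".S")):
            continue
        if not (kernel_dir / value).is_file():
            missing.append((name, value))
    return missing


def copy_generic_sve_definitions(source_root):
    """Apply the workaround: copy KERNEL.ARMV8SVE over KERNEL.A64FX."""
    arm64 = Path(source_root) / "kernel" / "arm64"
    target = arm64 / "KERNEL.A64FX"
    # Keep a backup under a hypothetical name so the change is easy to undo.
    shutil.copyfile(target, arm64 / "KERNEL.A64FX.orig")
    shutil.copyfile(arm64 / "KERNEL.ARMV8SVE", target)
    return target
```

Run from the top of the source tree, `missing_kernel_sources("kernel/arm64/KERNEL.A64FX", "kernel/arm64")` should list the stale entries before the copy, and `copy_generic_sve_definitions(".")` then applies the workaround described above.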

antoine-morvan commented 1 year ago

I cannot try this today, but I guess you could already prepare for NVIDIA Grace (Neoverse V2 cores: Armv9 with 4x128-bit SVE2): https://resources.nvidia.com/en-us-grace-cpu/nvidia-grace-cpu-superchip

martin-frbg commented 1 year ago

Certainly. In fact, any SVE2 hardware would be great, even just something based on Cortex-X1 for testing, but it's all a question of time and talent...