OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License
6.39k stars 1.5k forks source link

Patch to enable building SVE kernels (NEOVERSEN2, NEOVERSEV1) on arm64 using NVIDIA HPC compilers #4201

Closed cparrott73 closed 1 year ago

cparrott73 commented 1 year ago

The NVIDIA HPC compilers are also capable of compiling SVE code on arm64 if the -Msve_intrinsics flag is passed to the compiler. I have created a patch to OpenBLAS that modifies c_check to check if the compiler supports -Msve_intrinsics before setting $no_sve = 1. The patch also modifies kernel/Makefile to check if we are using nvc on arm64, and then pass -Msve_instrincs when building these kernels. This enables the NEOVERSEN2 and NEOVERSEV1 kernels to build with nvc on arm64, leading to improved performance on these CPUs when building OpenBLAS with the NVIDIA HPC compilers.

I tested this change on an AWS Graviton3 instance, and all the internal BLAS tests passed with these kernels enabled.

Note that there is currently a known bug in the NVIDIA HPC compilers involving the CDOT/ZDOT ThunderX2-tuned kernels getting incorrect answers on arm64 when compiled at -O2 or higher, specifically a miscompile of the zdot_thunderx2t99.c file. I have triaged this bug and sent a report to our compiler engineers with my findings. Our developer is testing a fix, and my hope is that the fix will be included in the 23.9 release.

oblas_nvc_sve.diff.txt

brada4 commented 1 year ago

Since version 0.2?

martin-frbg commented 1 year ago

@brada4 where do you see 0.2, or what are you on about ?

brada4 commented 1 year ago

It talks version around other bug, but no mention hjen sve support appeared akin gcc

cparrott73 commented 1 year ago

I'm not quite sure what @brada4 is asking for here, but if the question is regarding which release SVE support was added to the NVIDIA HPC compilers:

According to my research, it appears to have been added as of the 22.5 release:

cparrott@thunder3 ~ $ module load nvcompilers/22.5
cparrott@thunder3 ~ $ nvc -Msve_intrinsics -c svetest.c
cparrott@thunder3 ~ $ module purge
cparrott@thunder3 ~ $ module load nvcompilers/22.3
cparrott@thunder3 ~ $ nvc -Msve_intrinsics -c svetest.c
nvc-Error-Unknown switch: -Msve_intrinsics
cparrott@thunder3 ~ $ module purge

Note that the -Msve_intrinsics flag throws an error with 22.3, but is accepted with 22.5.

Hope this helps.

cparrott73 commented 1 year ago

@martin-frbg - FYI, I will be sending another arm64 patch your way in about a month or so.

Our developers have made the decision to switch to a gfortran-compatible complex ABI on arm64 only, starting with the forthcoming 23.9 release. This change uses the s0/s1 registers to return complex return values as gfortran does, rather than on the stack as we currently do. (Side note: I wish we would do this on x86_64 as well, but unfortunately, the decision is not mine to make.)

I have a patch for our current 23.9 development builds that causes OpenBLAS to pass -DF_INTERFACE_GFORT instead of -DF_INTERFACE_PGI, and it appears to be working fine here, with all the BLAS tests passing. We will probably need to refine this patch to test on the version of the HPC compilers and pass the correct flag here, accordingly.

Just wanted to give you a heads-up on this change.

brada4 commented 1 year ago

Here is the SVE function check? https://github.com/xianyi/OpenBLAS/blob/d69f57c8c22a921b7199410cff1f1db095887612/c_check#L277

cparrott73 commented 1 year ago

@brada4 - yes, my patch above modifies this check to also check to see if the compiler supports -Msve_intrinsics:

--- a/c_check   2023-04-01 13:18:01.000000000 -0700
+++ b/c_check   2023-08-23 10:00:09.239090999 -0700
@@ -249,6 +249,9 @@
     {
         $compiler_name $flags $args >/dev/null 2>&1
     } || {
+        args=" -Msve_intrinsics -c -o $tmpf.o $tmpf"
+        $compiler_name $flags $args >/dev/null 2>&1
+    } || {
         no_sve=1
     }
     rm -rf "$tmpd"
brada4 commented 1 year ago

The gcc/llvm SVE check is in section below l277, but i think you gave sufficient idea.

martin-frbg commented 1 year ago

@brada4 GNU patch is perfectly able to recognize such offsets...