Closed cparrott73 closed 1 year ago
Since version 0.2?
@brada4 where do you see 0.2, or what are you on about ?
It talks version around other bug, but no mention hjen sve support appeared akin gcc
I'm not quite sure what @brada4 is asking for here, but if the question is regarding which release SVE support was added to the NVIDIA HPC compilers:
According to my research, it appears to have been added as of the 22.5 release:
cparrott@thunder3 ~ $ module load nvcompilers/22.5
cparrott@thunder3 ~ $ nvc -Msve_intrinsics -c svetest.c
cparrott@thunder3 ~ $ module purge
cparrott@thunder3 ~ $ module load nvcompilers/22.3
cparrott@thunder3 ~ $ nvc -Msve_intrinsics -c svetest.c
nvc-Error-Unknown switch: -Msve_intrinsics
cparrott@thunder3 ~ $ module purge
Note that the -Msve_intrinsics
flag throws an error with 22.3, but is accepted with 22.5.
Hope this helps.
@martin-frbg - FYI, I will be sending another arm64 patch your way in about a month or so.
Our developers have made the decision to switch to a gfortran-compatible complex ABI on arm64 only, starting with the forthcoming 23.9 release. This change uses the s0/s1 registers to return complex return values as gfortran does, rather than on the stack as we currently do. (Side note: I wish we would do this on x86_64 as well, but unfortunately, the decision is not mine to make.)
I have a patch for our current 23.9 development builds that causes OpenBLAS to pass -DF_INTERFACE_GFORT
instead of -DF_INTERFACE_PGI
, and it appears to be working fine here, with all the BLAS tests passing. We will probably need to refine this patch to test on the version of the HPC compilers and pass the correct flag here, accordingly.
Just wanted to give you a heads-up on this change.
Here is the SVE function check? https://github.com/xianyi/OpenBLAS/blob/d69f57c8c22a921b7199410cff1f1db095887612/c_check#L277
@brada4 - yes, my patch above modifies this check to also check to see if the compiler supports -Msve_intrinsics
:
--- a/c_check 2023-04-01 13:18:01.000000000 -0700
+++ b/c_check 2023-08-23 10:00:09.239090999 -0700
@@ -249,6 +249,9 @@
{
$compiler_name $flags $args >/dev/null 2>&1
} || {
+ args=" -Msve_intrinsics -c -o $tmpf.o $tmpf"
+ $compiler_name $flags $args >/dev/null 2>&1
+ } || {
no_sve=1
}
rm -rf "$tmpd"
The gcc/llvm SVE check is in section below l277, but i think you gave sufficient idea.
@brada4 GNU patch is perfectly able to recognize such offsets...
The NVIDIA HPC compilers are also capable of compiling SVE code on arm64 if the
-Msve_intrinsics
flag is passed to the compiler. I have created a patch to OpenBLAS that modifiesc_check
to check if the compiler supports-Msve_intrinsics
before setting $no_sve = 1. The patch also modifieskernel/Makefile
to check if we are usingnvc
on arm64, and then pass-Msve_instrincs
when building these kernels. This enables theNEOVERSEN2
andNEOVERSEV1
kernels to build withnvc
on arm64, leading to improved performance on these CPUs when building OpenBLAS with the NVIDIA HPC compilers.I tested this change on an AWS Graviton3 instance, and all the internal BLAS tests passed with these kernels enabled.
Note that there is currently a known bug in the NVIDIA HPC compilers involving the CDOT/ZDOT ThunderX2-tuned kernels getting incorrect answers on arm64 when compiled at -O2 or higher, specifically a miscompile of the
zdot_thunderx2t99.c
file. I have triaged this bug and sent a report to our compiler engineers with my findings. Our developer is testing a fix, and my hope is that the fix will be included in the 23.9 release.oblas_nvc_sve.diff.txt