Open Flamefire opened 3 years ago
@Flamefire note that we set
export EASYBUILD_OPTARCH='NVHPC:tp=haswell;Intel:march=core-avx2 -axCore-AVX512;GCC:march=core-avx2'
for the "avx2" arch to be compatible with both Intel and AMD. Do you know if march=core-avx2 is any worse or better than mavx2 -fma?
No, I don't know. These are just the flags we use, and one of our long-term admins said they are good. So: Magic! ;)
@bartoldeman https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/code-generation-options/m.html#m lists -march=core-avx2 as the suggested replacement for -mfma. So I'd say they are the same.
It's even worse than that: -xHost or -march=native with Intel compilers 19.1+ (2020+) produces even slower x87 code, not even SSE2:
$ lscpu | grep Model\ name
Model name: AMD EPYC 7532 32-Core Processor
$ icc -v
icc version 19.1.1.217 (gcc version 9.3.0 compatibility)
$ cat test-amd.c
#include <stdio.h>
int main(void)
{
    double y;
    scanf("%lg\n", &y);
    printf("%g\n", y*y);
    return 0;
}
$ icc -c -xHost test-amd.c
$ objdump -d test-amd.o
test-amd.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 e4 80 and $0xffffffffffffff80,%rsp
8: 48 81 ec 80 00 00 00 sub $0x80,%rsp
f: bf 03 00 00 00 mov $0x3,%edi
14: 33 f6 xor %esi,%esi
16: e8 00 00 00 00 callq 1b <main+0x1b>
1b: bf 00 00 00 00 mov $0x0,%edi
20: 48 8d 34 24 lea (%rsp),%rsi
24: 33 c0 xor %eax,%eax
26: e8 00 00 00 00 callq 2b <main+0x2b>
2b: dd 04 24 fldl (%rsp)
2e: bf 00 00 00 00 mov $0x0,%edi
33: d8 c8 fmul %st(0),%st
35: b8 01 00 00 00 mov $0x1,%eax
3a: dd 5c 24 08 fstpl 0x8(%rsp)
3e: f2 0f 10 44 24 08 movsd 0x8(%rsp),%xmm0
44: e8 00 00 00 00 callq 49 <main+0x49>
49: 33 c0 xor %eax,%eax
4b: 48 89 ec mov %rbp,%rsp
4e: 5d pop %rbp
4f: c3 retq
See also https://community.intel.com/t5/Intel-Fortran-Compiler/SSE-error-in-compilation-with-xHost-option-on-AMD-Zen-3-CPU/m-p/1287143
(oneAPI compilers don't have this issue, and -march=core-avx2 works fine; it's just the CPU detection that's broken)
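To check other object files for the same symptom without reading the disassembly by hand, something like the following might help. It is only a sketch: it assumes the text comes from `objdump -d`, and the list of x87 mnemonic prefixes is a small, non-exhaustive set picked for illustration. The function names are made up here.

```python
import re

# Small, non-exhaustive set of x87 mnemonic prefixes (an assumption for this sketch).
X87_PREFIXES = ("fld", "fst", "fmul", "fadd", "fsub", "fdiv")

def mnemonic(line):
    """Return the instruction mnemonic of one `objdump -d` line, or None."""
    tokens = line.split()
    # Instruction lines start with a hex offset followed by a colon, e.g. "2b:".
    if not tokens or not re.fullmatch(r"[0-9a-f]+:", tokens[0]):
        return None
    for tok in tokens[1:]:
        # Skip the raw opcode bytes ("dd", "04", ...); the first other token is the mnemonic.
        if not re.fullmatch(r"[0-9a-f]{2}", tok):
            return tok
    return None

def x87_instructions(disassembly):
    """List the x87-looking mnemonics found in a disassembly listing."""
    found = []
    for line in disassembly.splitlines():
        m = mnemonic(line)
        if m and m.startswith(X87_PREFIXES):
            found.append(m)
    return found
```

Fed the listing above, this would report fldl, fmul and fstpl, while the movsd that only moves the result into %xmm0 for printf is ignored.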
So https://github.com/easybuilders/easybuild-framework/pull/3797 would really help, I'd say.
We have the following code for Intel toolchains:
However, for AMD CPUs this is bad: xHost will use SSE only even when AVX2 is available. Software then either fails to install entirely or is installed with worse optimizations.
As we usually care about the CPU vendor and the vector instructions supported, I'd optionally add a third argument here: the maximum supported vector instruction set. In EB, we then need to define a list of supported types, e.g. "avx2, avx, sse2, sse", in that order. If a tuple (<arch>, <vendor>, <vector>) is found in the dict, that flag is used; otherwise the next lower vector level is tried. If all were tried, the last entry of the tuple is removed and the remaining tuple is tried.
Extending this to the vendor part would allow this:
We should also allow setting this via env vars, similar to EASYBUILD_OPTARCH. Instead of conditionally setting it to
EASYBUILD_OPTARCH="Intel:mavx2 -fma; GCC:march=native"
as we do on our site, while having to check for AMD in the shell script, we could do:
EASYBUILD_OPTARCH="Intel,x86:xHost; Intel,x86,AMD,AVX2:mavx2 -fma; GCC:march=native"
This would allow some sort of future-proofing.
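The tuple lookup with fallback described above could look roughly like this. It is a sketch of the proposal, not existing EasyBuild code: the function name, the table contents and the string constants ("x86_64", "AMD", "Intel") are placeholders chosen for illustration.

```python
# Supported vector levels, highest first, as suggested in the proposal.
VECTOR_LEVELS = ["avx2", "avx", "sse2", "sse"]

def find_optarch(table, arch, vendor, vector):
    """Return the flag for the most specific matching key, or None."""
    # Step 1: try (arch, vendor, vector), lowering the vector level one step at a time.
    if vector in VECTOR_LEVELS:
        start = VECTOR_LEVELS.index(vector)
        for vec in VECTOR_LEVELS[start:]:
            flag = table.get((arch, vendor, vec))
            if flag is not None:
                return flag
    # Step 2: drop the last tuple entry and retry: first without the vector
    # level, then without the vendor.
    for key in ((arch, vendor), (arch,)):
        if key in table:
            return table[key]
    return None

# Hypothetical table for the Intel compilers: xHost by default on x86_64,
# explicit AVX2 flags when the CPU is an AMD one with AVX2.
INTEL_OPTARCH = {
    ("x86_64",): "xHost",
    ("x86_64", "AMD", "avx2"): "mavx2 -fma",
}

print(find_optarch(INTEL_OPTARCH, "x86_64", "AMD", "avx2"))
print(find_optarch(INTEL_OPTARCH, "x86_64", "Intel", "avx2"))
```

With this table the AMD/AVX2 case resolves to "mavx2 -fma" and everything else on x86_64 falls back to "xHost". A vector level missing from VECTOR_LEVELS skips step 1 entirely, so the list would need to be kept up to date.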
For detecting the supported vector extensions we could use archspec or just use the cpu features query we already have and search for avx2 etc.
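For the "search for avx2 etc." variant, a minimal sketch could be the following. It assumes the CPU features are available as a list of flag names, as in the "flags" line of /proc/cpuinfo on x86 Linux; the helper names are made up, and in EB the list would presumably come from the existing CPU features query instead of read_cpu_flags.

```python
# Supported vector levels, highest first (same ordering as in the proposal).
VECTOR_LEVELS = ["avx2", "avx", "sse2", "sse"]

def max_vector_level(cpu_flags):
    """Return the highest entry of VECTOR_LEVELS present in cpu_flags, or None."""
    flags = {flag.lower() for flag in cpu_flags}
    for level in VECTOR_LEVELS:
        # Exact match on the flag name, so e.g. "ssse3" does not count as "sse".
        if level in flags:
            return level
    return None

def read_cpu_flags(path="/proc/cpuinfo"):
    """Linux only: return the feature flags of the first CPU listed."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    return []
```

Calling max_vector_level(read_cpu_flags()) on an AVX2-capable machine should then yield "avx2", which can serve as the third tuple element.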