lexming opened this issue 2 years ago
I'm not in favor of going with more generic names, since that's going to cause more confusion ("v3" is essentially meaningless, and if that name is all you have, how the hell do you figure out what it corresponds to)...

Perhaps different yet descriptive names like `avx2_fma` are worth considering, but `v3` is bad imho. Especially because `x86_64/v2` means SSE4 and `x86_64/v3` means AVX+AVX2; there's nothing in between v2 and v3, and that doesn't make sense from a performance perspective imho.
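For reference, here's roughly what each level adds on top of the previous one, per the x86-64 psABI (paraphrased from memory, so double-check the spec; sketched as a Python dict):

```python
# Approximate per-level feature additions from the x86-64 psABI
# (each level also requires everything from the levels below it).
X86_64_LEVELS = {
    "x86-64":    {"sse", "sse2"},  # baseline, plus cmov, fxsr, mmx, ...
    "x86-64-v2": {"cx16", "lahf_lm", "popcnt",
                  "sse3", "ssse3", "sse4_1", "sse4_2"},
    "x86-64-v3": {"avx", "avx2", "bmi1", "bmi2", "f16c",
                  "fma", "lzcnt", "movbe", "xsave"},
    "x86-64-v4": {"avx512f", "avx512bw", "avx512cd",
                  "avx512dq", "avx512vl"},
}
```

Note that there is indeed no AVX-only step between v2 and v3.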
Maybe this boils down to misinterpreting things though: I like the `haswell` name because most people in the HPC community likely know what it stands for, and it basically means "software installations in `/haswell` are compatible with CPUs that support an instruction set like that of Intel Haswell CPUs". It does not mean "only works on Haswell CPUs". It's just the commonly accepted name for the Intel Haswell microarchitecture, which is known to imply "supports AVX2 instructions".
If we're careful enough about picking the CPU microarchitecture of the build hosts we use, then we shouldn't run into trouble. Do we have examples where problems like this actually popped up with the current EESSI pilot repositories? Or, can we come up with a scenario where the CPU detection currently results in picking modules that fail to run on that system?
That said, this is probably going to turn into a bikeshedding discussion. There will always be pros and cons to whatever approach we take, I think; there's probably no clear technically best solution...
Note that the levels aren't made up -- they're supposed to be a compatibility standard. They come from the glibc team, and they are supported by both clang and gcc. We have backported them in archspec to other compilers using flag combinations. Given that most industry compilers are downstream of clang, I expect to see the levels more widely supported in the future (if not already). So I wouldn't deviate from the standard names.
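As a quick illustration of that backporting, something like this should work with archspec's Python API (a sketch; the exact flags returned depend on the archspec and compiler versions):

```python
import archspec.cpu

# The generic psABI levels live alongside the uarch names
# (note the underscores in archspec's naming).
v3 = archspec.cpu.TARGETS["x86_64_v3"]

# Recent GCC/Clang accept -march=x86-64-v3 directly; for compilers
# that don't, archspec emits an equivalent combination of flags.
print(v3.optimization_flags("gcc", "12.2.0"))
```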
These levels being standard doesn't mean they're fine-grained enough... You know as well as I do that the jump from only using SSE4 instructions to AVX+AVX2, without a step in the middle for AVX-only, is a pretty significant one performance-wise. That said, these levels are also not fine-grained enough in various other ways in the context of EESSI.

It's perfectly fine that archspec supports them, since they do indeed seem to be some form of standard, but to me it seems like they're not what we need.
Also, these are only for `x86_64`. Are there equivalents for other CPU architectures, in particular Arm and RISC-V?

Ideally, we have a mechanism that works across different CPU architectures, like using code names of microarchitectures (`haswell`, `zen2`, `neoverse-n1`, `rv64gc`, etc.).
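For what it's worth, archspec already offers a single cross-architecture entry point along those lines (a sketch; note that archspec spells the names with underscores, e.g. `neoverse_n1`):

```python
import archspec.cpu

# Detect the microarchitecture of the machine we're running on.
host = archspec.cpu.host()
print(host.name)  # e.g. "haswell", "zen2" or "neoverse_n1"

# Feature checks work the same way regardless of the architecture.
print("avx2" in host)
```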
For ARM, the generics I would use would be the `armv8.1` etc. names, along with the uarch names, as we do in archspec. I know the generics are not fine-grained; that's why we also have the uarch names. The relationships (currently) are shown in arch-all.pdf. You can see that the ARM side needs fleshing out with way less specific names, while `x86_64` has both.
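Those relationships can also be queried programmatically; a minimal sketch, assuming a recent archspec version where the generic `x86_64_v*` targets sit in the uarch ancestor chains:

```python
import archspec.cpu

haswell = archspec.cpu.TARGETS["haswell"]

# Everything haswell is compatible with: older uarchs and, in recent
# archspec versions, the generic x86_64_v* levels.
print([t.name for t in haswell.ancestors])

# Targets are partially ordered by compatibility.
print(haswell > archspec.cpu.TARGETS["x86_64_v3"])
```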
One of my ideas behind archdetect was to simplify from ideal/perfect back to good-enough for the job required for EESSI; this implementation will most likely outlive us in the next decade (that is only ~2 generations of clusters, or 2-3 generations of microarchitectures). Detecting the x86 architectures is "simple": all the required info is mature and clearly defined in the CPU flags and readable through e.g. /proc. Using the 'common' microarchitecture names allows us to clearly talk about them. Adding new features such as GPU detection will most likely add more value than trying to optimize here...
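To make the "good enough" point concrete, the x86 side really is about this small (a hypothetical sketch, not the actual archdetect code):

```python
def x86_cpu_flags(path="/proc/cpuinfo"):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

# Good enough for EESSI: the host can use the haswell-targeted stack
# if it advertises the handful of features those builds rely on.
required = {"avx2", "fma", "bmi2"}
print("haswell-compatible:", required <= x86_cpu_flags())
```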
Currently, the software layer is built on a host machine for that host machine (i.e. `-march=native`). However, on the client side, we are potentially using those same binaries on a variety of architectures. For instance, we have an installation for `haswell`, but in `archdetect` (#187) any AVX2-only system will be identified as `haswell` and use those binaries. This can work sometimes (e.g. Broadwell and Haswell share 99% of their instruction set), but in other cases it will be problematic (see the sketch below).

Since we will (probably) never have the resources to build binaries for all x86_64 CPU microarchitectures, I propose that we move to more generic builds and label those installations accordingly to avoid any confusion (i.e. not with an existing arch name, unless it will only be used for that CPU arch).
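Here is a hedged sketch of the failure mode using archspec: it lists the instructions that `haswell`-targeted binaries are allowed to use but that the detected host lacks (any non-empty difference can mean an illegal-instruction crash at runtime):

```python
import archspec.cpu

haswell = archspec.cpu.TARGETS["haswell"]
host = archspec.cpu.host()  # what an archdetect-style detection sees

# Instructions the haswell-targeted binaries may use but this host
# does not support; running such binaries can fail with SIGILL.
missing = set(haswell.features) - set(host.features)
print(missing or "compatible")
```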
As discussed in Slack, a good starting point could be the standard CPU feature levels supported in gcc and clang. We could replace the existing installations with the following:

- build with `-march=x86-64-v3` and distribute those binaries in `x86_64/v3`
- build with `-march=x86-64-v4` and distribute those binaries in `x86_64/v4`
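On the client side, picking the installation prefix could then boil down to something like this (a sketch: `generic` is the archspec property that collapses a detected uarch to its best matching psABI level, and the `x86_64/generic` fallback name is only an assumption for hosts below v3):

```python
import archspec.cpu

# Collapse the detected uarch (e.g. "cascadelake") to its best
# matching generic level (e.g. "x86_64_v4").
level = archspec.cpu.host().generic.name

# Map archspec's underscore names onto the proposed directory layout;
# the fallback directory here is hypothetical.
subdir = {
    "x86_64_v3": "x86_64/v3",
    "x86_64_v4": "x86_64/v4",
}.get(level, "x86_64/generic")
print(subdir)
```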
Technically, we could even use `x86_64/v3` for AMD `zen`, `zen2` and `zen3`. But if these generic definitions do not work well for certain CPU archs, we can always add new custom-made ones, or have installations that specifically target a single CPU arch. For instance, based on the current situation, let's say that we wanted to keep `zen3` separate to be able to further optimize for that system.

What are your thoughts?