Open boegel opened 2 years ago
@boegel See https://github.com/archspec/archspec-json/pull/29 for why PKU cannot be removed. When we pass -march=znver3
to GCC the compiler can emit instructions from that set, so we might generate binaries that are incompatible with the platform being detected.
I’d be curious to see if there’s much of a performance difference between zen2, zen3, and x86_64_v3 targets on Milan.
@boegel See #29 for why PKU cannot be removed. When we pass
-march=znver3
to GCC the compiler can emit instructions from that set, so we might generate binaries that are incompatible with the platform being detected.
@alalazo: I see the support in GCC and also some kernel support for PKU/PKEYS as of 4.9; however this appears already from Intel Skylake; Why now add this as a requirement for recognizing Zen3? In practice, this feature is not supported on the Azure Hypervisor and we have not any customers asking for it either. We're happy to look into it further based on actual application requirements, but I would like to ask to drop this requirement for detection only....
I can confirm that on an Intel Skylake system (on prem at HPC-UGent), the pku
feature is indeed listed in /proc/cpuinfo
:
model name : Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx
fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg
fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx
rdt_a avx512f avx512dq rdseed adx clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear spec_ctrl
intel_stibp flush_l1d
So if we're not including pku
in the list of features
for skylake_avx512
, we shouldn't for zen3
either?
We're actually only doing that for zen3
currently...
There's still both vpclmulqdq
and vaes
to discriminate between zen2
and zen3
(which are both there in the VM on Azure).
@hmeiland As replied to @boegel in #40 I need to check if the lack of PKU
on Intel architectures is due to an error on our side or to GCC starting to support that from some version onward. If a compiler says that an instruction set CAN be emitted we can't ensure the binary compatibility of an artifact moved from one zen3
uarch to another if we just remove the feature.
I think the correct solution is to check why PKU is not in the Intel uarchs and eventually start to think of modelling "optional cpu features". As a stop-gap, if we realize this might take more time than we want, we could remove PKU - but we also need to be sure to add flags to each compiler entry that disable the emission of those kind of instructions.
@alalazo : thank you for looking further into this; as an alternative I'm happy to help compose arch definitions for "Hypervisor vendor: Microsoft", where I do not see a full 1:1 mapping of bare-metal features in the short run....
Using gcc@7.5.0
from the ubuntu:18.04
Docker image, it seems PKU is already in there:
skylake-avx512
Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL,
FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and
AVX512CD instruction set support.
This news seems to point to the fact that PKU instructions have been added to GCC around 2015. Looking at GCC man pages it seems this was introduced in the 6.X series.
I think the error here is not having PKU
in the Intel microarchitectures, rather than the opposite. I still have to track when support for that has been added to the kernel, but I see for instance that the linux-rhel7-skylake_avx512
sample of /proc/cpuinfo
we have doesn't expose that instruction set.
Designing support for optional features is possibly good also for AARCH64. Off the top of my head I think we need to:
skylake_avx512:pku
is not compatible with cascadelake:no_pku
)-mpku
vs. -mno-pku
) and use that info to return the correct set of compiler flags@hmeiland is the goal here to accurately identify the processor or to accurately identify the best instruction set for things compiled on it? I think if it's the latter, using a generic target like x86_64_v3
, which includes the vector instructions sans avx-512, might be a better option than having multiple zen3
definitions. Are the other CPU features important to recognize on Azure nodes?
The goal is to distinguish between zen2
and zen3
(this is in the scope of EESSI).
We could also consider using the generic targets like x86_64_v3
, but we're not doing that currently.
With archspec 0.1.3, AMD Milan instances on Azure are essentially misidentified as AMD Rome, and so the software installations in EESSI optimized for zen3
will not be used on AMD Milan instances.
We haven't assessed the performance impact of this, it could be small, but it's likely very dependent on the workload too, so difficult to assess.
In fact, we probably also want to discriminate between Milan and Milan X, but that may be out of scope for archspec
(and that's a separate discussion).
I agree with @alalazo — we probably should list pku for skylake, as it’s listed by gcc as supported on skylake machines. It’s also listed in the phoronix releases about zen3 support for gcc and clang:
I don’t consider them “misidentified” at all if it means that binaries labeled zen3
will not work on the system. If that’s violated, we lose the whole point of archspec: understanding compatibility, and your zen3
installations may not even work on Azure Milan instances anyway.
We could try to confirm, I suppose. Any chance you could run objdump -d
on the binaries you’ve built for zen3
and see if there are rdpkru
/wrpkru
instructions in there, or if pkey_set
is called? Also are those binaries built with new enough versions of gcc and clang? (see above).
See here for a more detailed description of what the feature is:
What I can’t tell from this is where it would really be used and whether binaries would use instructions directly or go through pkey_set, and whether they would have some sort of mechanism for making that portable if pku isn’t actually available. If the latter is typically the case or if we can narrow down that this just doesn’t appear in any zen3 binaries I think it’s fine to remove.
If not we either need fine grained feature selection, or we need to start differentiating between identifying the chip for tuning and identifying the chip for compatibiity. For example, I could see the benefit of having a zen3-tuned x86_64_v3 binary — it might perform way better than a genetically tuned one. That is something we haven’t modeled in archspec much.
We could also decide that we don’t care about all the instructions and just start disabling pku by default for zen3 in archspec. That would be sort of like removing it, but we should also add compiler flags to ensure that pku instructions are never emitted. We do finer-grained flags like this for the x86_64_vX descriptions on older compilers — usually by adding instructions to an older target (see, e.g., here for gcc: https://github.com/archspec/archspec-json/blob/master/cpu/microarchitectures.json#L159:L169). This would be simpler than trying to actually model pku.
I’d be curious to see performance results, though, to know whether we really need to tune for the arch here, or whether a zen2 or x86_64_v3 binary is just as good. IMO showing that the generic target works just as well is a bigger win — and if it doesn’t we do probably need to think hard about tuning vs. compatibility.
When running
archspec cpu
on an Azure HB120rs v3 instance (AMD Milan), it answers withzen2
, while it should bezen3
.The problem is that
pku
is not listed as a CPU feature:Should we drop
pku
from thezen3
entry incpu/microarchitectures.json
. Note thatpku
was dropped fromznver3
in #28 by @amd-toolchain-support@hmeiland Do you happen to know why
pku
is not supported as a CPU feature in Azure instances with AMD Milan CPUs?