archspec 0.1.3 identies AMD Milan in Azure as zen2 due to pku feature that is only listed for zen3

boegel commented 2 years ago

When running archspec cpu on an Azure HB120rs v3 instance (AMD Milan), it answers with zen2, while it should be zen3.

The problem is that pku is not listed as a CPU feature:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 25
model       : 1
model name  : AMD EPYC 7V13 64-Core Processor
stepping    : 1
microcode   : 0xffffffff
cpu MHz     : 2445.400
cache size  : 512 KB
physical id : 0
siblings    : 60
core id     : 0
cpu cores   : 60
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm art rep_good nopl extd_apicid aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single retpoline_amd vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat umip vaes vpclmulqdq
bogomips    : 4890.80
TLB size    : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

Should we drop pku from the zen3 entry in cpu/microarchitectures.json. Note that pku was dropped from znver3 in #28 by @amd-toolchain-support

@hmeiland Do you happen to know why pku is not supported as a CPU feature in Azure instances with AMD Milan CPUs?

alalazo commented 2 years ago

@boegel See https://github.com/archspec/archspec-json/pull/29 for why PKU cannot be removed. When we pass -march=znver3 to GCC the compiler can emit instructions from that set, so we might generate binaries that are incompatible with the platform being detected.

tgamblin commented 2 years ago

I’d be curious to see if there’s much of a performance difference between zen2, zen3, and x86_64_v3 targets on Milan.

hmeiland commented 2 years ago

@boegel See #29 for why PKU cannot be removed. When we pass -march=znver3 to GCC the compiler can emit instructions from that set, so we might generate binaries that are incompatible with the platform being detected.

@alalazo: I see the support in GCC and also some kernel support for PKU/PKEYS as of 4.9; however this appears already from Intel Skylake; Why now add this as a requirement for recognizing Zen3? In practice, this feature is not supported on the Azure Hypervisor and we have not any customers asking for it either. We're happy to look into it further based on actual application requirements, but I would like to ask to drop this requirement for detection only....

boegel commented 2 years ago

I can confirm that on an Intel Skylake system (on prem at HPC-UGent), the pku feature is indeed listed in /proc/cpuinfo:

model name      : Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx
fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg
fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx
rdt_a avx512f avx512dq rdseed adx clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear spec_ctrl
intel_stibp flush_l1d

So if we're not including pku in the list of features for skylake_avx512, we shouldn't for zen3 either?

We're actually only doing that for zen3 currently...

There's still both vpclmulqdq and vaes to discriminate between zen2 and zen3 (which are both there in the VM on Azure).

alalazo commented 2 years ago

@hmeiland As replied to @boegel in #40 I need to check if the lack of PKU on Intel architectures is due to an error on our side or to GCC starting to support that from some version onward. If a compiler says that an instruction set CAN be emitted we can't ensure the binary compatibility of an artifact moved from one zen3 uarch to another if we just remove the feature.

I think the correct solution is to check why PKU is not in the Intel uarchs and eventually start to think of modelling "optional cpu features". As a stop-gap, if we realize this might take more time than we want, we could remove PKU - but we also need to be sure to add flags to each compiler entry that disable the emission of those kind of instructions.

hmeiland commented 2 years ago

@alalazo : thank you for looking further into this; as an alternative I'm happy to help compose arch definitions for "Hypervisor vendor: Microsoft", where I do not see a full 1:1 mapping of bare-metal features in the short run....

alalazo commented 2 years ago

Using gcc@7.5.0 from the ubuntu:18.04 Docker image, it seems PKU is already in there:

skylake-avx512
               Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL,
               FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and
               AVX512CD instruction set support.

This news seems to point to the fact that PKU instructions have been added to GCC around 2015. Looking at GCC man pages it seems this was introduced in the 6.X series.

alalazo commented 2 years ago

I think the error here is not having PKU in the Intel microarchitectures, rather than the opposite. I still have to track when support for that has been added to the kernel, but I see for instance that the linux-rhel7-skylake_avx512 sample of /proc/cpuinfo we have doesn't expose that instruction set.

Designing support for optional features is possibly good also for AARCH64. Off the top of my head I think we need to:

Enumerate the optional features for each microarchitecture
Avoid using optional features to perform detection
Extend the compatibility semantics for optional features (e.g. skylake_avx512:pku is not compatible with cascadelake:no_pku)
Encode somewhere in the JSON file how to turn these features on/off (-mpku vs. -mno-pku) and use that info to return the correct set of compiler flags

tgamblin commented 2 years ago

@hmeiland is the goal here to accurately identify the processor or to accurately identify the best instruction set for things compiled on it? I think if it's the latter, using a generic target like x86_64_v3, which includes the vector instructions sans avx-512, might be a better option than having multiple zen3 definitions. Are the other CPU features important to recognize on Azure nodes?

boegel commented 2 years ago

The goal is to distinguish between zen2 and zen3 (this is in the scope of EESSI).

We could also consider using the generic targets like x86_64_v3, but we're not doing that currently. With archspec 0.1.3, AMD Milan instances on Azure are essentially misidentified as AMD Rome, and so the software installations in EESSI optimized for zen3 will not be used on AMD Milan instances.

We haven't assessed the performance impact of this, it could be small, but it's likely very dependent on the workload too, so difficult to assess.

In fact, we probably also want to discriminate between Milan and Milan X, but that may be out of scope for archspec (and that's a separate discussion).

tgamblin commented 2 years ago

I agree with @alalazo — we probably should list pku for skylake, as it’s listed by gcc as supported on skylake machines. It’s also listed in the phoronix releases about zen3 support for gcc and clang:

I don’t consider them “misidentified” at all if it means that binaries labeled zen3 will not work on the system. If that’s violated, we lose the whole point of archspec: understanding compatibility, and your zen3 installations may not even work on Azure Milan instances anyway.

We could try to confirm, I suppose. Any chance you could run objdump -d on the binaries you’ve built for zen3 and see if there are rdpkru/wrpkru instructions in there, or if pkey_set is called? Also are those binaries built with new enough versions of gcc and clang? (see above).

See here for a more detailed description of what the feature is:

https://www.kernel.org/doc/html/latest/core-api/protection-keys.html

What I can’t tell from this is where it would really be used and whether binaries would use instructions directly or go through pkey_set, and whether they would have some sort of mechanism for making that portable if pku isn’t actually available. If the latter is typically the case or if we can narrow down that this just doesn’t appear in any zen3 binaries I think it’s fine to remove.

If not we either need fine grained feature selection, or we need to start differentiating between identifying the chip for tuning and identifying the chip for compatibiity. For example, I could see the benefit of having a zen3-tuned x86_64_v3 binary — it might perform way better than a genetically tuned one. That is something we haven’t modeled in archspec much.

We could also decide that we don’t care about all the instructions and just start disabling pku by default for zen3 in archspec. That would be sort of like removing it, but we should also add compiler flags to ensure that pku instructions are never emitted. We do finer-grained flags like this for the x86_64_vX descriptions on older compilers — usually by adding instructions to an older target (see, e.g., here for gcc: https://github.com/archspec/archspec-json/blob/master/cpu/microarchitectures.json#L159:L169). This would be simpler than trying to actually model pku.

I’d be curious to see performance results, though, to know whether we really need to tune for the arch here, or whether a zen2 or x86_64_v3 binary is just as good. IMO showing that the generic target works just as well is a bigger win — and if it doesn’t we do probably need to think hard about tuning vs. compatibility.

archspec / archspec-json

archspec 0.1.3 identies AMD Milan in Azure as zen2 due to pku feature that is only listed for zen3 #38