cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Align the vectorisation targets used by CMSSW to those used by the x86-64 psABI #43652

Closed fwyzard closed 1 month ago

fwyzard commented 8 months ago

The default target when compiling code in CMSSW is -march=x86-64 -msse3. This corresponds roughly to the Intel Xeon from 2004 and AMD Opteron from 2005 - close to 20 years ago.

SCRAM and CMSSW have experimental support for extra vectorisation targets, that are enabled e.g. in the CMSSW_14_0_SKYLAKEAVX512_X builds:

-march=haswell enables additional instructions, notably AVX, AVX2 and FMA. It corresponds to the Intel Xeon from 2014 (Haswell and later) and AMD Opteron from 2017 (Excavator).

-march=skylake-avx512 enables additional instructions, notably a large set of AVX-512 extensions. It corresponds to the Intel Xeon from 2017 (Skylake and later) and AMD EPYC from 2022 (Genoa and later).

This approach, while viable, has a few disadvantages:

In the last few years (here is the original llvm-dev discussion), glibc, GCC and clang have introduced support for a standardised set of "hardware capabilities" (hwcap) or "micro-architecture levels" in the x86-64 psABI (see https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/low-level-sys-info.tex):

This approach is similar to what SCRAM and CMSSW are experimenting with: in fact, the x86-64-v3 and x86-64-v4 psABI levels are very close to haswell and skylake-avx512 SCRAM targets, respectively.
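To make the notion of cumulative levels concrete, here is a minimal sketch (not SCRAM's actual logic) that maps a set of CPU feature flags to the highest psABI level they cover. The flag spellings are those of the `flags` line in `/proc/cpuinfo` on Linux; the function names `highest_level` and `read_cpu_flags` are made up for this illustration, and `xsave` is used as an approximation of the OSXSAVE requirement of x86-64-v3.

```python
# Feature flags as spelled in the "flags" line of /proc/cpuinfo on Linux.
# Each level also requires everything from the levels listed before it.
# "pni" is SSE3, "abm" covers LZCNT; "xsave" approximates OSXSAVE (assumption).
LEVELS = [
    ("x86-64", {"cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2", "lm"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def highest_level(cpu_flags):
    """Return the highest psABI level fully covered by cpu_flags, or None."""
    available = set(cpu_flags)
    best = None
    for name, required in LEVELS:
        if not required <= available:
            break  # levels are cumulative: stop at the first one not satisfied
        best = name
    return best


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the feature flags of the first CPU listed in /proc/cpuinfo."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux host, `highest_level(read_cpu_flags())` would then report the level to select. A CPU that has some AVX-512 extensions but not all five required by x86-64-v4 stays at x86-64-v3.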

Switching to these levels also for SCRAM and CMSSW would have a few advantages:

To conclude, my proposal is to:

At a later time we can decide whether to raise the default level to x86-64-v2 (SSE4.x, circa 2009) or even x86-64-v3 (FMA and AVX2, circa 2014).

For reference, I've tried to summarise in the table below the relevant extensions to the x86-64 instruction set supported by the various processor families. They were extracted from GCC 13 via gcc -march=... -Q --help=target, and double-checked on Wikipedia, Intel Ark, etc. I've added the extensions corresponding to the SCRAM targets and psABI levels, as well as the target and level that can be used by each processor.
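The extraction step could be automated along these lines. This is only a sketch: it assumes that `gcc -Q --help=target` prints one option per line followed by `[enabled]` or `[disabled]`, and the helper names (`enabled_flags`, `query_gcc`, `compare_targets`) are invented for this example.

```python
import subprocess


def enabled_flags(help_text):
    """Collect the -m options reported as [enabled] in `gcc -Q --help=target` output."""
    flags = set()
    for line in help_text.splitlines():
        parts = line.split()
        # Assumed line shape: "<option> [enabled]" or "<option> [disabled]".
        if len(parts) == 2 and parts[0].startswith("-m") and parts[1] == "[enabled]":
            flags.add(parts[0])
    return flags


def query_gcc(march, gcc="gcc"):
    """Run gcc for the given -march value and return its enabled target options."""
    result = subprocess.run(
        [gcc, f"-march={march}", "-Q", "--help=target"],
        capture_output=True,
        text=True,
        check=True,
    )
    return enabled_flags(result.stdout)


def compare_targets(first, second, gcc="gcc"):
    """Return (options only in first, options only in second) for two -march values."""
    a = query_gcc(first, gcc)
    b = query_gcc(second, gcc)
    return sorted(a - b), sorted(b - a)
```

For example, `compare_targets("haswell", "x86-64-v3")` would list the options that differ between the SCRAM target and the psABI level, provided the installed GCC accepts both values.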

-march= | -mtune= |   | CMSSW level | x86-64 psABI level |   | approximate year |   | -mmmx | -msse | -msse2 | -msse3 | -mssse3 | -msse4 | -msse4.1 | -msse4.2 | -mavx | -mfma | -mavx2 | -mavx512bw | -mavx512cd | -mavx512dq | -mavx512f | -mavx512vl | -mavx512vnni | -mavx512ifma | -mavx512vbmi | -mavx512vbmi2 | -mavx512bitalg | -mavx512vpopcntdq | -mavx512bf16 | -mavx512vp2intersect | -mavx512fp16 | -mavxvnni |   | -mmwait | -mcx16 | -msahf | -mpopcnt | -mcrc32 | -mpclmul | -mxsave | -mxsaveopt | -mlzcnt | -mf16c | -mbmi | -mfsgsbase | -mmovbe | -mprfchw | -mrdrnd | -mbmi2 | -maes | -mrdseed | -madx | -mclflushopt | -mhle | -mxsavec | -mxsaves | -mpku | -msha | -msgx | -mabm | -mclwb | -mrdpid | -mvaes | -mvpclmulqdq | -mgfni | -mmwaitx | -mwbnoinvd | -mclzero | -mmovdir64b | -mmovdiri | -mpconfig | -mptwrite | -mcldemote | -mwaitpkg | -mkl | -mserialize | -mwidekl | -mamx-bf16 | -mamx-int8 | -mamx-tile | -menqcmd | -mtsxldtrk | -muintr | -mhreset

Intel performance CPUs nocona | nocona |   | sse3 | x86-64 |   | 2004 |   | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ core2 | core2 |   | sse3 | x86-64 |   | 2006 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ nehalem | nehalem |   | sse3 | x86-64-v2 |   | 2009 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ westmere | westmere |   | sse3 | x86-64-v2 |   | 2010 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ sandybridge | sandybridge |   | sse3 | x86-64-v2 |   | 2012 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ ivybridge | ivybridge |   | sse3 | x86-64-v2 |   | 2013 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 
❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ haswell | haswell |   | haswell | x86-64-v3 |   | 2014 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ broadwell | broadwell |   | haswell | x86-64-v3 |   | 2015 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ skylake | skylake |   | haswell | x86-64-v3 |   | 2015 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ skylake-avx512 | skylake-avx512 |   | skylake-avx512 | x86-64-v4 |   | 2017 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ cannonlake | cannonlake |   | skylake-avx512 | x86-64-v4 |   | 2018 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 
| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ icelake-client | icelake-client |   | skylake-avx512 | x86-64-v4 |   | 2019 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ rocketlake | rocketlake |   | skylake-avx512 | x86-64-v4 |   | 2021 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ icelake-server | icelake-server |   | skylake-avx512 | x86-64-v4 |   | 2019 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ cascadelake | cascadelake |   | skylake-avx512 | x86-64-v4 |   | 2019 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ tigerlake | tigerlake |   | skylake-avx512 | x86-64-v4 |   | 2020 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ 
| ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ cooperlake | cooperlake |   | skylake-avx512 | x86-64-v4 |   | 2020 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ sapphirerapids | sapphirerapids |   | skylake-avx512 | x86-64-v4 |   | 2023 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   Intel hybrid CPUs alderlake | alderlake |   | haswell | x86-64-v3 |   | 2021 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   Intel efficient CPUs bonnell | bonnell |   | sse3 | x86-64 |   |   |   | ✅ | ✅ | ✅ | 
✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ atom | atom |   | sse3 | x86-64 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ silvermont | silvermont |   | sse3 | x86-64-v2 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ slm | slm |   | sse3 | x86-64-v2 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ goldmont | goldmont |   | sse3 | x86-64-v2 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ goldmont-plus | goldmont-plus |   | sse3 | x86-64-v2 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | 
✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ tremont | tremont |   | sse3 | x86-64-v2 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   AMD performance CPUs k8 | k8 |   |   | x86-64 |   | 2003 |   | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ k8-sse3 | k8-sse3 |   | sse3 | x86-64 |   | 2005 |   | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ barcelona | barcelona |   | sse3 | x86-64 |   | 2007 |   | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ bdver1 | bdver1 |   | sse3 | x86-64-v2 |   | 2011 |   | ✅ | 
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ bdver2 | bdver2 |   | sse3 | x86-64-v2 |   | 2012 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ bdver3 | bdver3 |   | sse3 | x86-64-v2 |   | 2014 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ bdver4 | bdver4 |   | haswell | x86-64-v3 |   | 2017 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ znver1 | znver1 |   | haswell | x86-64-v3 |   | 2017 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ znver2 | znver2 |   | haswell | x86-64-v3 |   | 2019 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ znver3 | znver3 |   | haswell | x86-64-v3 |   | 2020 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ znver4 | znver4 |   | skylake-avx512 | x86-64-v4 |   | 2022 |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   AMD efficient CPUs btver1 | btver1 |   | sse3 | x86-64 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ btver2 | btver2 |   | sse3 | x86-64-v2 |   |   |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌   |   |   |   |   |   |  
 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   CMSSW levels |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   sse3 | generic |   | sse3 | x86-64 |   | 2004 (Intel), 2005 (AMD) |   | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ haswell | haswell |   | haswell | x86-64-v3 |   | 2014 (Intel), 2015 (AMD) |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ skylake-avx512 | skylake-avx512 |   | skylake-avx512 | x86-64-v4 |   | 2017 (Intel), 2022 (AMD) |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |  
 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   x86-64 psABI levels |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   x86-64 | generic |   | sse3 | x86-64 |   | 2004 (Intel), 2003 (AMD) |   | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ x86-64-v2 | generic |   | sse3 | x86-64-v2 |   | 2009 (Intel), 2011 (AMD) |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ x86-64-v3 | generic |   | haswell | x86-64-v3 |   | 2014 (Intel), 2015 (AMD) |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ x86-64-v4 | generic |   | skylake-avx512 | x86-64-v4 |   | 2017 (Intel), 2022 (AMD) |   | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |   | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ 
| ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌

fwyzard commented 8 months ago

assign core

cmsbuild commented 8 months ago

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 8 months ago

cms-bot internal usage

cmsbuild commented 8 months ago

A new Issue was created by @fwyzard Andrea Bocci.

@antoniovilela, @sextonkennedy, @smuzaffar, @makortel, @rappoccio, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 8 months ago

Thank you @fwyzard for the very comprehensive issue (pity GitHub renders the table in a way that is somewhat difficult to use).

To conclude, my proposal is to:

  • keep the default level at -msse3;
  • replace the haswell and skylake-avx512 SCRAM targets with the x86-64-v3 and x86-64-v4 psABI levels;
  • replace the logic used to check the supported instruction set with querying the OS;
  • enable the psABI levels in the production builds.

I generally agree with this plan.

On the last point, i.e. deployment in production, I'm wondering if this change would need a specific validation (@cms-sw/pdmv-l2 @cms-sw/ppd-l2). In a way, this change is similar to a major compiler version upgrade, or to LTO, both of which we have validated separately.

On the other hand, (omitting vectorization bugs) the main (only?) expected cause for result differences comes from the FMA in x86-64-v3, and thus the impact should be much more limited than with compiler version upgrade or LTO (also the JITting version of Tensorflow, that made use of the newer-than-sse3 microarchitectures on the fly, was deployed without a specific validation, but that impacted only Tensorflow inference).
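To illustrate why FMA alone can change results, here is a small sketch that emulates a fused multiply-add with exact rational arithmetic and compares it with the usual two-step evaluation. It assumes IEEE 754 double precision and an interpreter that does not fuse `a * b + c` itself; the function names are made up for this example.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    """Round a*b to double first, then round the sum again (no FMA)."""
    return a * b + c


def mul_add_fused(a, b, c):
    """Emulate FMA: compute a*b+c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# a*b is exactly 1 - 2**-54, which is not representable and rounds to 1.0.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = mul_add_unfused(a, b, c)  # 0.0: the low-order bits were rounded away
fused = mul_add_fused(a, b, c)      # -2**-54: the single rounding keeps them
```

The two results differ only in the last bits of the intermediate product, but such differences can propagate through later comparisons and branches, which is the kind of effect a validation would need to look at.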

I don't know if we can easily perform the validation better than just "running in the wild" (my impression is that our computing system can't schedule jobs based on the CPU microarchitecture). At one moment in September of the global pool CPU cores

although more relevant would be to limit to the sites that are used to produce the RelVal samples.


Another thought I have on deployment is the impact on compilation in user developer areas. I would expect scram b to build the code 3 times out of the box. Is this the case? On one hand, making local build times 3x longer sounds bad, but on the other hand, for e.g. CRAB job submissions (maybe even local batch systems?) the code has to be built for all targets. Would it be feasible to have an option to build only for the microarchitecture of the node being used (at the user's own risk)? Or is there already such an option?

fwyzard commented 8 months ago

Here's a different rendering of the table:

image

fwyzard commented 8 months ago

And here a more usable version: x86 instructions table (abridged).ods

fwyzard commented 8 months ago

I would expect scram b to build the code 3 times out of the box. Is this the case?

Yes, it is.

Here is the result of doing

git cms-addpkg HeterogeneousCore/AlpakaTest
scram b

in a CMSSW_14_0_SKYLAKEAVX512_X_2023-12-31-2300 development area:

$ tree lib 
lib
`-- el8_amd64_gcc12
    |-- HeterogeneousCoreAlpakaTestPlugins.edmplugin
    |-- HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.edmplugin
    |-- HeterogeneousCoreAlpakaTestPluginsPortableROCmAsync.edmplugin
    |-- HeterogeneousCoreAlpakaTestPluginsPortableSerialSync.edmplugin
    |-- libHeterogeneousCoreAlpakaTest.so
    |-- libHeterogeneousCoreAlpakaTestCudaAsync.so
    |-- libHeterogeneousCoreAlpakaTestROCmAsync.so
    |-- libHeterogeneousCoreAlpakaTestSerialSync.so
    |-- pluginHeterogeneousCoreAlpakaTestPlugins.so
    |-- pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so
    |-- pluginHeterogeneousCoreAlpakaTestPluginsPortableROCmAsync.so
    |-- pluginHeterogeneousCoreAlpakaTestPluginsPortableSerialSync.so
    |-- scram_haswell
    |   |-- HeterogeneousCoreAlpakaTestPlugins.edmplugin
    |   |-- HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.edmplugin
    |   |-- HeterogeneousCoreAlpakaTestPluginsPortableROCmAsync.edmplugin
    |   |-- HeterogeneousCoreAlpakaTestPluginsPortableSerialSync.edmplugin
    |   |-- libHeterogeneousCoreAlpakaTest.so
    |   |-- libHeterogeneousCoreAlpakaTestCudaAsync.so
    |   |-- libHeterogeneousCoreAlpakaTestROCmAsync.so
    |   |-- libHeterogeneousCoreAlpakaTestSerialSync.so
    |   |-- pluginHeterogeneousCoreAlpakaTestPlugins.so
    |   |-- pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so
    |   |-- pluginHeterogeneousCoreAlpakaTestPluginsPortableROCmAsync.so
    |   `-- pluginHeterogeneousCoreAlpakaTestPluginsPortableSerialSync.so
    `-- scram_skylake-avx512
        |-- HeterogeneousCoreAlpakaTestPlugins.edmplugin
        |-- HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.edmplugin
        |-- HeterogeneousCoreAlpakaTestPluginsPortableROCmAsync.edmplugin
        |-- HeterogeneousCoreAlpakaTestPluginsPortableSerialSync.edmplugin
        |-- libHeterogeneousCoreAlpakaTest.so
        |-- libHeterogeneousCoreAlpakaTestCudaAsync.so
        |-- libHeterogeneousCoreAlpakaTestROCmAsync.so
        |-- libHeterogeneousCoreAlpakaTestSerialSync.so
        |-- pluginHeterogeneousCoreAlpakaTestPlugins.so
        |-- pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so
        |-- pluginHeterogeneousCoreAlpakaTestPluginsPortableROCmAsync.so
        `-- pluginHeterogeneousCoreAlpakaTestPluginsPortableSerialSync.so

3 directories, 36 files

... which makes me wonder: maybe for the CUDA and ROCm backends, we could avoid building all three architectures by default? I would expect there to be very little compute-intensive host code there.

makortel commented 8 months ago

maybe for the CUDA and ROCm backends, we could avoid building all three architectures by default ?

I have a vague recollection @smuzaffar tried to build only some of CMSSW packages for multiple microarchitectures (e.g. FWCore is unlikely to benefit from vectorization), but there was some trouble with it.

smuzaffar commented 8 months ago

Thanks a lot @fwyzard, I fully support aligning the scram multi-arch levels with the psABI levels. That will really simplify the arch checking logic and probably select a better psABI level for a given host.

@fwyzard, we can also update the build rules to disable multi-vec compilation for the Alpaka CUDA/ROCm backends.

@makortel, one can also selectively enable multi-vec compilation for selected packages by explicitly adding <flags TARGETS="1"/> (in the Package/BuildFile.xml) to build for all the multi-archs supported by the project-level configuration. The only issue is that we need to go through all packages and add this flag to those where it has any effect. If there is an automated way to find out whether a package can benefit from multi-arch builds, then we can add a test to check for such packages.

makortel commented 8 months ago

one can also selectively enable multi-vec compilation for selected packages by explicitly adding <flags TARGETS="1"/> (in the Package/BuildFile.xml) to build for all the multi-archs supported by the project-level configuration. The only issue is that we need to go through all packages and add this flag to those where it has any effect. If there is an automated way to find out whether a package can benefit from multi-arch builds, then we can add a test to check for such packages.

Just thinking out loud, I suppose we could take the opposite way, and selectively disable the multi-arch compilation on selected packages. Maybe that would be easier to maintain? (assuming most of the algorithm packages would benefit from newer instructions, which is not necessarily the case)

makortel commented 7 months ago

This topic was discussed in the Core Software meeting today, summary:

@smuzaffar @fwyzard Do you have any corrections?

fwyzard commented 7 months ago

No corrections, just some minor additions:

smuzaffar commented 7 months ago

@fwyzard @makortel, https://github.com/cms-sw/cmsdist/pull/8951 adds support for the psABI micro-archs. Once it passes, I will set up a CMSSW_14_0_MULTIARCH_X build with sse3 (as the default micro-arch) and x86-64-v3 as the extra micro-arch.

fwyzard commented 7 months ago

Is there a way to tell scram to build specific packages for x86-64-v4 ?

smuzaffar commented 7 months ago

@fwyzard , here is the logic for building multi-archs for cmssw packages

One issue with building only some packages for a higher psABI level (e.g. x86-64-v4) and then setting the CMSSW environment for x86-64-v4 is that libs/plugins built without x86-64-v4 will be loaded from the default micro-arch directory (sse3). (Note that scram will set LD_LIBRARY_PATH to something like /path/cmssw/lib/scram-x86-64-v4:/path/cmssw/lib.)
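The fallback described above can be sketched as a first-match lookup over the search path. This is a simplified model, not the real dynamic loader (which also consults RPATH, the ld.so cache, and so on); `resolve_library` is a hypothetical helper written for this illustration.

```python
import os


def resolve_library(name, search_path):
    """Return the first directory in a colon-separated search path that holds `name`.

    Simplified model of an LD_LIBRARY_PATH lookup: directories are tried in
    order, so a library missing from the micro-arch directory is picked up
    from the default directory that follows it.
    """
    for directory in search_path.split(":"):
        if directory and os.path.isfile(os.path.join(directory, name)):
            return directory
    return None
```

With a search path like `/path/cmssw/lib/scram-x86-64-v4:/path/cmssw/lib`, a library built for x86-64-v4 resolves to the first directory, while one built only for the default micro-arch resolves to `/path/cmssw/lib`. A single job may therefore mix x86-64-v4 and sse3 builds of different packages.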

fwyzard commented 7 months ago

OK, thanks for the details.

Let's go ahead like this, and revisit later on whether we care about v4.

smuzaffar commented 7 months ago

@fwyzard, the first IB, CMSSW_14_0_MULTIARCHS_X_2024-01-18-2300, with psABI x86-64-v3 is now available for el8_amd64_gcc12. Can you please test it and see if it works as expected?

fwyzard commented 7 months ago

Thanks @smuzaffar, I can confirm that, using CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300 on an HLT node (with a Milan CPU), scram picks the x86-64-v3 microarchitecture:

$ cmsenv
$ echo $LD_LIBRARY_PATH | tr : \\n
/data/user/fwyzard/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/biglib/el8_amd64_gcc12/scram_x86-64-v3
/data/user/fwyzard/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/biglib/el8_amd64_gcc12
/data/user/fwyzard/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/lib/el8_amd64_gcc12/scram_x86-64-v3
/data/user/fwyzard/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/lib/el8_amd64_gcc12
/data/user/fwyzard/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/external/el8_amd64_gcc12/lib/scram_x86-64-v3
/data/user/fwyzard/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/external/el8_amd64_gcc12/lib
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/biglib/el8_amd64_gcc12/scram_x86-64-v3
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/biglib/el8_amd64_gcc12
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/lib/el8_amd64_gcc12/scram_x86-64-v3
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/lib/el8_amd64_gcc12
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/external/el8_amd64_gcc12/lib/scram_x86-64-v3
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_MULTIARCHS_X_2024-01-22-2300/external/el8_amd64_gcc12/lib
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_MULTIARCHS_X_2024-01-21-2300/biglib/el8_amd64_gcc12/scram_x86-64-v3
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_MULTIARCHS_X_2024-01-21-2300/biglib/el8_amd64_gcc12
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_MULTIARCHS_X_2024-01-21-2300/lib/el8_amd64_gcc12/scram_x86-64-v3
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_MULTIARCHS_X_2024-01-21-2300/lib/el8_amd64_gcc12
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/external/llvm/17.0.3-58617194c079c8f35fd2aa0eeb9674ef/lib64
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib64
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02821/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib
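
To cross-check which psABI level a node's CPU can use, one option is to compare the flags reported in /proc/cpuinfo against the feature sets of each level. The sketch below is only an illustration: psabi_level and _has_all are hypothetical helpers, the flag names follow the Linux /proc/cpuinfo spelling ("pni" for SSE3, "abm" for LZCNT), the baseline x86-64 features are assumed to be present, and scram's own selection logic may differ.

```shell
# _has_all FLAGS NAME...
# Succeed only if every NAME appears as a whole word in the space-separated FLAGS.
_has_all() {
  local flags=" $1 " f
  shift
  for f in "$@"; do
    case "$flags" in
      *" $f "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# psabi_level FLAGS
# Print the highest x86-64 psABI level implied by FLAGS.
# Illustrative only; the baseline level is assumed, not verified.
psabi_level() {
  local flags="$1" level="x86-64"
  if _has_all "$flags" cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3; then
    level="x86-64-v2"
    if _has_all "$flags" avx avx2 bmi1 bmi2 f16c fma abm movbe xsave; then
      level="x86-64-v3"
      if _has_all "$flags" avx512f avx512bw avx512cd avx512dq avx512vl; then
        level="x86-64-v4"
      fi
    fi
  fi
  printf '%s\n' "$level"
}

# Usage on a Linux node:
#   psabi_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```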

fwyzard commented 7 months ago

Also, only the SerialSync version of the alpaka plugins is built for scram_x86-64-v3, while they are all built for the default architecture:

$ find el8_amd64_gcc12/ | sort | grep -i alpaka.*ync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaCoreCudaAsync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaCoreROCmAsync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaCoreSerialSync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaServicesCudaAsync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaServicesROCmAsync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaServicesSerialSync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaTestCudaAsync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaTestROCmAsync.so
el8_amd64_gcc12/libHeterogeneousCoreAlpakaTestSerialSync.so
el8_amd64_gcc12/pluginHeterogeneousCoreAlpakaServicesPluginsCudaAsync.so
el8_amd64_gcc12/pluginHeterogeneousCoreAlpakaServicesPluginsROCmAsync.so
el8_amd64_gcc12/pluginHeterogeneousCoreAlpakaServicesPluginsSerialSync.so
el8_amd64_gcc12/pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so
el8_amd64_gcc12/pluginHeterogeneousCoreAlpakaTestPluginsPortableROCmAsync.so
el8_amd64_gcc12/pluginHeterogeneousCoreAlpakaTestPluginsPortableSerialSync.so
el8_amd64_gcc12/scram_x86-64-v3/libHeterogeneousCoreAlpakaCoreSerialSync.so
el8_amd64_gcc12/scram_x86-64-v3/libHeterogeneousCoreAlpakaServicesSerialSync.so
el8_amd64_gcc12/scram_x86-64-v3/libHeterogeneousCoreAlpakaTestSerialSync.so
el8_amd64_gcc12/scram_x86-64-v3/pluginHeterogeneousCoreAlpakaServicesPluginsSerialSync.so
el8_amd64_gcc12/scram_x86-64-v3/pluginHeterogeneousCoreAlpakaTestPluginsPortableSerialSync.so
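
The same comparison can be made without reading the listing by eye. A minimal sketch, assuming the directory layout shown above: missing_in_microarch is a hypothetical helper written for illustration, not a scram command.

```shell
# missing_in_microarch LIBDIR SUBDIR
# Print the names of the .so files present in LIBDIR that have no counterpart
# in LIBDIR/SUBDIR. Illustrative helper; not part of scram.
missing_in_microarch() {
  local libdir="$1" subdir="$2" f name
  for f in "$libdir"/*.so; do
    # If the glob matched nothing, the literal pattern is skipped here.
    [ -e "$f" ] || continue
    name="${f##*/}"
    [ -e "$libdir/$subdir/$name" ] || printf '%s\n' "$name"
  done
}

# Usage, from the directory containing el8_amd64_gcc12:
#   missing_in_microarch el8_amd64_gcc12 scram_x86-64-v3 | grep -i alpaka
```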

makortel commented 1 month ago

+core

This issue has been addressed

makortel commented 1 month ago

@cmsbuild, please close

cmsbuild commented 1 month ago

This issue is fully signed and ready to be closed.