giordano closed this issue 8 months ago.
Can you also post a sample of /proc/cpuinfo?
processor : 0
BogoMIPS : 2000.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd4f
CPU revision : 0
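For comparison, here is a minimal Python sketch that reads the Features line of a /proc/cpuinfo dump and reports which of the three disputed flags are absent. The helper names are hypothetical and not archspec's API, and DISPUTED holds only the flags discussed in this thread, not archspec's full neoverse_v2 list. The sample is the Features line posted above.

```python
# Sketch only: DISPUTED is the set of flags discussed in this thread, not
# archspec's full neoverse_v2 feature list; helper names are hypothetical.
DISPUTED = {"paca", "pacg", "bti"}

# Features line copied from the GH200 /proc/cpuinfo sample above.
SAMPLE = (
    "processor : 0\n"
    "Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve "
    "asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull "
    "svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh\n"
    "CPU part : 0xd4f\n"
)


def cpuinfo_features(text: str) -> set:
    """Union of the flags on every 'Features' line of a /proc/cpuinfo dump."""
    flags = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def missing_flags(text: str, required: set) -> set:
    """Required flags that the kernel does not advertise."""
    return required - cpuinfo_features(text)


print(sorted(missing_flags(SAMPLE, DISPUTED)))  # prints ['bti', 'paca', 'pacg']
```

On a live system the same check would be `missing_flags(open("/proc/cpuinfo").read(), DISPUTED)`.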
> So lscpu doesn't report paca, pacg, and bti; are they really necessary?
@fspiga Do you have a quick answer to this? Otherwise I can try to have a look at those features later.
Looking at LLVM, as far as I understand they don't seem to expect the paca, pacg, and bti features in Neoverse V2: https://github.com/llvm/llvm-project/blob/baba0a4cb43181a78881fce683e3a5016daa8ce6/llvm/lib/Target/AArch64/AArch64.td#L1508-L1511
list<SubtargetFeature> NeoverseV2 = [HasV9_0aOps, FeatureBF16, FeatureSPE,
FeaturePerfMon, FeatureETE, FeatureMatMulInt8,
FeatureNEON, FeatureSVE2BitPerm, FeatureFP16FML,
FeatureMTE, FeatureRandGen];
Ok, I did some more investigation. The LLVM code snippet I shared above wasn't very useful because the features are hidden behind HasV9_0aOps; querying the list of features from the compiler gives a different answer:
% julia --compile=min -e 'using BinaryBuilderBase; BinaryBuilderBase.runshell(Platform("aarch64", "linux"); preferred_gcc_version=v"13", lock_microarchitecture=false)'
sandbox:${WORKSPACE} # gcc --version
aarch64-linux-gnu-gcc (GCC) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
sandbox:${WORKSPACE} # gcc -mcpu=neoverse-v2 -E -dM - < /dev/null | grep -E '__ARM_FEATURE_(BTI|PAUTH)'
#define __ARM_FEATURE_BTI 1
#define __ARM_FEATURE_PAUTH 1
sandbox:${WORKSPACE} # clang --version
clang version 16.0.6 (/home/gbaraldi/.julia/dev/BinaryBuilderBase/deps/downloads/clones/llvm-project.git-1df819a03ecf6890e3787b27bfd4f160aeeeeacd50a98d003be8b0893f11a9be 7cbf1a2591520c2491aa35339f227775f4d3adf6)
Target: arm64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/x86_64-linux-musl/bin
sandbox:${WORKSPACE} # clang -mcpu=neoverse-v2 -E -dM - < /dev/null | grep -E '__ARM_FEATURE_(BTI|PAUTH)'
#define __ARM_FEATURE_BTI 1
#define __ARM_FEATURE_PAUTH 1
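The `-E -dM` query above can also be scripted. This is a minimal Python sketch with hypothetical helper names: it runs the same command on empty input and keeps only the `__ARM_FEATURE_*` macros. It assumes a compiler that targets AArch64 (like the sandbox GCC 13 and Clang 16 above); a native x86 compiler will reject `-mcpu=neoverse-v2`.

```python
import re
import subprocess

# Matches predefined-macro lines such as "#define __ARM_FEATURE_BTI 1".
DEFINE_RE = re.compile(r"^#define\s+(\S+)(?:\s+(.*))?$")


def parse_defines(output: str) -> dict:
    """Map macro name -> value ('' if empty) from `cc -E -dM` output."""
    macros = {}
    for line in output.splitlines():
        match = DEFINE_RE.match(line.strip())
        if match:
            macros[match.group(1)] = match.group(2) or ""
    return macros


def arm_feature_macros(compiler: str, cpu: str) -> dict:
    """Run `<compiler> -mcpu=<cpu> -E -dM -` on empty input and keep the
    __ARM_FEATURE_* macros. Needs a compiler that targets AArch64."""
    result = subprocess.run(
        [compiler, f"-mcpu={cpu}", "-E", "-dM", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return {name: value
            for name, value in parse_defines(result.stdout).items()
            if name.startswith("__ARM_FEATURE_")}
```

In the sandbox above, `arm_feature_macros("gcc", "neoverse-v2")` should contain `__ARM_FEATURE_BTI` and `__ARM_FEATURE_PAUTH`, matching the grep output.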
This is a bit worrying. I fear that either Nvidia implemented a Neoverse V2 chip without these features, or the operating system can't report them correctly. Either way, compilers seem to make use of these extensions when targeting neoverse-v2 :confused: Spack generating suboptimal code for this chip would not be great. For the record, I'm using Red Hat 9:
$ uname -a
Linux locust.rc.ucl.ac.uk 5.14.0-362.13.1.el9_3.aarch64+64k #1 SMP PREEMPT_DYNAMIC Fri Nov 24 03:48:25 EST 2023 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 (Plow)
More diving into this rabbit hole: in section "2.1 Checking the CPUs" (page 17), lscpu shows paca, pacg, and bti, but the Red Hat kernel on this system is built without pointer authentication and BTI support:
$ grep -E 'CONFIG_ARM64_(BTI|PTR_AUTH)' /boot/config-5.14.0-362.13.1.el9_3.aarch64+64k
# CONFIG_ARM64_PTR_AUTH is not set
# CONFIG_ARM64_BTI is not set
even though these options are enabled by default in upstream Linux.
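To repeat the kernel config check above on other machines, here is a minimal Python sketch (the helper name is hypothetical). It distinguishes an option that is set, one explicitly marked `# ... is not set`, and one that does not appear at all.

```python
# Options checked above; the helper name is hypothetical.
OPTIONS = ("CONFIG_ARM64_PTR_AUTH", "CONFIG_ARM64_BTI")


def kconfig_state(text: str, options=OPTIONS) -> dict:
    """Return the value ('y', 'm', ...) for set options, 'not set' for
    '# X is not set' lines, and 'absent' when the option never appears."""
    state = {opt: "absent" for opt in options}
    for raw in text.splitlines():
        line = raw.strip()
        for opt in options:
            if line == f"# {opt} is not set":
                state[opt] = "not set"
            elif line.startswith(f"{opt}="):
                # Exact "NAME=" prefix, so CONFIG_ARM64_BTI_KERNEL is not matched.
                state[opt] = line.split("=", 1)[1]
    return state
```

Assuming the distro installs its config as `/boot/config-$(uname -r)` (as RHEL does here), it can be fed `open(f"/boot/config-{platform.release()}").read()`.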
Furthermore, the default GCC compiler on this system is GCC 11 (Red Hat flavour), and it does support the neoverse-v2 target, despite that target having been introduced upstream only in GCC 13, and :drum: :drum:
$ gcc --version
gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -mcpu=neoverse-v2 -E -dM - < /dev/null | grep -E 'ARM_FEATURE_(BTI|PAUTH)'
$
The compiler doesn't use the BTI and PAUTH features, which is not too surprising, since they were introduced upstream only in GCC 13 (I don't have access to a Red Hat GCC 13 compiler to see what happens there).
This looks like quite a mess :smiling_face_with_tear:
Rocky 9 has the same issue with AWS Graviton3 (such as hpc7g instance types): archspec identifies it as neoverse_n1.
On Amazon Linux 2023 it identifies it as neoverse_v1.
Rocky 9's default GCC 11.4.1 recognizes V1 (-mcpu=neoverse-v1).
[rocky@ip-10-11-33-103 ~]$ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: ARM
Model name: Neoverse-V1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 1
Stepping: r1p1
BogoMIPS: 2100.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng
...
Note: running lscpu on Amazon Linux 2023 does not report the Neoverse-V1 model name:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: ARM
Model: 1
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 1
Stepping: r1p1
BogoMIPS: 2100.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
The only differences here are that AL2023 is missing the core model name and that AL2023 reports the paca and pacg CPU flags.
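The two Flags lines can be diffed mechanically. This small Python sketch (hypothetical helper name) copies both lists from the outputs above and computes the symmetric difference, which confirms that paca and pacg are the only flags that differ.

```python
# Flags copied from the Rocky 9 and Amazon Linux 2023 lscpu outputs above.
ROCKY9 = ("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
          "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
          "dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng")
AL2023 = ("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
          "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
          "dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng")


def flag_diff(a: str, b: str) -> tuple:
    """Return (flags only in a, flags only in b)."""
    flags_a, flags_b = set(a.split()), set(b.split())
    return flags_a - flags_b, flags_b - flags_a


only_rocky, only_al = flag_diff(ROCKY9, AL2023)
print(sorted(only_rocky), sorted(only_al))  # prints [] ['paca', 'pacg']
```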
Would it therefore make sense to
(a) remove paca/pacg from the JSON for V1/V2 cores - if that's how the matching is being done?
(b) in the first instance, should it match on the model name, if present? If there were two cores with the same feature set but different names (and differing in performance due to, say, fewer pipes), that matching would be better. Spack's current 'target' covers both architectural tuning (supported ISA instructions) and micro-architectural tuning (core performance model), so a specific core name should trump generic architecture flags.
> remove paca/pacg from the JSON for V1/V2 cores - if that's how the matching is being done?
The problem is that a compiler may end up using those instructions. For example, with GCC 13 on Red Hat 9.3 I get
$ gcc --version
gcc (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -mcpu=neoverse-v2 -E -dM - < /dev/null | grep -E 'ARM_FEATURE_(BTI|PAUTH)'
#define __ARM_FEATURE_BTI 1
#define __ARM_FEATURE_PAUTH 1
which would create a weird situation where the kernel wouldn't know those extensions are available, but the compiler could potentially use their instructions. Frankly, I don't know what happens when the kernel doesn't know your CPU supports a certain extension: do you get a SIGILL error, or does the code actually run fine?
In the case of BTI/PA, these are backwards-compatible and behave as NOPs on platforms that don't support them. They're also only emitted when the -mbranch-protection= flag is passed; they are not turned on automatically for any particular core/arch version.
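One way to see the difference between "the target supports BTI/PAC" and "the compiler emits BTI/PAC" is to look at the ACLE `_DEFAULT` macros. This is a minimal Python sketch with hypothetical helper names; it assumes the compiler follows ACLE, defining `__ARM_FEATURE_BTI_DEFAULT` / `__ARM_FEATURE_PAC_DEFAULT` only when branch protection is enabled for code generation, and that the toolchain was not configured to enable branch protection by default.

```python
import subprocess

# ACLE macros signalling that branch protection is enabled in codegen, as
# opposed to __ARM_FEATURE_BTI / __ARM_FEATURE_PAUTH (feature available).
CODEGEN_MACROS = {"bti": "__ARM_FEATURE_BTI_DEFAULT",
                  "pac": "__ARM_FEATURE_PAC_DEFAULT"}


def defined_names(output: str) -> set:
    """Names of all macros in `cc -E -dM` output."""
    names = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def branch_protection(names: set) -> dict:
    """Which kinds of branch protection the compiler will actually emit."""
    return {key: macro in names for key, macro in CODEGEN_MACROS.items()}


def query(compiler: str, flags: list) -> dict:
    """Needs an AArch64-targeting compiler, e.g. the sandbox GCC 13 above."""
    out = subprocess.run([compiler, *flags, "-E", "-dM", "-"], input="",
                         capture_output=True, text=True, check=True).stdout
    return branch_protection(defined_names(out))
```

Under those assumptions, `query("gcc", ["-mcpu=neoverse-v2"])` would report both as False, while adding `"-mbranch-protection=standard"` would turn them on.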
More generally, yes, this situation is possible: for example, an older kernel without SVE support running on hardware that does support SVE, where executing SVE instructions would raise SIGILL (BTI/PA are different in that they don't, as described above). I think that's an edge case, and it can be detected at runtime (OpenBLAS and Arm Performance Libraries do that).
If archspec's job is to answer "what is this CPU?", I think it has to say "Neoverse V1". So the CPU ID is a better starting point than the feature list: feature lists are not unique and are not related to performance characteristics, so relying on them rather than on the CPU ID can hurt performance tuning.
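A CPU-ID-first detection could look roughly like this minimal Python sketch. It is not archspec's actual logic: the table, the function names, and the fallback are hypothetical. It keys on the `CPU implementer` and `CPU part` fields of /proc/cpuinfo (0x41 is Arm Ltd.; 0xd0c, 0xd40 and 0xd4f are the Neoverse N1, V1 and V2 parts, the last matching the GH200 dump earlier in this thread) and only falls back to feature matching for unknown IDs.

```python
# Hypothetical table keyed on (CPU implementer, CPU part) from /proc/cpuinfo.
# 0x41 is Arm Ltd.; 0xd0c, 0xd40, 0xd4f are the Neoverse N1, V1 and V2 parts.
ARM_PARTS = {
    (0x41, 0xD0C): "neoverse_n1",
    (0x41, 0xD40): "neoverse_v1",
    (0x41, 0xD4F): "neoverse_v2",
}


def cpu_ids(text: str):
    """(implementer, part) of the first CPU listed, or None if unavailable."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    try:
        return int(fields["CPU implementer"], 16), int(fields["CPU part"], 16)
    except (KeyError, ValueError):
        return None


def detect(text: str, feature_fallback):
    """Prefer the CPU ID; fall back to feature matching only for unknown IDs."""
    ids = cpu_ids(text)
    if ids in ARM_PARTS:
        return ARM_PARTS[ids]
    return feature_fallback(text)


GH200 = ("processor : 0\nCPU implementer : 0x41\nCPU architecture: 8\n"
         "CPU variant : 0x0\nCPU part : 0xd4f\nCPU revision : 0\n")
# The lambda is a stand-in for the existing feature-based matching.
print(detect(GH200, lambda text: "neoverse_n1"))  # prints neoverse_v2
```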
> In the case of BTI/PA these are backwards-compatible and behave as NOPs on platforms that don't support it. They're also only emitted in the presence of -mbranch-protection= flag - not turned on automatically for any particular core/arch version.
@alalazo In this case it may indeed be safe to remove those features, what do you think?
This might help the discussion: https://github.com/torvalds/linux/blob/master/Documentation/arch/arm64/pointer-authentication.rst
From experience using RHEL on GH, I've removed those flags and have run into fewer issues with builds (notably, errors like OpenBLAS's "switch '-mcpu=neoverse-n1' conflicts with '-march=armv8.4-a+sve' switch").
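Along the lines of that document (which describes HWCAP_PACA and HWCAP_PACG), runtime detection reads the kernel's hwcaps from the auxiliary vector rather than parsing /proc/cpuinfo. This is a minimal Python sketch with hypothetical helper names. The bit positions are taken from the Linux arm64 UAPI headers: HWCAP_SVE is bit 22, HWCAP_PACA bit 30, HWCAP_PACG bit 31, and HWCAP2_BTI bit 17. It only reads real values on Linux/AArch64 and reports nothing elsewhere.

```python
import ctypes
import ctypes.util
import platform
import sys

# Constants from the Linux UAPI headers (<linux/auxvec.h>, arm64 <asm/hwcap.h>).
AT_HWCAP, AT_HWCAP2 = 16, 26
HWCAP_BITS = {"sve": 1 << 22, "paca": 1 << 30, "pacg": 1 << 31}
HWCAP2_BITS = {"bti": 1 << 17}


def decode(hwcap: int, hwcap2: int) -> set:
    """Names of the selected capabilities whose bits are set."""
    found = {name for name, bit in HWCAP_BITS.items() if hwcap & bit}
    found |= {name for name, bit in HWCAP2_BITS.items() if hwcap2 & bit}
    return found


def read_hwcaps():
    """(AT_HWCAP, AT_HWCAP2) via libc getauxval on Linux/AArch64, else None."""
    if not (sys.platform.startswith("linux") and platform.machine() == "aarch64"):
        return None
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.getauxval.restype = ctypes.c_ulong
    libc.getauxval.argtypes = [ctypes.c_ulong]
    return libc.getauxval(AT_HWCAP), libc.getauxval(AT_HWCAP2)


caps = read_hwcaps()
print(decode(*caps) if caps else "not running on Linux/AArch64")
```

On a kernel built without CONFIG_ARM64_PTR_AUTH and CONFIG_ARM64_BTI, I would expect paca, pacg and bti to be missing here too, consistent with /proc/cpuinfo.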
> In this case it may indeed be safe to remove those features, what do you think?
Getting back to this. Reading the thread, I think it is safe to remove these flags. @giordano Do you want to submit a PR for that, and report whether you see issues?
> @alalazo In this case it may indeed be safe to remove those features, what do you think?
Yes, paca, pacg, and bti should be removed from Neoverse V2 detection. These features aren't enabled in every Linux kernel, so they may not appear in /proc/cpuinfo.
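The change itself amounts to dropping three names from a feature list. Here is a minimal Python sketch on a toy JSON excerpt. It assumes archspec's microarchitectures.json layout (`"microarchitectures"` -> name -> `"features"` list); the excerpt and the helper name are illustrative, not the real entry.

```python
import json

# Flags agreed in this thread to drop from detection.
DROP = {"paca", "pacg", "bti"}

# Toy excerpt, assuming archspec's microarchitectures.json layout
# ("microarchitectures" -> name -> "features" list); not the real entry.
SAMPLE = json.loads("""
{"microarchitectures": {
  "neoverse_v2": {"vendor": "ARM", "features": ["sve2", "paca", "pacg", "bti", "i8mm"]}
}}
""")


def drop_features(data: dict, targets, drop=DROP) -> dict:
    """Remove the dropped flags from each target's feature list, in place."""
    for name in targets:
        entry = data["microarchitectures"][name]
        entry["features"] = [f for f in entry["features"] if f not in drop]
    return data


print(drop_features(SAMPLE, ["neoverse_v2"])["microarchitectures"]["neoverse_v2"]["features"])
# prints ['sve2', 'i8mm']
```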
Thanks!
On an Nvidia GH200 system I get
archspec detects it as neoverse_n1, although it should be neoverse_v2. Comparing the features listed in archspec and those reported by lscpu I get
So lscpu doesn't report paca, pacg, and bti; are they really necessary?