archspec / archspec-json

Nvidia Grace CPU detected as Neoverse N1 instead of Neoverse V2 #83

Closed: giordano closed this issue 5 months ago

giordano commented 7 months ago

On an Nvidia GH200 system I get

$ spack arch
linux-rhel9-neoverse_n1
$ lscpu
Architecture:           aarch64
  CPU op-mode(s):       64-bit
  Byte Order:           Little Endian
CPU(s):                 72
  On-line CPU(s) list:  0-71
Vendor ID:              ARM
  Model name:           Neoverse-V2
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 72
    Socket(s):          1
    Stepping:           r0p0
    Frequency boost:    disabled
    CPU max MHz:        3474.0000 
    CPU min MHz:        81.0000
    BogoMIPS:           2000.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3
                         sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3
                         svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh
Caches (sum of all):    
  L1d:                  4.5 MiB (72 instances)
  L1i:                  4.5 MiB (72 instances)
  L2:                   72 MiB (72 instances)
  L3:                   114 MiB (1 instance)
NUMA:                   
  NUMA node(s):         9
  NUMA node0 CPU(s):    0-71
  NUMA node1 CPU(s):    
  NUMA node2 CPU(s):    
  NUMA node3 CPU(s):    
  NUMA node4 CPU(s):    
  NUMA node5 CPU(s):    
  NUMA node6 CPU(s):    
  NUMA node7 CPU(s):    
  NUMA node8 CPU(s):    
Vulnerabilities:        
  Gather data sampling: Not affected
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec rstack overflow: Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Not affected
  Srbds:                Not affected
  Tsx async abort:      Not affected

archspec detects it as neoverse_n1, although it should be neoverse_v2. Comparing the features archspec lists for neoverse_v2 with those reported by lscpu, I get:

julia> archspec = [
                 "fp",
                 "asimd",
                 "evtstrm",
                 "aes",
                 "pmull",
                 "sha1",
                 "sha2",
                 "crc32",
                 "atomics",
                 "fphp",
                 "asimdhp",
                 "cpuid",
                 "asimdrdm",
                 "jscvt",
                 "fcma",
                 "lrcpc",
                 "dcpop",
                 "sha3",
                 "sm3",
                 "sm4",
                 "asimddp",
                 "sha512",
                 "sve",
                 "asimdfhm",
                 "dit",
                 "uscat",
                 "ilrcpc",
                 "flagm",
                 "ssbs",
                 "sb",
                 "paca",
                 "pacg",
                 "dcpodp",
                 "sve2",
                 "sveaes",
                 "svepmull",
                 "svebitperm",
                 "svesha3",
                 "svesm4",
                 "flagm2",
                 "frint",
                 "svei8mm",
                 "svebf16",
                 "i8mm",
                 "bf16",
                 "dgh",
                 "bti"
             ];

julia> lscpu = split("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh", ' ');

julia> setdiff(archspec, lscpu)
3-element Vector{AbstractString}:
 "paca"
 "pacg"
 "bti"

So lscpu doesn't report paca, pacg, and bti. Are they really necessary?
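
As far as I understand, archspec's aarch64 detection only considers a target when every feature in its JSON entry is reported by the host, so the setdiff above is exactly what rules neoverse_v2 out. A minimal Python sketch of that subset check (an illustration of the idea, not archspec's actual code; both feature lists are copied from this thread):

```python
# Features archspec lists for neoverse_v2 (copied from the Julia snippet above).
ARCHSPEC_NEOVERSE_V2 = set(
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
    "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
    "dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull "
    "svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti".split()
)

# Flags reported by lscpu on the GH200 node.
GH200_FLAGS = set(
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
    "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
    "dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm "
    "svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh".split()
)


def missing_features(required, reported):
    """Return the required features the host does not report, sorted."""
    return sorted(required - reported)


def is_compatible(required, reported):
    """A target is only a candidate if all of its features are reported."""
    return required <= reported


print(missing_features(ARCHSPEC_NEOVERSE_V2, GH200_FLAGS))  # ['bti', 'paca', 'pacg']
print(is_compatible(ARCHSPEC_NEOVERSE_V2, GH200_FLAGS))     # False
```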

alalazo commented 7 months ago

Can you also post a sample of /proc/cpuinfo?

giordano commented 7 months ago
processor       : 0
BogoMIPS        : 2000.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd4f
CPU revision    : 0
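
For reference, the fields above can be pulled out of /proc/cpuinfo with a few lines of Python. This is a sketch assuming the usual `key : value` layout with blank lines between processors; `parse_cpuinfo` is a hypothetical helper, not part of archspec. CPU implementer 0x41 is Arm Ltd and part 0xd4f is Neoverse V2, matching the lscpu model name, while no PAuth/BTI flag appears on the Features line.

```python
SAMPLE_CPUINFO = (
    "processor       : 0\n"
    "BogoMIPS        : 2000.00\n"
    "Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
    "asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve "
    "asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm "
    "svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh\n"
    "CPU implementer : 0x41\n"
    "CPU architecture: 8\n"
    "CPU variant     : 0x0\n"
    "CPU part        : 0xd4f\n"
    "CPU revision    : 0\n"
)


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one {key: value} dict per processor."""
    blocks, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                blocks.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


cpu0 = parse_cpuinfo(SAMPLE_CPUINFO)[0]
features = set(cpu0["Features"].split())
implementer = int(cpu0["CPU implementer"], 16)
part = int(cpu0["CPU part"], 16)
print(hex(implementer), hex(part))                 # 0x41 0xd4f
print(sorted({"paca", "pacg", "bti"} & features))  # []
```

On a live system, the contents of /proc/cpuinfo would be passed in place of the sample string.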

alalazo commented 7 months ago

So lscpu doesn't report paca, pacg, and bti, are they really necessary?

@fspiga Do you have a quick answer to this? Otherwise I can try to have a look at those features later.

giordano commented 7 months ago

Looking at LLVM, as far as I understand, it doesn't seem to expect the paca, pacg, and bti features in Neoverse V2: https://github.com/llvm/llvm-project/blob/baba0a4cb43181a78881fce683e3a5016daa8ce6/llvm/lib/Target/AArch64/AArch64.td#L1508-L1511

  list<SubtargetFeature> NeoverseV2 = [HasV9_0aOps, FeatureBF16, FeatureSPE,
                                       FeaturePerfMon, FeatureETE, FeatureMatMulInt8,
                                       FeatureNEON, FeatureSVE2BitPerm, FeatureFP16FML,
                                       FeatureMTE, FeatureRandGen];

giordano commented 7 months ago

OK, I did some more investigation. The LLVM code snippet I shared above wasn't very useful because the features are hidden behind HasV9_0aOps; querying the list of features from the compiler gives a different answer:

% julia --compile=min -e 'using BinaryBuilderBase; BinaryBuilderBase.runshell(Platform("aarch64", "linux"); preferred_gcc_version=v"13", lock_microarchitecture=false)'
sandbox:${WORKSPACE} # gcc --version
aarch64-linux-gnu-gcc (GCC) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

sandbox:${WORKSPACE} # gcc -mcpu=neoverse-v2 -E -dM - < /dev/null | grep -E '__ARM_FEATURE_(BTI|PAUTH)'
#define __ARM_FEATURE_BTI 1
#define __ARM_FEATURE_PAUTH 1
sandbox:${WORKSPACE} # clang --version
clang version 16.0.6 (/home/gbaraldi/.julia/dev/BinaryBuilderBase/deps/downloads/clones/llvm-project.git-1df819a03ecf6890e3787b27bfd4f160aeeeeacd50a98d003be8b0893f11a9be 7cbf1a2591520c2491aa35339f227775f4d3adf6)
Target: arm64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/x86_64-linux-musl/bin
sandbox:${WORKSPACE} # clang -mcpu=neoverse-v2 -E -dM - < /dev/null | grep -E '__ARM_FEATURE_(BTI|PAUTH)'
#define __ARM_FEATURE_BTI 1
#define __ARM_FEATURE_PAUTH 1
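
The same query can be scripted. A sketch, assuming an aarch64-capable gcc or clang on PATH (native or cross; an x86 host compiler will reject -mcpu=neoverse-v2): `arm_feature_macros` runs the command used above and keeps the __ARM_FEATURE_* macros, and `parse_macros` is a hypothetical helper that handles object-like `#define NAME VALUE` lines.

```python
import subprocess


def parse_macros(text):
    """Map `#define NAME VALUE` lines (as printed by -E -dM) to {NAME: VALUE}."""
    macros = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros


def arm_feature_macros(compiler, mcpu):
    """Return the __ARM_FEATURE_* macros `compiler` predefines for -mcpu=<mcpu>."""
    result = subprocess.run(
        [compiler, f"-mcpu={mcpu}", "-E", "-dM", "-"],
        input="",  # equivalent to `< /dev/null`: preprocess an empty input
        capture_output=True,
        text=True,
        check=True,
    )
    macros = parse_macros(result.stdout)
    return {k: v for k, v in macros.items() if k.startswith("__ARM_FEATURE_")}


sample = "#define __ARM_FEATURE_BTI 1\n#define __ARM_FEATURE_PAUTH 1\n"
print(parse_macros(sample))  # {'__ARM_FEATURE_BTI': '1', '__ARM_FEATURE_PAUTH': '1'}
```

On the GH200 node, `arm_feature_macros("gcc", "neoverse-v2")` should then report the same BTI/PAUTH macros as the grep above.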

This is a bit worrying: I fear that either Nvidia implemented a Neoverse V2 chip without these features, or the operating system can't report them correctly. Either way, compilers seem to make use of these extensions when targeting neoverse-v2, and Spack generating suboptimal code for this chip would not be great. For the record, I'm using Red Hat 9:

$ uname -a
Linux locust.rc.ucl.ac.uk 5.14.0-362.13.1.el9_3.aarch64+64k #1 SMP PREEMPT_DYNAMIC Fri Nov 24 03:48:25 EST 2023 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 9.3 (Plow)

giordano commented 7 months ago

More diving into this rabbit hole:

This looks like quite a mess.

dslarm commented 6 months ago

Rocky 9 has the same issue with AWS Graviton3 (e.g. hpc7g instance types): archspec identifies it as neoverse_n1.

On Amazon Linux 2023 it identifies it as neoverse_v1.

Rocky 9's default GCC 11.4.1 recognizes V1 (-mcpu=neoverse-v1).

[rocky@ip-10-11-33-103 ~]$ lscpu 
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 64
  On-line CPU(s) list:  0-63
Vendor ID:              ARM
  Model name:           Neoverse-V1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 64
    Socket(s):          1
    Stepping:           r1p1
    BogoMIPS:           2100.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp 
                        svei8mm svebf16 i8mm bf16 dgh rng
...

Note: on Amazon Linux 2023, lscpu does not report the Neoverse-V1 model name:

Architecture:                       aarch64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
CPU(s):                             64
On-line CPU(s) list:                0-63
Vendor ID:                          ARM
Model:                              1
Thread(s) per core:                 1
Core(s) per socket:                 64
Socket(s):                          1
Stepping:                           r1p1
BogoMIPS:                           2100.00
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng

The only differences are that AL2023 is missing the core model name and that AL2023 reports the paca and pacg CPU flags.
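
This can be double-checked with set differences on the two Flags lines quoted above (a quick Python sketch; the strings are copied verbatim from the two lscpu outputs):

```python
ROCKY9_FLAGS = set(
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
    "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
    "dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng".split()
)
AL2023_FLAGS = set(
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
    "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
    "dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng".split()
)

# Flags only one of the two kernels reports.
print(sorted(AL2023_FLAGS - ROCKY9_FLAGS))  # ['paca', 'pacg']
print(sorted(ROCKY9_FLAGS - AL2023_FLAGS))  # []
```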

Would it therefore make sense to:
(a) remove paca/pacg from the JSON for V1/V2 cores, if that's how the matching is done?
(b) in the first instance, also match on the model name when it is present? If two cores had the same feature set but different names (and different performance due to, say, fewer pipes), matching on the name would be better. Spack's current 'target' covers both architectural (ISA-supported instructions) and micro-architectural (core performance model) tuning, so a specific core name should trump generic architecture flags.

giordano commented 6 months ago

remove paca/pacg from the JSON for V1/V2 cores - if that's how the matching is being done?

The problem is that a compiler may end up using those instructions. For example, with GCC 13 on Red Hat 9.3 I get:

$ gcc --version
gcc (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -mcpu=neoverse-v2 -E -dM - < /dev/null | grep -E 'ARM_FEATURE_(BTI|PAUTH)'
#define __ARM_FEATURE_BTI 1
#define __ARM_FEATURE_PAUTH 1

This would create a weird situation where the kernel wouldn't know those extensions are available, but the compiler could potentially use their instructions. Frankly, I don't know what happens when the kernel doesn't know your CPU supports a certain extension: do you get a SIGILL error, or does the code actually run fine?

dslarm commented 6 months ago

[..] The problem is that a compiler may end up using those instructions. For example, with GCC 13 in RedHat 9.3 I get [..] which would create a weird situation where the kernel wouldn't know those extensions are available, but the compiler could potentially use their instructions. Frankly I don't know what happens in this case where the kernel doesn't know your CPU supports a certain extension, do you get a SIGILL error or the code actually runs fine?

In the case of BTI/PA, these are backwards-compatible and behave as NOPs on platforms that don't support them. They're also only emitted when the -mbranch-protection= flag is used, not turned on automatically for any particular core/arch version.

More generally, yes, such a situation is possible: for example, an older kernel without SVE support running on hardware that does support SVE. Executing SVE instructions there would raise SIGILL (BTI/PA differ in not doing that, as described above). I think that's an edge case, and it can be detected at runtime (OpenBLAS and Arm Performance Libraries do that).
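
To illustrate the runtime-detection approach: the library picks a code path at run time from what the kernel reports, so a flag the kernel doesn't expose leads to a slower path rather than a SIGILL. A minimal Python sketch; the variant names are hypothetical, and real libraries query the kernel directly (in C, typically via getauxval(AT_HWCAP) or /proc/cpuinfo) instead of taking a Python set.

```python
# Hypothetical code paths, ordered from most to least capable.
PREFERENCE = ["sve2", "sve"]


def pick_variant(features):
    """Return the most capable variant whose feature the kernel reports.

    Falls back to "neon", assuming AdvSIMD (asimd) is always available,
    which holds on the systems discussed in this thread.
    """
    for variant in PREFERENCE:
        if variant in features:
            return variant
    return "neon"


# A kernel that exposes SVE/SVE2 vs. an older kernel that exposes neither.
print(pick_variant({"fp", "asimd", "sve", "sve2"}))  # sve2
print(pick_variant({"fp", "asimd"}))                 # neon
```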

If archspec's job is to say "what is that CPU", I think it has to say "Neoverse V1". So the CPU ID is a better place to start than the feature list: feature lists are not unique and are not related to performance characteristics, so matching on them affects performance tuning in a way that using the CPU ID would not.
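
A sketch of the CPU-ID-first approach suggested here, falling back to feature matching when the ID is unknown. The lookup table, the trimmed candidate feature sets, their ordering, and the generic "aarch64" fallback are assumptions for illustration, not archspec's implementation. The part numbers are Arm's MIDR values (0xd0c Neoverse N1, 0xd40 Neoverse V1, 0xd4f Neoverse V2, the last matching the /proc/cpuinfo sample earlier in this thread).

```python
# Hypothetical table: (CPU implementer, CPU part) from /proc/cpuinfo -> target.
# 0x41 is the Arm Ltd implementer code.
CPU_ID_TABLE = {
    (0x41, 0xD0C): "neoverse_n1",
    (0x41, 0xD40): "neoverse_v1",
    (0x41, 0xD4F): "neoverse_v2",
}


def detect_target(implementer, part, features, candidates):
    """Prefer an exact CPU-ID match; otherwise use the first feature match.

    `candidates` is a list of (name, required_features) pairs ordered from
    most to least specific, so the first subset match is the best one.
    """
    by_id = CPU_ID_TABLE.get((implementer, part))
    if by_id is not None:
        return by_id
    for name, required in candidates:
        if required <= features:
            return name
    return "aarch64"  # generic fallback


# Trimmed, illustrative feature sets (not the real JSON entries).
candidates = [
    ("neoverse_v2", {"sve2", "paca", "pacg", "bti"}),
    ("neoverse_n1", {"atomics", "asimddp"}),
]
host = {"asimd", "atomics", "asimddp", "sve", "sve2"}  # no paca/pacg/bti
print(detect_target(0x41, 0xD4F, host, candidates))  # neoverse_v2
print(detect_target(0x41, 0xFFF, host, candidates))  # neoverse_n1
```

The second call shows the failure mode from this issue: without an ID match, the missing PAuth/BTI flags push detection down to neoverse_n1.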

giordano commented 6 months ago

In the case of BTI/PA these are backwards-compatible and behave as NOPs on platforms that don't support it. They're also only emitted in the presence of -mbranch-protection= flag - not turned on automatically for any particular core/arch version.

@alalazo In this case it may indeed be safe to remove those features. What do you think?

aweits commented 5 months ago

This might help the discussion: https://github.com/torvalds/linux/blob/master/Documentation/arch/arm64/pointer-authentication.rst

From experience using RHEL on GH, I've removed those flags and run into fewer issues with builds (notably, OpenBLAS errors like "switch '-mcpu=neoverse-n1' conflicts with '-march=armv8.4-a+sve' switch").

alalazo commented 5 months ago

In this case it may indeed be safe to remove those features, what do you think?

Getting back to this. Reading the thread, I think it is safe to remove these flags. @giordano Do you want to submit a PR for that, and report whether you see issues?

jlinford commented 5 months ago

In the case of BTI/PA these are backwards-compatible and behave as NOPs on platforms that don't support it. They're also only emitted in the presence of -mbranch-protection= flag - not turned on automatically for any particular core/arch version.

@alalazo In this case it may indeed be safe to remove those features, what do you think?

Yes, paca, pacg, and bti should be removed from Neoverse V2 detection. These features aren't activated in all Linux kernels, so they may not appear in /proc/cpuinfo.
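
Applied to the JSON, the agreed change amounts to dropping those three entries from the neoverse_v2 feature list. A sketch of that edit on a hypothetical, heavily trimmed fragment shaped like archspec's microarchitectures.json (the real file and the eventual PR may differ), followed by the subset check that now passes:

```python
import json

# Hypothetical, heavily trimmed fragment; only a few neoverse_v2 features kept.
FRAGMENT = """
{
  "microarchitectures": {
    "neoverse_v2": {
      "features": ["fp", "asimd", "sve", "sve2", "paca", "pacg", "bti", "dgh"]
    }
  }
}
"""

# Features the kernel may not expose even when the core implements them.
UNRELIABLE = {"paca", "pacg", "bti"}


def drop_features(doc, target, unwanted):
    """Remove `unwanted` from the target's feature list, preserving order."""
    entry = doc["microarchitectures"][target]
    entry["features"] = [f for f in entry["features"] if f not in unwanted]
    return doc


doc = drop_features(json.loads(FRAGMENT), "neoverse_v2", UNRELIABLE)
required = set(doc["microarchitectures"]["neoverse_v2"]["features"])
# A few of the flags reported on the GH200 node.
reported = {"fp", "asimd", "sve", "sve2", "dgh", "i8mm", "bf16"}
print(required <= reported)  # True
```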

Thanks!