archspec / archspec-json

Renaming AWS graviton2 target as `neoverse-n1` #41

Closed fspiga closed 1 year ago

fspiga commented 2 years ago

Summary

I am working on the NVIDIA Arm HPC Developer Kit (https://developer.nvidia.com/arm-hpc-devkit), which is equipped with an Ampere Computing 'Altra' CPU.

When running Spack (any recent release tag and git HEAD), the CPU is recognised as graviton2 even though it is not one. This is what I get on an internal deployment:

[filippos@amp001 ~]$ spack arch -f
linux-rocky8-graviton2
[filippos@amp001 ~]$ spack arch -b
linux-rocky8-graviton2

Spack version: 0.17.1-1338-fddc58387c

Rationale

graviton2 is not the only Arm-based CPU supporting Arm Neoverse N1 core IP. See https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd and https://www.anandtech.com/show/15575/amperes-altra-80-core-n1-soc-for-hyperscalers-against-rome-and-xeon

Since the two Arm-based CPUs are based on the same Arm core IP, it is very hard to distinguish them based on what Linux reports. Spack does nothing wrong here: it looks at the "Features" listed in /proc/cpuinfo. However, the list of uarch features supported by AWS Graviton2 and Ampere Computing Altra is the same:

Graviton2:

$ cat /proc/cpuinfo | grep "Features"
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

Ampere Computing Altra:

$ cat /proc/cpuinfo | grep "Features"
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
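
The comparison above can also be checked mechanically. This is a minimal sketch, assuming the two "Features" lines are pasted in as strings; `parse_features` is a hypothetical helper written for this illustration and is not part of Spack or archspec.

```python
def parse_features(line):
    """Return the set of flags from a /proc/cpuinfo "Features" line."""
    # Everything after the first colon is a whitespace-separated flag list.
    _, _, flags = line.partition(":")
    return set(flags.split())


graviton2 = (
    "Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics "
    "fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs"
)
altra = (
    "Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics "
    "fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs"
)

# The symmetric difference holds flags present on only one of the two CPUs.
difference = parse_features(graviton2) ^ parse_features(altra)
print(sorted(difference))  # prints []
```

An empty difference means the feature flags alone cannot tell the two CPUs apart.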

Description

Rename target graviton2 to neoverse-n1.

Additional information

In practice, nothing breaks at the moment. However, it can be confusing for an inexperienced user to see Spack recognising "graviton2" as the target when running on a system with an Ampere Computing Altra CPU.

fspiga commented 2 years ago

cc @alalazo for visibility

fspiga commented 2 years ago

In this repo, linux-amazon-graviton and linux-amazon-graviton2 use product names (graviton), while the x86 equivalents use architecture names (haswell, sandybridge, cascadelake).

If we were to use the names of Arm core IP architectures, names should look like

A special case is A64FX, since it is a custom uarch from Fujitsu. But I do not see a target file for that.

alalazo commented 2 years ago

I have been revisiting this issue, to cut a release at the end of the week and port it to Spack. I think our current labeling is ok since:

We might want to add a "label alias" later, to say that graviton2 and ampere altra share the same features and flags.

For reference, here's what I obtain with GCC 9.4.0 using our current optimization line for graviton2 and using neoverse-n1

$ gcc -dM -E - -mcpu=neoverse-n1 < /dev/null | grep __ARM_FEATURE_
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1

$ gcc -dM -E - -march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1 < /dev/null | grep __ARM_FEATURE_
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1

fspiga commented 2 years ago

Question: since crypto does not seem to be captured by /proc/cpuinfo, how can you differentiate one CPU from the other?

giordano commented 2 years ago

Crypto isn't really a separate feature, it's an alias for aes + sha2

boegel commented 2 years ago

Similar problem on the Azure Arm instances powered by Ampere Altra which went GA on 1 Sept 2022:

$ archspec --version
archspec, version 0.1.4
$ archspec cpu
graviton
$ grep Features /proc/cpuinfo | head -1
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp

An interesting difference is that here archspec reports graviton (rather than graviton2). It seems that's because ssbs is not listed in the Features line in /proc/cpuinfo?
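
To illustrate why a single missing flag can change the reported target, here is a toy model of feature-based detection. It is only a sketch: the required-feature sets below are assumptions modelled on the flags quoted in this thread, and the selection rule (largest required set that the host satisfies) is a simplification, not archspec's actual data or algorithm.

```python
# Illustrative only: these required-feature sets are assumptions for the
# example, not a copy of archspec's target definitions.
TARGETS = {
    "aarch64": set(),
    "graviton": set("fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid".split()),
    "graviton2": set(
        "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
        "cpuid asimdrdm lrcpc dcpop asimddp ssbs".split()
    ),
}


def detect(host_features):
    """Pick the most specific target whose required features the host has."""
    candidates = [name for name, req in TARGETS.items() if req <= host_features]
    return max(candidates, key=lambda name: len(TARGETS[name]))


azure_altra = set(
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm lrcpc dcpop asimddp".split()
)
devkit_altra = azure_altra | {"ssbs"}

print(detect(azure_altra))   # prints graviton
print(detect(devkit_altra))  # prints graviton2
```

Under these assumptions, the same Altra core falls back to the older target as soon as ssbs is not exposed.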

I think @fspiga's argument for microarchitectures vs product names is compelling. It's also why we use zen2 and zen3 rather than rome and milan (despite the latter being perhaps more intuitive).

There are also advantages to discriminating between AWS Graviton and Ampere Altra, though: although a binary compiled on one may run fine on the other, the compiler optimizations applied may differ (depending on the exact compiler flags used), which could impact performance. That may be beyond the scope of archspec though, unless we add support for another mode (ISA vs uarch?)

alalazo commented 1 year ago

Can somebody post a cat of /proc/cpuinfo? We can't distinguish the two by features, but we might be able to distinguish them using other fields

boegel commented 1 year ago

AWS Graviton2, Amazon Linux:

processor       : 0
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

Ampere Altra (Azure), Ubuntu:

Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              1
Core(s) per socket:              64
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           1
Model name:                      Neoverse-N1
Stepping:                        r3p1
BogoMIPS:                        50.00
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        64 MiB
L3 cache:                        32 MiB
NUMA node0 CPU(s):               0-63
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; CSV2, BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp

alalazo commented 1 year ago

@boegel Can you post /proc/cpuinfo for Ampere Altra? If I am not wrong, that seems to be the output of lscpu, right?

boegel commented 1 year ago

I took the Ampere Altra bit from https://github.com/EESSI/software-layer/pull/187/files#diff-d35bee727b3c563d7a3faf81cbe1ce6fda4d1167ef18e3b6d0f4d9c622e9e413, so maybe @hmeiland can clarify whether this is indeed /proc/cpuinfo or lscpu?

fspiga commented 1 year ago
fspiga commented 1 year ago

[filippos@amp001 ~]$ cat /proc/cpuinfo
processor   : 0
BogoMIPS    : 50.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part    : 0xd0c
CPU revision    : 1
[ ... ]
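
The other fields can be read the same way as the feature flags. This is a minimal sketch, assuming the /proc/cpuinfo text is available as a string; `parse_cpuinfo`, `core_name` and the `KNOWN_CORES` table are hypothetical names for this illustration. The only facts relied on are that implementer 0x41 is Arm and part 0xd0c is Neoverse N1.

```python
# (implementer, part) pairs from the MIDR register: 0x41 is Arm,
# 0xd0c is the Neoverse N1 part number. The table is deliberately tiny.
KNOWN_CORES = {(0x41, 0xD0C): "neoverse-n1"}


def parse_cpuinfo(text):
    """Parse 'key : value' lines, keeping the first entry seen per key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def core_name(fields):
    """Map the implementer/part pair to a core name, or 'unknown'."""
    ident = (int(fields["CPU implementer"], 16), int(fields["CPU part"], 16))
    return KNOWN_CORES.get(ident, "unknown")


SAMPLE = """\
processor   : 0
BogoMIPS    : 50.00
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part    : 0xd0c
CPU revision    : 1
"""

print(core_name(parse_cpuinfo(SAMPLE)))  # prints neoverse-n1
```

Both the Graviton2 and the Altra dumps in this thread report the same implementer and part values, so these fields identify the core but not the product built around it.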

fspiga commented 1 year ago

In this repo, linux-amazon-graviton and linux-amazon-graviton2 use product names (graviton), while the x86 equivalents use architecture names (haswell, sandybridge, cascadelake).

If we were to use the names of Arm core IP architectures, names should look like

A special case is A64FX, since it is a custom uarch from Fujitsu. But I do not see a target file for that.

Adding to the list above: it is now public that AWS Graviton3 is Arm Neoverse V1 (armv8). The same core IP will be used in SiPearl Rhea. NVIDIA Grace is Arm Neoverse V2 (armv9).

tgamblin commented 1 year ago

Are there additional instructions going from armv8 to neoverse-v1 or from armv9 to neoverse-v2? And is it just armv8 or one of the minor version bumps?

tgamblin commented 1 year ago

@alalazo: my recommendation here, unless we can find a way to meaningfully differentiate between neoverse-n1 and graviton2, is to do the rename. We do not want this to be partial to any one cloud/vendor/etc, and we definitely do not want to deter Azure people from using the library.

For backward compatibility in Spack we have two options, I think:

  1. Keep graviton2 as a node or an alias that just doesn't get detected, but that is recognized as compatible with neoverse-n1
  2. Don't bother keeping it as a node and use the features directly. We keep all features on Spack targets right now so that we can do this evaluation after the fact.
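
Option (1) could look roughly like the sketch below. It is purely illustrative: `ALIASES`, `DETECTABLE`, `canonical` and `compatible` are hypothetical names, and neither Spack nor archspec is claimed to expose such an API.

```python
# Hypothetical data model for option (1): graviton2 survives as a name
# that users may still write, but detection never returns it.
ALIASES = {"graviton2": "neoverse-n1"}
DETECTABLE = {"neoverse-n1"}


def canonical(name):
    """Resolve an alias to the target name that detection would report."""
    return ALIASES.get(name, name)


def compatible(requested, host):
    """True if a target requested in a config matches the detected host."""
    return canonical(requested) == canonical(host)


print(canonical("graviton2"))                  # prints neoverse-n1
print(compatible("graviton2", "neoverse-n1"))  # prints True
```

With this kind of lookup, an existing preference for graviton2 keeps resolving even though only neoverse-n1 is ever detected.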

I think the only current (major?) consumer of the graviton* targets is Spack and the various builds we have in AWS right now. I think we can pretty easily rebuild those as neoverse-n1 binaries and just replace the whole cache, and we can also implement something like (1) or (2) so that people's workflows (e.g. preferences in spack.yaml) continue to work. I think for spack.yaml to continue to work we'll need to do (1).

I also don't see why we can't move on upstream while we work these things out in Spack -- I think it would be good to do.

alalazo commented 1 year ago

Renaming is doable. The issue here is whether we should go with -mcpu=neoverse-n1 or the other optimization line we have:

-march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1

which also adds sha2 and aes. Basically, both graviton2 and altra are neoverse-n1 +aes +sha2.
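
The difference between the two flag sets can be computed from the preprocessor dumps. This is a minimal sketch that reuses the GCC 9.4.0 output quoted earlier in this thread as literal strings; `feature_macros` is a hypothetical helper, and other compiler versions may produce different macro lists.

```python
def feature_macros(preprocessor_output):
    """Collect __ARM_FEATURE_* macro names from `gcc -dM -E` output."""
    names = set()
    for line in preprocessor_output.splitlines():
        parts = line.split()
        if (
            len(parts) >= 2
            and parts[0] == "#define"
            and parts[1].startswith("__ARM_FEATURE_")
        ):
            names.add(parts[1])
    return names


# Output for -mcpu=neoverse-n1, copied from the GCC 9.4.0 run above.
MCPU = """\
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
"""

# The -march=armv8.2-a+fp16+rcpc+dotprod+crypto run defines three more.
MARCH = MCPU + """\
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_SHA2 1
"""

extra = feature_macros(MARCH) - feature_macros(MCPU)
print(sorted(extra))
# prints ['__ARM_FEATURE_AES', '__ARM_FEATURE_CRYPTO', '__ARM_FEATURE_SHA2']
```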

tgamblin commented 1 year ago

@OliverPerks @stephenmsachs any advice on the flag recommendation here?

alalazo commented 1 year ago

Further data point: I am in an ubuntu:latest container, trying the system clang and gcc. It seems clang adds aes and sha2 by default to neoverse-n1, while gcc does not:

root@374eedf176f2:/# clang --version
Ubuntu clang version 14.0.0-1ubuntu1
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
root@374eedf176f2:/# clang -dM -E - -mcpu=neoverse-n1 < /dev/null | grep __ARM_FEATURE_ | sort
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_UNALIGNED 1

and

root@374eedf176f2:/# gcc --version
gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

root@374eedf176f2:/# gcc -dM -E - -mcpu=neoverse-n1 < /dev/null | grep __ARM_FEATURE_ | sort
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_UNALIGNED 1

willlovett-arm commented 1 year ago

Hi all. @fspiga pointed me to this thread. I manage the compiler & library technical roadmaps here at Arm.

There's a long backstory here - crypto features such as AES and SHA2 aren't universally available, largely due to export constraints.

This has led to an unfortunate wrinkle that you've just fallen upon, where they're enabled by default in clang, and not in gcc.

gcc's behaviour is more correct, but it's non-trivial to change this in clang for v8 cores, as it creates a backwards compatibility headache.

For all future v9 cores (eg Neoverse N2, V2, Cortex-X2), clang should not enable this by default*

The easiest way of ensuring that clang and gcc do the same thing here is with one of the following:

hth,

Will Lovett

* I say 'should' here - I've just discovered a bug in Neoverse N2, whereby it enables them. I'm going to ask the team to fix.