fspiga closed this issue 2 years ago.
cc @alalazo for visibility
In this repo, the linux-amazon-graviton and linux-amazon-graviton2 targets use product names (Graviton), while the x86 equivalents use microarchitecture names (haswell, sandybridge, cascadelake).
If we were to use the names of the Arm core IP microarchitectures, the names should look like:
linux-amazon-graviton -> linux-cortex-A72 (https://en.wikichip.org/wiki/annapurna_labs/alpine/al73400)
linux-amazon-graviton2 -> linux-neoverse-N1 (also compatible with the Ampere Computing Altra CPU)
linux-amazon-graviton3 -> linux-neoverse-?? (no official announcement made by AWS AFAIK, but ... https://www.nextplatform.com/2022/01/04/inside-amazons-graviton3-arm-server-processor/)
A special case is A64FX, since it is a custom uarch from Fujitsu, but I do not see a target file for it.
I have been revisiting this issue, to cut a release at the end of the week and port it to Spack. I think our current labeling is OK since:
graviton2 seems to expose the same features as Ampere Altra, according to the excerpts of /proc/cpuinfo you posted (so there is no way to distinguish the two);
neoverse-n1 doesn't have the crypto extensions by default.
We might want to add a "label alias" later, to say that graviton2 and Ampere Altra share the same features and flags.
For reference, here's what I obtain with GCC 9.4.0 using -mcpu=neoverse-n1 and using our current optimization line for graviton2:
$ gcc -dM -E - -mcpu=neoverse-n1 < /dev/null | grep __ARM_FEATURE_
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
$ gcc -dM -E - -march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1 < /dev/null | grep __ARM_FEATURE_
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
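To make the two dumps above easier to compare, here is a minimal Python sketch that diffs the `__ARM_FEATURE_*` macros defined by each flag line. It assumes the outputs are pasted in by hand as string constants (nothing invokes a compiler), and the helper names are illustrative only.

```python
import re
from typing import Set, Tuple

# Output of: gcc -dM -E - -mcpu=neoverse-n1 < /dev/null | grep __ARM_FEATURE_
MCPU_N1 = """
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
"""

# Output of: gcc -dM -E - -march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1 ...
GRAVITON2_LINE = """
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
"""

MACRO_RE = re.compile(r"#define\s+(__ARM_FEATURE_\w+)")


def feature_macros(dump: str) -> Set[str]:
    """Return the names of all __ARM_FEATURE_* macros defined in a -dM dump."""
    return set(MACRO_RE.findall(dump))


def compare(a: str, b: str) -> Tuple[Set[str], Set[str]]:
    """Return (macros only in a, macros only in b)."""
    fa, fb = feature_macros(a), feature_macros(b)
    return fa - fb, fb - fa


if __name__ == "__main__":
    only_mcpu, only_march = compare(MCPU_N1, GRAVITON2_LINE)
    print("only with -mcpu=neoverse-n1:", sorted(only_mcpu))
    print("only with the graviton2 line:", sorted(only_march))
```

With these two dumps the first set is empty and the second is `__ARM_FEATURE_AES`, `__ARM_FEATURE_CRYPTO` and `__ARM_FEATURE_SHA2`, i.e. the graviton2 line is neoverse-n1 plus the crypto extensions.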
Question: since crypto does not seem to be captured by /proc/cpuinfo, how can you differentiate one CPU from the other?
Crypto isn't really a separate feature; it's an alias for aes + sha2.
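Following that point, whether `+crypto` is covered on a given machine can be read off the `Features` line: it is enough that both `aes` and `sha2` are listed. A minimal Python sketch, assuming the `Features : ...` format shown in this thread and the aes + sha2 equivalence stated above; the function names are hypothetical and not part of archspec.

```python
from typing import Set


def parse_features(line: str) -> Set[str]:
    """Turn a 'Features : ...' line from /proc/cpuinfo into a set of flag names.

    A line without ':' yields an empty set.
    """
    _, _, flags = line.partition(":")
    return set(flags.split())


def crypto_available(features: Set[str]) -> bool:
    """Treat the 'crypto' extension as shorthand for aes + sha2."""
    return {"aes", "sha2"} <= features


# Features line from the Azure Ampere Altra instance quoted in this thread.
AZURE_ALTRA = ("Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics "
               "fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp")

print(crypto_available(parse_features(AZURE_ALTRA)))  # True
print(crypto_available({"fp", "asimd", "aes"}))  # False: sha2 is missing
```

So /proc/cpuinfo does expose what `crypto` needs, just under the names `aes` and `sha2` rather than as a single flag.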
Similar problem on the Azure Arm instances powered by Ampere Altra which went GA on 1 Sept 2022:
$ archspec --version
archspec, version 0.1.4
$ archspec cpu
graviton
$ grep Features /proc/cpuinfo | head -1
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
An interesting difference is that here archspec reports graviton (rather than graviton2).
It seems like that's due to ssbs not being listed in the Features list in /proc/cpuinfo?
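Putting the Azure `Features` line above next to the Graviton2 one posted later in this thread supports that guess: `ssbs` is the only flag that differs. A minimal Python check on the two lines, copied verbatim from the thread:

```python
# Features lists copied from the two /proc/cpuinfo excerpts in this thread.
AZURE_ALTRA = ("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
               "cpuid asimdrdm lrcpc dcpop asimddp")
GRAVITON2 = ("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
             "cpuid asimdrdm lrcpc dcpop asimddp ssbs")

azure, graviton2 = set(AZURE_ALTRA.split()), set(GRAVITON2.split())

print("only on Graviton2:", sorted(graviton2 - azure))  # only on Graviton2: ['ssbs']
print("only on Azure Altra:", sorted(azure - graviton2))  # only on Azure Altra: []
```

Note that the Altra /proc/cpuinfo posted further down does list `ssbs`, so its absence here appears specific to that Azure setup rather than to the Altra chip itself.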
I think @fspiga's argument for microarchitectures vs product names is compelling. It's also why we use zen2 and zen3 rather than rome and milan (despite the latter being perhaps more intuitive).
There are also advantages to discriminating between AWS Graviton and Ampere Altra, though: although a binary compiled on one may run fine on the other, the compiler optimizations used may differ (depending on the exact compiler flags), which could impact performance. That may be beyond the scope of archspec, though, unless we add support for another mode (ISA vs uarch?).
Can somebody post a cat of /proc/cpuinfo? We can't distinguish the two by features, but we might be able to distinguish them using other fields.
AWS Graviton2, Amazon Linux:
processor : 0
BogoMIPS : 243.75
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
Ampere Altra (Azure), Ubuntu:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 1
NUMA node(s): 1
Vendor ID: ARM
Model: 1
Model name: Neoverse-N1
Stepping: r3p1
BogoMIPS: 50.00
L1d cache: 4 MiB
L1i cache: 4 MiB
L2 cache: 64 MiB
L3 cache: 32 MiB
NUMA node0 CPU(s): 0-63
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
@boegel Can you post /proc/cpuinfo for Ampere Altra? If I am not wrong, that seems to be the output of lscpu, right?
I took the Ampere Altra bit from https://github.com/EESSI/software-layer/pull/187/files#diff-d35bee727b3c563d7a3faf81cbe1ce6fda4d1167ef18e3b6d0f4d9c622e9e413, so maybe @hmeiland can clarify whether this is indeed /proc/cpuinfo or lscpu?
[filippos@amp001 ~]$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 50.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
[ ... ]
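The identification fields are identical as well: `CPU implementer : 0x41` is Arm Ltd. and `CPU part : 0xd0c` is the Neoverse N1 core, so these fields name the core IP, not the SoC vendor. A minimal Python sketch that parses the first processor block of both excerpts; the two lookup tables cover only the IDs seen in this thread and are an assumption of the sketch, not archspec's own data.

```python
from typing import Dict

# Illustrative lookup tables: only the IDs that appear in this thread.
IMPLEMENTERS = {0x41: "Arm"}
ARM_PARTS = {0xD0C: "Neoverse N1"}

GRAVITON2 = """
processor : 0
BogoMIPS : 243.75
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
"""

ALTRA = """
processor : 0
BogoMIPS : 50.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
"""


def cpuinfo_fields(text: str) -> Dict[str, str]:
    """Parse the 'key : value' lines of the first processor block."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        if not line.strip():
            break  # a blank line separates processor blocks
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def core_name(fields: Dict[str, str]) -> str:
    """Map CPU implementer / CPU part to a readable name, falling back to hex IDs."""
    implementer = int(fields["CPU implementer"], 16)  # int() accepts the 0x prefix
    part = int(fields["CPU part"], 16)
    vendor = IMPLEMENTERS.get(implementer, hex(implementer))
    core = ARM_PARTS.get(part, hex(part)) if implementer == 0x41 else hex(part)
    return f"{vendor} {core}"


ID_KEYS = ("Features", "CPU implementer", "CPU architecture",
           "CPU variant", "CPU part", "CPU revision")

g, a = cpuinfo_fields(GRAVITON2), cpuinfo_fields(ALTRA)
print(core_name(g), "|", core_name(a))  # Arm Neoverse N1 | Arm Neoverse N1
print(all(g[k] == a[k] for k in ID_KEYS))  # True
print(g["BogoMIPS"], a["BogoMIPS"])  # 243.75 50.00
```

In these excerpts only `BogoMIPS` differs between the two machines.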
Adding to the list above: it is now public that AWS Graviton3 is Arm Neoverse V1 (Armv8). SiPearl Rhea will use the same core IP. NVIDIA Grace is Arm Neoverse V2 (Armv9).
Are there additional instructions going from armv8 to neoverse-v1, or from armv9 to neoverse-v2? And is it just armv8, or one of the minor version bumps?
@alalazo: my recommendation here, unless we can find a way to meaningfully differentiate between neoverse-n1 and graviton2, is to do the rename. We do not want this to be partial to any one cloud/vendor/etc., and we definitely do not want to deter Azure people from using the library.
For backward compatibility in Spack we have two options, I think:
graviton2 as a node or an alias that just doesn't get detected, but that is recognized as compatible with neoverse-n1
I think the only current (major?) consumer of the graviton* targets is Spack and the various builds we have in AWS right now. I think we can pretty easily rebuild those as neoverse-n1 binaries and just replace the whole cache, and we can also implement something like (1) or (2) so that people's workflows (e.g. preferences in spack.yaml) continue to work. I think for spack.yaml to continue to work we'll need to do (1).
I also don't see why we can't move forward upstream while we work these things out in Spack; I think it would be good to do.
Renaming is doable. The issue here is whether we should go with -mcpu=neoverse-n1 or with the other optimization line we have:
-march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1
which also adds sha2 and aes. Basically, both graviton2 and altra are neoverse-n1 +aes +sha2.
@OliverPerks @stephenmsachs any advice on the flag recommendation here?
Further data point: I am in an ubuntu:latest container, trying the system clang and gcc. It seems clang adds aes and sha2 by default to neoverse-n1, while gcc does not:
root@374eedf176f2:/# clang --version
Ubuntu clang version 14.0.0-1ubuntu1
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
root@374eedf176f2:/# clang -dM -E - -mcpu=neoverse-n1 < /dev/null | grep __ARM_FEATURE_ | sort
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_UNALIGNED 1
and
root@374eedf176f2:/# gcc --version
gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
root@374eedf176f2:/# gcc -dM -E - -mcpu=neoverse-n1 < /dev/null | grep __ARM_FEATURE_ | sort
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_UNALIGNED 1
Hi all. @fspiga pointed me to this thread. I manage the compiler & library technical roadmaps here at Arm.
There's a long backstory here: crypto features such as AES and SHA2 aren't universally available, largely due to export constraints.
This has led to an unfortunate wrinkle that you've just stumbled upon, where they're enabled by default in clang and not in gcc.
gcc's behaviour is more correct, but it's non-trivial to change this in clang for v8 cores, as it creates a backwards-compatibility headache.
For all future v9 cores (e.g. Neoverse N2, V2, Cortex-X2), clang should not enable this by default*
The easiest way of ensuring that clang and gcc do the same thing here is with one of the following:
-mcpu=neoverse-n1+nocrypto
-mcpu=neoverse-n1+crypto
hth,
Will Lovett
* I say 'should' here: I've just discovered a bug for Neoverse N2, whereby clang enables them. I'm going to ask the team to fix it.
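Following the suggestion above, a build tool could always spell out `+crypto` or `+nocrypto` so that clang and gcc agree, choosing between them from the host's `Features` line. A minimal Python sketch, assuming (as stated earlier in the thread) that `crypto` stands for `aes` + `sha2` on this core; the function name is hypothetical.

```python
from typing import Iterable


def neoverse_n1_mcpu(features: Iterable[str]) -> str:
    """Return an explicit -mcpu value so gcc and clang enable the same crypto extensions."""
    feats = set(features)
    # Never rely on the compiler default here: for neoverse-n1, clang turns
    # crypto on by default while gcc leaves it off.
    suffix = "+crypto" if {"aes", "sha2"} <= feats else "+nocrypto"
    return "-mcpu=neoverse-n1" + suffix


# Features line from the Graviton2 /proc/cpuinfo excerpt in this thread.
GRAVITON2_FEATURES = ("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
                      "asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs").split()

print(neoverse_n1_mcpu(GRAVITON2_FEATURES))  # -mcpu=neoverse-n1+crypto
print(neoverse_n1_mcpu(["fp", "asimd", "crc32"]))  # -mcpu=neoverse-n1+nocrypto
```

Since both Graviton2 and Altra list `aes` and `sha2`, this would pick `-mcpu=neoverse-n1+crypto` on either machine.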
Summary
I am working on NVIDIA Arm HPC Developer Kit (https://developer.nvidia.com/arm-hpc-devkit) which is equipped with Ampere Computing 'Altra' CPU.
When running Spack (any recent release tag and git HEAD), the CPU is recognised as graviton2 even though it is not. This is what I get on an internal deployment:
Spack version: 0.17.1-1338-fddc58387c
Rationale
graviton2 is not the only Arm-based CPU built on the Arm Neoverse N1 core IP. See https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd and https://www.anandtech.com/show/15575/amperes-altra-80-core-n1-soc-for-hyperscalers-against-rome-and-xeon
Since the two Arm-based CPUs are based on the same Arm core IP, it is very hard to distinguish them based on what Linux reports. Spack does nothing wrong here: it looks at the "Features" listed in /proc/cpuinfo. However, the lists of uarch features supported by AWS Graviton2 and Ampere Computing Altra are the same.
Graviton2:
Ampere Computing Altra:
Description
Rename target graviton2 to neoverse-n1.
Additional information
In practice, nothing breaks at the moment. However, it can be confusing for an inexperienced user to see Spack recognising "graviton2" as the target when running on a system with an Ampere Computing Altra CPU.