EESSI / software-layer

Software layer of the EESSI project
https://eessi.github.io/docs/software_layer
GNU General Public License v2.0

Non-optimal CPU detection of neoverse_v1 using archspec #320

Open laraPPr opened 1 year ago

laraPPr commented 1 year ago

When setting up the EESSI environment on the neoverse_v1 nodes on aws/citc, archspec detects neoverse_n1 instead of neoverse_v1.

@fair-mastodon-c7g-2xlarge-0002 ~]$ source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash

Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2023.06!

archspec says aarch64/neoverse_n1

Using aarch64/neoverse_n1 as software subdirectory.

Using /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all as the directory to be added to MODULEPATH.

Found Lmod configuration file at /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/.lmod/lmodrc.lua

Initializing Lmod...

Prepending /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all to $MODULEPATH...

Environment set up to use EESSI pilot software stack, have fun!
laraPPr commented 1 year ago

cat /proc/cpuinfo on c7g-2xlarge

processor   : 0

BogoMIPS    : 2100.00

Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng

CPU implementer : 0x41

CPU architecture: 8

CPU variant : 0x1

CPU part    : 0xd40

CPU revision    : 1
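
The implementer and part fields above already identify the core: implementer 0x41 is Arm, and part 0xd40 is Neoverse V1 (0xd0c would be Neoverse N1). A minimal sketch of that lookup, assuming a hypothetical helper named arm_core_name that is not part of archspec or archdetect and that only knows these two parts:

```shell
#!/usr/bin/env bash
# Sketch: identify an Arm core from the "CPU implementer" and "CPU part"
# fields of a cpuinfo-style file. arm_core_name is a hypothetical helper
# and the lookup table is deliberately limited to two Neoverse parts.
arm_core_name() {
    local cpuinfo_file="${1:-/proc/cpuinfo}"
    local implementer part
    # Take the value after the colon and strip spaces and tabs.
    implementer=$(awk -F':' '/^CPU implementer/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$cpuinfo_file")
    part=$(awk -F':' '/^CPU part/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$cpuinfo_file")
    case "${implementer}:${part}" in
        0x41:0xd0c) echo "neoverse_n1" ;;
        0x41:0xd40) echo "neoverse_v1" ;;
        *)          echo "unknown" ;;
    esac
}
```

Calling arm_core_name with no argument reads /proc/cpuinfo; with the values shown above it prints neoverse_v1.
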
laraPPr commented 1 year ago

Two CPU features are missing on c7g-2xlarge: paca and pacg. You can check the list here.
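
A rough way to reproduce that comparison by hand is sketched here. missing_features is a hypothetical helper, and the "required" list passed to it is only an illustrative subset, not the full feature list for neoverse_v1:

```shell
#!/usr/bin/env bash
# Sketch: list which required CPU features are absent from a "Features" line.
# missing_features is a hypothetical helper, not part of archspec or archdetect.
missing_features() {
    local required="$1"    # space-separated feature names to look for
    local available="$2"   # space-separated features reported by the CPU
    local feature missing=""
    for feature in $required; do
        case " $available " in
            *" $feature "*) ;;                              # present: nothing to do
            *) missing="${missing:+$missing }$feature" ;;   # absent: remember it
        esac
    done
    echo "$missing"
}

# Features line copied from the c7g-2xlarge output earlier in this thread.
features="fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng"

# Illustrative subset of required features (assumption, not the real list).
missing_features "sve paca pacg" "$features"
```

With the Features line from this node, the last command prints "paca pacg". Whole-word matching matters here: sve must not be counted as present just because svei8mm is.
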

boegel commented 1 year ago

Together with #322, this is enough motivation to switch to using our own minimal archdetect implementation rather than relying on archspec for EESSI pilot 2023.06, I think...

ocaisa commented 1 year ago

@boegel There's an error in archdetect that is fixed as part of https://github.com/EESSI/software-layer/pull/264

boegel commented 1 year ago

@laraPPr Can you check whether archdetect correctly detects both neoverse_v1 and zen3 (cf. #322), using:

EESSI_USE_ARCHDETECT=1 source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
laraPPr commented 1 year ago

Results on zen3: https://github.com/EESSI/software-layer/issues/322#issuecomment-1702640074

Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2023.06!

2023-09-01 12:10:56 [INFO] cpupath: best match for host CPU: aarch64/arm/neoverse-v1

archdetect says aarch64/arm/neoverse-v1

Using aarch64/arm/neoverse-v1 as software subdirectory.

ERROR: EESSI software layer at /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/arm/neoverse-v1 not found!

laraPPr commented 1 year ago

The error occurs because archdetect reports neoverse-v1 instead of neoverse_v1. The same applies to neoverse_n1, which archdetect reports as neoverse-n1.

boegel commented 1 year ago

And there's the extra arm/ subdirectory which doesn't exist.
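
The two mismatches described above (hyphen instead of underscore, plus the extra arm/ component) can be illustrated with a small mapping. This is only a sketch under the assumption that nothing else differs; normalize_cpu_path is a hypothetical helper and is not how archdetect itself was fixed:

```shell
#!/usr/bin/env bash
# Sketch: convert an archdetect-style path such as "aarch64/arm/neoverse-v1"
# into the directory name that exists in the repository ("aarch64/neoverse_v1").
# normalize_cpu_path is a hypothetical helper, not code from archdetect.
normalize_cpu_path() {
    # 1st expression: drop the "arm/" path component if present.
    # 2nd expression: replace every hyphen with an underscore.
    printf '%s\n' "$1" | sed -e 's|/arm/|/|' -e 's|-|_|g'
}

normalize_cpu_path "aarch64/arm/neoverse-v1"
```

The last line prints aarch64/neoverse_v1, which matches the subdirectory that the archspec-based run used for neoverse_n1 (aarch64/neoverse_n1).
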

ocaisa commented 1 year ago

As part of #264, I've fixed the Arm detection and added an additional check in CI that ensures that whatever archdetect spits out actually exists as an option.
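
For readers who want a similar guard locally, the idea of such a check can be sketched as follows. This is not the CI check from #264; check_cpu_path_exists and its arguments are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: fail if a detected CPU path has no matching directory.
# check_cpu_path_exists is a hypothetical helper, not the CI check from #264.
check_cpu_path_exists() {
    local software_root="$1"   # e.g. <repo>/versions/2023.06/software/linux
    local cpu_path="$2"        # e.g. aarch64/neoverse_v1
    if [ -d "${software_root}/${cpu_path}" ]; then
        echo "OK: ${cpu_path}"
    else
        echo "ERROR: ${software_root}/${cpu_path} not found" >&2
        return 1
    fi
}
```

Running it with the software/linux directory of the repository as the first argument and the detected path as the second returns a non-zero status when the path is missing, so it can be used directly in a script or CI step.
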