EESSI / software-layer

Software layer of the EESSI project
https://eessi.github.io/docs/software_layer
GNU General Public License v2.0

stick to `x86_64/amd/zen3` when AMD Genoa (Zen4) is detected, until optimized software installations are available for Zen4 #569

Closed boegel closed 1 month ago

boegel commented 1 month ago

Tested, works like a charm (extra text is in yellow):

Found EESSI repo @ /cvmfs/software.eessi.io/versions/2023.06!
archdetect says x86_64/amd/zen4
Sticking to x86_64/amd/zen3 for now, since optimized installations for AMD Genoa (Zen4) are a work in progress, see https://gitlab.com/eessi/support/-/issues/37 for more information
Using x86_64/amd/zen3 as software subdirectory.
...
$ echo $MODULEPATH
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/modules/all
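The fallback shown in this output can be sketched roughly as follows. This is only an illustration of the behaviour, not the code in this PR: the function name `pick_software_subdir` is hypothetical, and the detected CPU target is passed in as a plain string rather than taken from archdetect.

```bash
# Hypothetical helper (not the actual EESSI init script code):
# map the CPU target reported by archdetect to the software subdirectory to use.
pick_software_subdir() {
    local detected="$1"
    if [ "$detected" = "x86_64/amd/zen4" ]; then
        # No optimized Zen4 installations yet: fall back to Zen3 and explain why (on stderr).
        echo "Sticking to x86_64/amd/zen3 for now, since optimized installations for AMD Genoa (Zen4) are a work in progress" >&2
        echo "x86_64/amd/zen3"
    else
        # Any other target is used unchanged.
        echo "$detected"
    fi
}

subdir="$(pick_software_subdir "x86_64/amd/zen4")"
echo "Using ${subdir} as software subdirectory."
```

Keeping the explanation on stderr means the function's stdout stays usable as a value, which is a design choice of this sketch only.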
eessi-bot-aws[bot] commented 1 month ago

Instance eessi-bot-mc-aws is configured to build:

eessi-bot-aws[bot] commented 1 month ago

Instance eessi-bot-mc-azure is configured to build:

casparvl commented 1 month ago

bot: build repo:eessi.io-2023.06-software arch:aarch64/generic

eessi-bot-aws[bot] commented 1 month ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:aarch64/generic` from `casparvl`
- expanded format: `build repository:eessi.io-2023.06-software architecture:aarch64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:aarch64/generic` resulted in:
  - submitted job `10314`, for details & status see https://github.com/EESSI/software-layer/pull/569#issuecomment-2098862060
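The bot reports an "expanded format" in which `repo:` becomes `repository:` and `arch:` becomes `architecture:`. A minimal sketch of that kind of expansion is shown below. The function name `expand_bot_command` is hypothetical, only the two abbreviations visible in this thread are handled, and the bot's real parser may work differently.

```bash
# Hypothetical sketch: expand the short argument names seen in this thread.
expand_bot_command() {
    local out="" word
    for word in "$@"; do
        case "$word" in
            repo:*) word="repository:${word#repo:}" ;;
            arch:*) word="architecture:${word#arch:}" ;;
        esac
        # Join words with single spaces, without a leading space.
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}

expand_bot_command build repo:eessi.io-2023.06-software arch:aarch64/generic
# prints: build repository:eessi.io-2023.06-software architecture:aarch64/generic
```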
eessi-bot-aws[bot] commented 1 month ago
Updates by the bot instance eessi-bot-mc-azure
- account `casparvl` has NO permission to send commands to the bot
eessi-bot-aws[bot] commented 1 month ago
New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_569/10314

date | job status | comment
May 07 16:32:23 UTC 2024 | submitted | job id 10314 awaits release by job manager
May 07 16:33:21 UTC 2024 | released | job awaits launch by Slurm scheduler
May 07 16:38:23 UTC 2024 | running | job 10314 is running
May 07 16:50:35 UTC 2024 | finished | :cry: FAILURE
Details
:white_check_mark: job output file slurm-10314.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
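The checklist above reads like a series of pattern matches against the Slurm output file. A minimal sketch of such checks follows, assuming plain fixed-string matching; the function name `check_build_log` is hypothetical and the bot's actual check script may differ. Note that the job above is reported as FAILURE even though every pattern check passed, so the overall verdict evidently depends on more than these patterns (here, no artefacts were found).

```bash
# Hypothetical sketch of the pattern checks listed under "Details".
# Usage: check_build_log slurm-10314.out
check_build_log() {
    local log="$1" status=0 pat
    # Messages that must NOT appear in the job output.
    for pat in "ERROR:" "FAILED:" "required modules missing:"; do
        if grep -qF -- "$pat" "$log"; then
            echo "found message matching $pat"
            status=1
        fi
    done
    # Messages that MUST appear in the job output.
    for pat in "No missing installations" ".tar.gz created!"; do
        if ! grep -qF -- "$pat" "$log"; then
            echo "no message matching $pat"
            status=1
        fi
    done
    return "$status"
}
```

The function returns 0 only when all five checks pass, and prints one line per check that did not.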
May 07 16:50:35 UTC 2024 | test result | :grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10314.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
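To show how a summary line like the ReFrame one above could be inspected, here is a small sketch that extracts the failure count from it. The function name `reframe_failures` is hypothetical, and the line layout is assumed to be exactly as printed in this thread.

```bash
# Hypothetical sketch: pull the number of failures out of a ReFrame summary line.
reframe_failures() {
    local n
    n="$(printf '%s\n' "$1" | sed -n -E 's/.*Ran [0-9]+\/[0-9]+ test case\(s\).*\(([0-9]+) failure\(s\).*/\1/p')"
    # Succeed (and print the count) only if the line had the expected shape.
    [ -n "$n" ] && echo "$n"
}

reframe_failures "[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)"
# prints: 0
```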
boegel commented 1 month ago

@bedroge Deploying this via the bot should work, right?

casparvl commented 1 month ago

It should. I think we only need to 'build' for one architecture, since it isn't architecture specific anyway, right? (Or do we have checks in place that prevent deploying if not all architectures have a tarball?)

Anyway, I started one 'build'. If more are needed, feel free. I'm afraid I have to go now, maybe someone else can check & deploy later tonight...

boegel commented 1 month ago

bot: build repo:eessi.io-2023.06-software arch:aarch64/generic

eessi-bot-aws[bot] commented 1 month ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:aarch64/generic` from `boegel`
- expanded format: `build repository:eessi.io-2023.06-software architecture:aarch64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:aarch64/generic` resulted in:
  - submitted job `10315`, for details & status see https://github.com/EESSI/software-layer/pull/569#issuecomment-2098956623
eessi-bot-aws[bot] commented 1 month ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:aarch64/generic` from `boegel`
- expanded format: `build repository:eessi.io-2023.06-software architecture:aarch64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:aarch64/generic` resulted in:
  - no jobs were submitted
eessi-bot-aws[bot] commented 1 month ago
New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_569/10315

date | job status | comment
May 07 17:28:52 UTC 2024 | submitted | job id 10315 awaits release by job manager
May 07 17:29:41 UTC 2024 | released | job awaits launch by Slurm scheduler
May 07 17:30:43 UTC 2024 | running | job 10315 is running
May 07 17:43:48 UTC 2024 | finished | :grin: SUCCESS
Details
:white_check_mark: job output file slurm-10315.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1715102988.tar.gz
size: 0 MiB (2679 bytes)
entries: 3
modules under 2023.06/software/linux/aarch64/generic/modules/all
  no module files in tarball
software under 2023.06/software/linux/aarch64/generic/software
  no software packages in tarball
other under 2023.06/software/linux/aarch64/generic
  2023.06/init/bash
  2023.06/init/eessi_environment_variables
  2023.06/init/Magic_Castle/bash
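The artefact summary above groups tarball entries into modules, software and other. A rough sketch of that grouping is given below, assuming the entry paths arrive one per line on stdin (for example from `tar -tzf`). The function name `summarize_entries` is hypothetical, and the bot may count entries differently, for instance by ignoring directories.

```bash
# Hypothetical sketch: count tarball entries per category.
# $1 is the architecture-specific prefix, e.g. 2023.06/software/linux/aarch64/generic
summarize_entries() {
    local prefix="$1" modules=0 software=0 other=0 path
    while IFS= read -r path; do
        case "$path" in
            "$prefix"/modules/all/*) modules=$((modules + 1)) ;;
            "$prefix"/software/*)    software=$((software + 1)) ;;
            *)                       other=$((other + 1)) ;;
        esac
    done
    echo "modules=$modules software=$software other=$other"
}

printf '%s\n' 2023.06/init/bash 2023.06/init/eessi_environment_variables 2023.06/init/Magic_Castle/bash \
    | summarize_entries 2023.06/software/linux/aarch64/generic
# prints: modules=0 software=0 other=3
```

With a real tarball the input would come from something like `tar -tzf eessi-2023.06-software-linux-aarch64-generic-1715102988.tar.gz`.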
May 07 17:43:48 UTC 2024 | test result | :grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10315.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
boegel commented 1 month ago

We should first complete deploy & merge of https://github.com/EESSI/software-layer/pull/371

trz42 commented 1 month ago

Rebuilding after #371 has been merged and this PR has been updated

bot: build repo:eessi.io-2023.06-software arch:aarch64/generic

eessi-bot-aws[bot] commented 1 month ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:aarch64/generic` from `trz42`
- expanded format: `build repository:eessi.io-2023.06-software architecture:aarch64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:aarch64/generic` resulted in:
  - submitted job `10332`, for details & status see https://github.com/EESSI/software-layer/pull/569#issuecomment-2099167100
eessi-bot-aws[bot] commented 1 month ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:aarch64/generic` from `trz42`
- expanded format: `build repository:eessi.io-2023.06-software architecture:aarch64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:aarch64/generic` resulted in:
  - no jobs were submitted
eessi-bot-aws[bot] commented 1 month ago
New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_569/10332

date | job status | comment
May 07 19:40:01 UTC 2024 | submitted | job id 10332 awaits release by job manager
May 07 19:40:33 UTC 2024 | released | job awaits launch by Slurm scheduler
May 07 19:45:36 UTC 2024 | running | job 10332 is running
May 07 19:57:48 UTC 2024 | finished | :grin: SUCCESS
Details
:white_check_mark: job output file slurm-10332.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1715111089.tar.gz
size: 0 MiB (1877 bytes)
entries: 1
modules under 2023.06/software/linux/aarch64/generic/modules/all
  no module files in tarball
software under 2023.06/software/linux/aarch64/generic/software
  no software packages in tarball
other under 2023.06/software/linux/aarch64/generic
  2023.06/init/eessi_environment_variables
May 07 19:57:48 UTC 2024 | test result | :grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10332.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
May 07 20:36:45 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-aarch64-generic-1715111089.tar.gz to S3 bucket succeeded
boegel commented 1 month ago

@trz42 Looks good now, ready to deploy?