EESSI / software-layer

Software layer of the EESSI project
https://eessi.github.io/docs/software_layer
GNU General Public License v2.0

[DO NOT MERGE] debug hatchling rebuild issue #555

Closed bedroge closed 5 months ago

bedroge commented 5 months ago

Using this PR to debug the issue observed in #546.

eessi-bot[bot] commented 5 months ago

Instance eessi-bot-mc-aws is configured to build:

eessi-bot[bot] commented 5 months ago

Instance eessi-bot-mc-azure is configured to build:

bedroge commented 5 months ago

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - submitted job `10006`, for details & status see https://github.com/EESSI/software-layer/pull/555#issuecomment-2085100860

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - no jobs were submitted
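
The "expanded format" reported by the bot suggests that short argument keys are rewritten to long ones before the command is handled. A minimal sketch of that idea follows; the alias table and the function name `expand_command` are hypothetical and inferred only from the two keys visible in this thread (`repo` and `arch`), not taken from the bot's source.

```python
# Hypothetical alias table, inferred from the bot comments in this thread only.
ALIASES = {"repo": "repository", "arch": "architecture"}


def expand_command(command: str) -> str:
    """Expand short argument keys in a command like 'build repo:X arch:Y'."""
    action, *args = command.split()
    expanded = []
    for arg in args:
        # Split on the first colon only, so values may contain further colons.
        key, sep, value = arg.partition(":")
        # Unknown keys are passed through unchanged.
        expanded.append(f"{ALIASES.get(key, key)}{sep}{value}")
    return " ".join([action, *expanded])


print(expand_command("build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3"))
```

For the command used in this PR, the sketch prints the same expanded string that the bot reports.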
eessi-bot[bot] commented 5 months ago
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_555/10006
date | job status | comment
Apr 30 11:44:32 UTC 2024 | submitted | job id 10006 awaits release by job manager
Apr 30 11:45:08 UTC 2024 | released | job awaits launch by Slurm scheduler
Apr 30 11:46:10 UTC 2024 | running | job 10006 is running
Apr 30 11:57:25 UTC 2024 | finished |
:cry: FAILURE
Details
:white_check_mark: job output file slurm-10006.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Apr 30 11:57:25 UTC 2024 | test result |
:grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10006.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
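
The build verdict above appears to be derived from a handful of text checks on the job output file: some messages must be absent, others must be present. The following is a rough sketch of that logic, assuming plain substring matching; the strings are copied from the report, but the function `check_build_log` and the way the checks are combined are assumptions, not the bot's actual implementation.

```python
# Strings copied from the bot's report; matching them as literal substrings
# (rather than regular expressions) is an assumption made for this sketch.
MUST_BE_ABSENT = ["ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_build_log(text: str) -> tuple[bool, list[str]]:
    """Return (success, report lines) for the contents of a job output file."""
    ok = True
    report = []
    for needle in MUST_BE_ABSENT:
        found = needle in text
        ok = ok and not found
        report.append(f"{'found' if found else 'no'} message matching {needle}")
    for needle in MUST_BE_PRESENT:
        found = needle in text
        ok = ok and found
        report.append(f"{'found' if found else 'no'} message matching {needle}")
    return ok, report


# Hypothetical log excerpt, not taken from slurm-10006.out.
success, lines = check_build_log("ERROR: build failed\nfoo.tar.gz created!\n")
print("SUCCESS" if success else "FAILURE")
for line in lines:
    print(line)
```

Under these assumptions, a single unwanted match or a single missing message is enough to turn the verdict into FAILURE, which matches the pattern seen in the failed jobs in this thread.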
bedroge commented 5 months ago

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - submitted job `10007`, for details & status see https://github.com/EESSI/software-layer/pull/555#issuecomment-2085119642

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - no jobs were submitted
eessi-bot[bot] commented 5 months ago
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_555/10007
date | job status | comment
Apr 30 11:50:59 UTC 2024 | submitted | job id 10007 awaits release by job manager
Apr 30 11:51:16 UTC 2024 | released | job awaits launch by Slurm scheduler
Apr 30 11:56:24 UTC 2024 | running | job 10007 is running
Apr 30 12:08:47 UTC 2024 | finished |
:cry: FAILURE
Details
:white_check_mark: job output file slurm-10007.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Apr 30 12:08:47 UTC 2024 | test result |
:grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10007.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
bedroge commented 5 months ago

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - submitted job `10008`, for details & status see https://github.com/EESSI/software-layer/pull/555#issuecomment-2085138780

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - no jobs were submitted
eessi-bot[bot] commented 5 months ago
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_555/10008
date | job status | comment
Apr 30 12:00:21 UTC 2024 | submitted | job id 10008 awaits release by job manager
Apr 30 12:00:30 UTC 2024 | released | job awaits launch by Slurm scheduler
Apr 30 12:01:34 UTC 2024 | running | job 10008 is running
Apr 30 12:23:23 UTC 2024 | finished |
:cry: FAILURE
Details
:white_check_mark: job output file slurm-10008.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1714479127.tar.gz
size: 111 MiB (116574222 bytes)
entries: 10238
modules under 2023.06/software/linux/x86_64/amd/zen3/modules/all
Python/3.11.5-GCCcore-13.2.0.lua
software under 2023.06/software/linux/x86_64/amd/zen3/software
Python/3.11.5-GCCcore-13.2.0
other under 2023.06/software/linux/x86_64/amd/zen3
no other files in tarball
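
The trailing number in the tarball name looks like a Unix timestamp, although nothing in this thread confirms that. As a small sketch under that assumption, the hypothetical helper `tarball_timestamp` below extracts the number and converts it to UTC.

```python
import re
from datetime import datetime, timezone


def tarball_timestamp(name: str) -> datetime:
    """Interpret the trailing number of a tarball name as a Unix timestamp (assumption)."""
    match = re.search(r"-(\d+)\.tar\.gz$", name)
    if match is None:
        raise ValueError(f"no trailing number found in {name!r}")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


print(tarball_timestamp("eessi-2023.06-software-linux-x86_64-amd-zen3-1714479127.tar.gz"))
# prints 2024-04-30 12:12:07+00:00
```

That time falls between the "running" (12:01:34) and "finished" (12:23:23) entries for job 10008, which is at least consistent with the tarball being created during the job.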
Apr 30 12:23:23 UTC 2024 | test result |
:grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10008.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
bedroge commented 5 months ago

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - submitted job `10009`, for details & status see https://github.com/EESSI/software-layer/pull/555#issuecomment-2085159696

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - no jobs were submitted
eessi-bot[bot] commented 5 months ago
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_555/10009
date | job status | comment
Apr 30 12:08:46 UTC 2024 | submitted | job id 10009 awaits release by job manager
Apr 30 12:09:50 UTC 2024 | released | job awaits launch by Slurm scheduler
Apr 30 12:10:53 UTC 2024 | running | job 10009 is running
Apr 30 12:33:42 UTC 2024 | finished |
:cry: FAILURE
Details
:white_check_mark: job output file slurm-10009.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1714479706.tar.gz
size: 111 MiB (116582197 bytes)
entries: 10238
modules under 2023.06/software/linux/x86_64/amd/zen3/modules/all
Python/3.11.5-GCCcore-13.2.0.lua
software under 2023.06/software/linux/x86_64/amd/zen3/software
Python/3.11.5-GCCcore-13.2.0
other under 2023.06/software/linux/x86_64/amd/zen3
no other files in tarball
Apr 30 12:33:42 UTC 2024 | test result |
:grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10009.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
bedroge commented 5 months ago

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - submitted job `10010`, for details & status see https://github.com/EESSI/software-layer/pull/555#issuecomment-2085170545

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - no jobs were submitted
eessi-bot[bot] commented 5 months ago
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_555/10010
date | job status | comment
Apr 30 12:15:06 UTC 2024 | submitted | job id 10010 awaits release by job manager
Apr 30 12:16:03 UTC 2024 | released | job awaits launch by Slurm scheduler
Apr 30 12:22:20 UTC 2024 | running | job 10010 is running
Apr 30 12:44:57 UTC 2024 | finished |
:cry: FAILURE
Details
:white_check_mark: job output file slurm-10010.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1714480420.tar.gz
size: 111 MiB (116569691 bytes)
entries: 10238
modules under 2023.06/software/linux/x86_64/amd/zen3/modules/all
Python/3.11.5-GCCcore-13.2.0.lua
software under 2023.06/software/linux/x86_64/amd/zen3/software
Python/3.11.5-GCCcore-13.2.0
other under 2023.06/software/linux/x86_64/amd/zen3
no other files in tarball
Apr 30 12:44:57 UTC 2024 | test result |
:grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10010.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
bedroge commented 5 months ago

Manually added write permissions to /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/hatchling/1.18.0-GCCcore-13.2.0/ on the Stratum 0, let's try again :crossed_fingers:

bedroge commented 5 months ago

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - submitted job `10013`, for details & status see https://github.com/EESSI/software-layer/pull/555#issuecomment-2085248148

eessi-bot[bot] commented 5 months ago
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3` from `bedroge`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3` resulted in:
  - no jobs were submitted
eessi-bot[bot] commented 5 months ago
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_555/10013
date | job status | comment
Apr 30 12:51:56 UTC 2024 | submitted | job id 10013 awaits release by job manager
Apr 30 12:52:01 UTC 2024 | released | job awaits launch by Slurm scheduler
Apr 30 12:53:03 UTC 2024 | running | job 10013 is running
Apr 30 13:15:31 UTC 2024 | finished |
:grin: SUCCESS
Details
:white_check_mark: job output file slurm-10013.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1714482246.tar.gz
size: 111 MiB (117080049 bytes)
entries: 10590
modules under 2023.06/software/linux/x86_64/amd/zen3/modules/all
hatchling/1.18.0-GCCcore-13.2.0.lua
Python/3.11.5-GCCcore-13.2.0.lua
software under 2023.06/software/linux/x86_64/amd/zen3/software
hatchling/1.18.0-GCCcore-13.2.0
Python/3.11.5-GCCcore-13.2.0
other under 2023.06/software/linux/x86_64/amd/zen3
no other files in tarball
Apr 30 13:15:31 UTC 2024 | test result |
:grin: SUCCESS
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10013.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
bedroge commented 5 months ago

I've tried several things here: purging the CVMFS cache between the removal and build steps, adding write permissions before removing the installation directories, removing only the contents of the installation directories (and not the directories themselves), and moving instead of removing. None of these solved the issue. In the end, I added write permissions to the hatchling directory on the Stratum 0, and that did solve it. So I'll do that for all CPU targets, and then try rebuilding the apps in #546.
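
For reference, the "add write permissions" step could look roughly like the sketch below when scripted. It is an illustration only: the function name `add_owner_write` is hypothetical, and adding the write bit for the owner only (similar in spirit to `chmod -R u+w`) is an assumption, since the comment above does not say which permission bits were changed or how.

```python
import os
import stat


def add_owner_write(root: str) -> int:
    """Add the owner-write bit to root and everything below it.

    Returns the number of paths whose mode was changed.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    changed = 0
    for path in paths:
        if os.path.islink(path):
            continue  # os.chmod would follow the link, so skip symlinks
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        if not mode & stat.S_IWUSR:
            os.chmod(path, mode | stat.S_IWUSR)
            changed += 1
    return changed
```

A call such as `add_owner_write("/path/to/software/hatchling/1.18.0-GCCcore-13.2.0")` (placeholder path) would then report how many entries were missing the write bit.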