Open casparvl opened 4 months ago
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_585/16776
date | job status | comment
---|---|---
Aug 21 20:00:14 UTC 2024 | submitted | job id 16776 awaits release by job manager
Aug 21 20:01:13 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 21 20:06:16 UTC 2024 | running | job 16776 is running
Aug 21 21:11:16 UTC 2024 | finished | :cry: FAILURE (click triangle for details)
Aug 21 21:11:16 UTC 2024 | test result | :grin: SUCCESS (click triangle for details)
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_585/16777
date | job status | comment
---|---|---
Aug 21 20:09:04 UTC 2024 | submitted | job id 16777 awaits release by job manager
Aug 21 20:09:21 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 21 20:15:34 UTC 2024 | running | job 16777 is running
Aug 21 21:48:14 UTC 2024 | finished | :grin: SUCCESS (click triangle for details)
Aug 21 21:48:14 UTC 2024 | test result | :cry: FAILURE (click triangle for details)
Hm, https://github.com/EESSI/software-layer/pull/585#issuecomment-2302912876 still fails with the same static TLS issue. I realize why though: the sanity check is run before generating the module file. It will generate a temporary module file for this step, but since the hook only gets applied when generating the final module file, it doesn't get applied here!
The fix should be relatively simple: make the hook apply earlier, as a `pre_sanitycheck_hook`.
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_585/16826
== 2024-08-22 14:32:22,373 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command spm_train --help | grep accept_language exited with code 1 (output: /bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.36' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.35' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
)
sanity check command python -c 'import sentencepiece' exited with code 1 (output: /bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.36' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.35' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
) (at easybuild/framework/easyblock.py:3663 in _sanity_check_step)
date | job status | comment |
---|---|---|
Aug 22 14:11:24 UTC 2024 | submitted | job id 16826 awaits release by job manager |
Aug 22 14:11:37 UTC 2024 | released | job awaits launch by Slurm scheduler |
Aug 22 14:17:39 UTC 2024 | running | job 16826 is running |
Aug 22 15:23:13 UTC 2024 | finished | :cry: FAILURE (click triangle for details)
Aug 22 15:23:13 UTC 2024 | test result | :grin: SUCCESS (click triangle for details)
Failure of the test suite on x86_64 with:
FAILURE INFO for EESSI_PyTorch_torchvision_CPU %nn_model=resnet50 %scale=1_node %parallel_strategy=None %module_name=PyTorch-bundle/2.1.2-foss-2023a (run: 1/1)
* Description: Benchmark that runs a selected torchvision model on synthetic data
* System partition: BotBuildTests:default
* Environment: default
* Stage directory: /project/60006/SHARED/jobs/2024.08/pr_585/event_33c66470-5ff9-11ef-924c-fc9f4cfa4137/run_000/linux_x86_64_amd_zen3/eessi.io-2023.06-software/reframe_runs/stage/BotBuildTests/default/default/EESSI_PyTorch_torchvision_CPU_39d248a6
* Node list:
* Job type: local (id=None)
* Dependencies (conceptual): []
* Dependencies (actual): []
* Maintainers: []
* Failing phase: setup
* Rerun with '-n /39d248a6 -p default --system BotBuildTests:default -r'
* Reason: attribute error: EESSI-test-suite/eessi/testsuite/utils.py:163: Processor information (num_cores_per_numa_node) missing. Check that processor information is either autodetected (see https://reframe-hpc.readthedocs.io/en/stable/configure.html#proc-autodetection), or manually set in the ReFrame configuration file (see https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#processor-info).
raise AttributeError(msg)
Ok, we didn't define that in our template config file. Also, it is particular to newer versions of ReFrame. I'll create one PR that adds a newer version of ReFrame and another that no longer uses hard-coded processor features but autodetects them. The challenge is that with the local spawner, if we use a single config file, it doesn't have the specific partition we submitted to. However, we can get that from the job environment and inject it into the config. I'll do that in https://github.com/EESSI/software-layer/pull/682 and add the new ReFrame in https://github.com/EESSI/software-layer/pull/708
Copying some findings from Slack here:
To me it seems the problem is a combination of what EasyBuild uses to run commands (it uses `/bin/bash`) and the fact that we currently set `LD_PRELOAD` too early via the modified module file. Below are a few examples illustrating what happens.
The original TLS (Thread-Local Storage) allocation error (without `LD_PRELOAD`, just running the import after loading Python and gperftools, and setting `PATH` and `PYTHONPATH` to the build directory for SentencePiece):
bot@aarch64-generic-node3 /tmp/bot $ python -c 'import sentencepiece'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
With `LD_PRELOAD` this succeeds (same env otherwise)...
bot@aarch64-generic-node3 /tmp/bot $ LD_PRELOAD=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so python -c 'import sentencepiece'
However, that is not how EasyBuild runs the sanity check command. Instead, it runs the following (which fails)...
bot@aarch64-generic-node3 /tmp/bot $ LD_PRELOAD=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so /bin/bash -c "python -c 'import sentencepiece'"
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.36' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.35' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
/bin/bash: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
The above error is what we got in the last build job for `aarch64/generic`. If we run the original command in a subshell (as EasyBuild does), we get the original error (just to illustrate that we "correctly" emulate what EasyBuild does)...
bot@aarch64-generic-node3 /tmp/bot $ /bin/bash -c "python -c 'import sentencepiece'"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
If we set `LD_PRELOAD` just before we run python, it works...
bot@aarch64-generic-node3 /tmp/bot $ /bin/bash -c "LD_PRELOAD=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so python -c 'import sentencepiece'"
I think setting `LD_PRELOAD` in the module for SentencePiece could work. However, when running EasyBuild we'll likely run into issues because it uses `/bin/bash` to run commands. If it used bash from the compat layer, it would work. See the example below:
bot@aarch64-generic-node3 /tmp/bot $ LD_PRELOAD=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4 /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/bin/bash -c "python -c 'import sentencepiece'"
To me it seems that `/bin/bash` and `/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4` depend on different symbols (which sounds logical), hence it is critical to only preload the latter library after `/bin/bash`'s dependencies have been resolved.
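The difference between setting a variable for the shell itself and setting it only for the final command can be reproduced without touching `LD_PRELOAD`. The sketch below is only an illustration: `DEMO_VAR` is a harmless stand-in for `LD_PRELOAD`, and `run_in_subshell` is a hypothetical helper, not EasyBuild's own code.

```python
import os
import shlex
import subprocess
import sys

# DEMO_VAR is a harmless stand-in for LD_PRELOAD (an assumption made so the
# sketch is safe to run anywhere).
OUTER = 'echo "outer=${DEMO_VAR:-unset}"'
INNER = shlex.quote(sys.executable) + (
    " -c \"import os; print('inner=' + os.environ.get('DEMO_VAR', 'unset'))\""
)


def run_in_subshell(cmd, extra_env=None):
    """Run cmd through a shell and return its whitespace-separated output."""
    env = {k: v for k, v in os.environ.items() if k != "DEMO_VAR"}
    env.update(extra_env or {})
    result = subprocess.run(cmd, shell=True, env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout.split()


# Variable is already in the environment when the shell starts (comparable to
# a module file exporting LD_PRELOAD before the subshell is spawned).
early = run_in_subshell(f"{OUTER}; {INNER}", {"DEMO_VAR": "x"})

# Variable is set only for the final command inside the shell.
late = run_in_subshell(f"{OUTER}; DEMO_VAR=x {INNER}")

print(early)
print(late)
```

In the first case the shell itself sees the variable; in the second only the inner Python process does, which corresponds to the variant that worked above.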
@trz42 Doesn't this mean that EasyBuild should be using the `/bin/bash` from the compat layer, so prefixed with `sysroot` in EasyBuild lingo?
> @trz42 Doesn't this mean that EasyBuild should be using the `/bin/bash` from the compat layer, so prefixed with `sysroot` in EasyBuild lingo?
Maybe. If `sysroot` implies that it can expect a `sysroot/bin/bash` it could work. However, it has only resulted in a problem when we use `LD_PRELOAD`. So, maybe we should look for another solution.
I'm trying to solve the issue with a parse hook where I just add `LD_PRELOAD=...` in front of the failing sanity check command and another hook to add `LD_PRELOAD=...` in the module file. However, the latter has to be done after the sanity check has been run.
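As a rough sketch of the first half of that idea (prefixing the sanity check commands), something like the following could work. It is not the actual hook from `eb_hooks.py`: `prefix_sanity_check_commands` is a hypothetical helper, the easyconfig is modelled as a plain dict, the commands are assumed to be plain strings, and the library path is a placeholder.

```python
def prefix_sanity_check_commands(ec, preload_lib):
    """Return a copy of the easyconfig-like dict `ec` in which every sanity
    check command is prefixed with LD_PRELOAD=<preload_lib>.

    Hypothetical helper: assumes commands are plain strings.
    """
    prefix = f"LD_PRELOAD={preload_lib} "
    updated = dict(ec)
    updated["sanity_check_commands"] = [
        # Leave commands alone that already carry the prefix.
        cmd if cmd.startswith(prefix) else prefix + cmd
        for cmd in ec.get("sanity_check_commands", [])
    ]
    return updated


# Placeholder path; the real library lives in the gperftools installation.
lib = "/path/to/libtcmalloc_minimal.so"
ec = {
    "name": "SentencePiece",
    "sanity_check_commands": [
        "spm_train --help | grep accept_language",
        "python -c 'import sentencepiece'",
    ],
}
patched = prefix_sanity_check_commands(ec, lib)
for cmd in patched["sanity_check_commands"]:
    print(cmd)
```

Because the assignment is part of the command string, it is only evaluated inside the subshell, so the shell itself starts without the preloaded library.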
A better fix could be what you suggest: in some cases or always, we prefix the `exec_cmd = "/bin/bash"` (/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/EasyBuild/4.9.2/lib/python3.11/site-packages/easybuild/tools/run.py:229) with `sysroot` when it is present. Then we could just add the setting of `LD_PRELOAD=...` in the module file and it should work both while using the module and while running the sanity check.
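A minimal sketch of what such a sysroot-aware choice of shell could look like is given below. It is an assumption-laden illustration, not the change made in EasyBuild: `shell_for_run_cmd` is a hypothetical name, and falling back to the default when the sysroot has no executable shell is a design choice made here.

```python
import os
import stat
import tempfile


def shell_for_run_cmd(sysroot=None, default="/bin/bash"):
    """Return <sysroot>/bin/bash if a sysroot is set and that file is an
    executable, otherwise fall back to the default shell (hypothetical helper).
    """
    if sysroot:
        candidate = os.path.join(sysroot, default.lstrip(os.sep))
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return default


# Demo with a throw-away directory standing in for the compat layer.
with tempfile.TemporaryDirectory() as fake_sysroot:
    os.makedirs(os.path.join(fake_sysroot, "bin"))
    fake_bash = os.path.join(fake_sysroot, "bin", "bash")
    with open(fake_bash, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(fake_bash, stat.S_IRWXU)

    with_sysroot = shell_for_run_cmd(sysroot=fake_sysroot)
    without_sysroot = shell_for_run_cmd()
    print(with_sysroot == fake_bash, without_sysroot)
```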
> @trz42 Doesn't this mean that EasyBuild should be using the `/bin/bash` from the compat layer, so prefixed with `sysroot` in EasyBuild lingo?
To me, this makes a lot of sense actually. If you're explicitly invoking a shell to run your command, and if a `sysroot` is set, it should be the shell from that `sysroot` prefix imho.
What is the reason that EasyBuild is running this in a subshell actually? I mean, that is not typically how I would test the module manually, and it could potentially lead to differences with running it in the parent shell (this example being a case in point).
@casparvl All shell commands run by EasyBuild are run in a subshell...
> A better fix could be what you suggest: in some cases or always, we prefix the `exec_cmd = "/bin/bash"` (/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/EasyBuild/4.9.2/lib/python3.11/site-packages/easybuild/tools/run.py:229) with `sysroot` when it is present. Then we could just add the setting of `LD_PRELOAD=...` in the module file and it should work both while using the module and while running the sanity check.
I think that's the right way forward...
It's a relatively easy change to make in EasyBuild (though in some sense a breaking one, so perhaps we need to make it configurable).
We may even test this change already by copying the `bash` files from the two compat layers (`x86_64` and `aarch64`) to some directory in the PR and then modifying the launch of the containers such that the right file is bind mounted to `/bin/bash` inside the container. Before we run `eessi_container.sh` we can set `SINGULARITY_BIND`.
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_585/18888
date | job status | comment
---|---|---
Sep 17 20:57:04 UTC 2024 | submitted | job id 18888 awaits release by job manager
Sep 17 20:57:37 UTC 2024 | released | job awaits launch by Slurm scheduler
Sep 17 21:04:40 UTC 2024 | running | job 18888 is running
Sep 17 22:23:56 UTC 2024 | finished | :grin: SUCCESS (click triangle for details)
Sep 17 22:23:56 UTC 2024 | test result | :grin: SUCCESS (click triangle for details)
Ok, good that the test now works on `x86_64`.
For this issue on ARM, I made a fix https://github.com/easybuilders/easybuild-framework/pull/4646 for EasyBuild framework, only to realize afterwards that the whole `run_cmd` thing is completely overhauled in EasyBuild 5.0. Looking at the 5.0.X code here, I see:
# use bash as shell instead of the default /bin/sh used by subprocess.run
# (which could be dash instead of bash, like on Ubuntu, see https://wiki.ubuntu.com/DashAsBinSh)
# stick to None (default value) when not running command via a shell
if use_bash:
    bash = shutil.which('bash')
    _log.info(f"Path to bash that will be used to run shell commands: {bash}")
    executable, shell = bash, True
else:
    executable, shell = None, False
I tested a build of SentencePiece, including the `LD_PRELOAD` hook:
eb --hooks $HOME/EESSI/software-layer/eb_hooks.py SentencePiece-0.2.0-GCC-12.3.0.eb --rebuild
with EasyBuild 5.0.X (from the current branch), and that worked without encountering the previous issue.
In other words, there is not much to fix, we just need to wait for EasyBuild 5.X to be released (soon, I hope :D). Or we need to reinstall 4.9.3 with a patch based on https://github.com/easybuilders/easybuild-framework/pull/4646 so we can proceed here.
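The `shutil.which('bash')` lookup quoted above resolves `bash` through the search path instead of hard-coding `/bin/bash`. The sketch below only demonstrates that lookup behaviour with a throw-away directory standing in for the compat layer's `bin` directory; whether the compat layer actually comes first in `PATH` during a build is an assumption that the successful test above merely suggests.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as fake_compat_bin:
    # Create a dummy executable named "bash" in the stand-in directory.
    fake_bash = os.path.join(fake_compat_bin, "bash")
    with open(fake_bash, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(fake_bash, stat.S_IRWXU)

    # Put the stand-in directory in front of the regular search path.
    search_path = os.pathsep.join([fake_compat_bin, os.environ.get("PATH", "")])

    # shutil.which returns the first matching executable on the search path.
    resolved = shutil.which("bash", path=search_path)
    print(resolved == fake_bash)
```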
Hmm, while the issue for SentencePiece is solved (this now installs successfully), I'm now getting:
-- Check for working C compiler: /tmp/eb-cw54zzvr/tmprgti6_vm/rpath_wrappers/gcc_wrapper/gcc - broken
CMake Error at /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/CMake/3.26.3-GCCcore-12.3.0/share/cmake-3.26/Modules/CMakeTestCCompiler.cmake:67 (message):
The C compiler
"/tmp/eb-cw54zzvr/tmprgti6_vm/rpath_wrappers/gcc_wrapper/gcc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /tmp/casparvl/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/build/temp.linux-aarch64-cpython-311/CMakeFiles/CMakeScratch/TryCompile-XrjNFV
Run Build Command(s):/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Ninja/1.11.1-GCCcore-12.3.0/bin/ninja -v cmTC_64b77 && [1/2] /tmp/eb-cw54zzvr/tmprgti6_vm/rpath_wrappers/gcc_wrapper/gcc -O2 -ftree-vectorize -mcpu=native -fno-math-errno -o CMakeFiles/cmTC_64b77.dir/testCCompiler.c.o -c /tmp/casparvl/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/build/temp.linux-aarch64-cpython-311/CMakeFiles/CMakeScratch/TryCompile-XrjNFV/testCCompiler.c
FAILED: CMakeFiles/cmTC_64b77.dir/testCCompiler.c.o
/tmp/eb-cw54zzvr/tmprgti6_vm/rpath_wrappers/gcc_wrapper/gcc -O2 -ftree-vectorize -mcpu=native -fno-math-errno -o CMakeFiles/cmTC_64b77.dir/testCCompiler.c.o -c /tmp/casparvl/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/build/temp.linux-aarch64-cpython-311/CMakeFiles/CMakeScratch/TryCompile-XrjNFV/testCCompiler.c
/bin/sh: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /home/casparvl/eessi/versions/2023.06/software/linux/aarch64/neoverse_n1/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so)
/bin/sh: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/sh: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/sh: /lib64/libc.so.6: version `GLIBC_2.36' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/sh: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/GCCcore/12.3.0/lib64/libstdc++.so.6)
/bin/sh: /lib64/libc.so.6: version `GLIBC_2.35' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
/bin/sh: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/GCCcore/12.3.0/lib64/libgcc_s.so.1)
ninja: build stopped: subcommand failed.
when it is installing `torchtext` from `PyTorch-Bundle`. I think the `/bin/sh` here comes from the fact that some python process invokes `subprocess.run()`, which uses `/bin/sh` according to https://github.com/easybuilders/easybuild-framework/blob/a2550eb8fab479f517badbf45925c3cebda2880c/easybuild/tools/run.py#L450
The last part of the stack trace I'm getting:
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/casparvl/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/tools/setup_helpers/extension.py", line 46, in run
super().run()
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/tmp/casparvl/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/tools/setup_helpers/extension.py", line 108, in build_extension
subprocess.check_call(["cmake", str(_ROOT_DIR)] + cmake_args, cwd=self.build_temp)
File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
That's annoying to say the least. We can fix it, but it might require a patch to Python to alter which `sh` command is used by default by `subprocess.run`. Alternatively, we change the `subprocess` call done by `/tmp/casparvl/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/tools/setup_helpers/extension.py`. That has a much smaller impact, but is also a less complete fix. It means that any other software using SentencePiece and calling `subprocess.run` will still run into this issue.
> In other words, there is not much to fix, we just need to wait for EasyBuild 5.X to be released (soon, I hope :D). Or we need to reinstall 4.9.3 with a patch based on easybuilders/easybuild-framework#4646 so we can proceed here.
@casparvl There's an EasyBuild v4.9.4 release coming really soon (in next couple of days), because the GCC easyblock in EasyBuild v4.9.3 has a serious bug that many people will easily run into (see here), so it's worth trying to get https://github.com/easybuilders/easybuild-framework/pull/4646 merged ASAP.
> That's annoying to say the least. We can fix it, but it might require a patch to Python to alter which `sh` command is used by default by `subprocess.run`. Alternatively, we change the `subprocess` call done by `/tmp/casparvl/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/tools/setup_helpers/extension.py`. That's much smaller impact, but also a less complete fix. It means that any other software using SentencePiece and calling `subprocess.run` will still run into this issue.
@casparvl A patch to Python seems like the best way forward here.
We should check what Gentoo Prefix does here, since they must have run into similar issues with a hardcoded `/bin/sh`?
From the sources, it seems to be equally broken in Gentoo Prefix:
$ cat /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib/python3.11/subprocess.py | grep -A 5 "/bin/sh"
>>> check_output(["/bin/sh", "-c",
... "ls -l non_existent_file ; exit 0"],
... stderr=STDOUT)
b'ls: non_existent_file: No such file or directory\n'
There is an additional optional argument, "input", allowing you to
--
# On Android the default shell is at '/system/bin/sh'.
unix_shell = ('/system/bin/sh' if
hasattr(sys, 'getandroidapilevel') else '/bin/sh')
args = [unix_shell, "-c"] + args
if executable:
args[0] = executable
if executable is None:
I confirmed that if I run a `subprocess.run("sleep 5", shell=True)` with the python from the compat layer, it will use `/bin/sh` to execute this command. So yes, it's just as broken in the Python in Gentoo-Prefix.
The fix should be very simple: prepend the sysroot to the path on this line in the source code. I guess this could (and should) be done at the EasyBlock level. I'll look at that later...
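For reference, the default shell and the effect of the `executable` argument (the `args[0] = executable` branch in the source shown above) can be checked with a small sketch like the one below. The helper name `shell_used` is made up for illustration, and picking `bash` from the search path is just an example of an alternative shell; it is not the proposed patch itself.

```python
import shutil
import subprocess


def shell_used(executable=None):
    """Return $0 as seen by the shell that runs a shell=True command."""
    result = subprocess.run("echo $0", shell=True, executable=executable,
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


# Without an explicit executable, subprocess falls back to its built-in
# default shell path.
default_shell = shell_used()

# With an explicit executable, that path replaces the default shell.
other_shell = shutil.which("bash") or "/bin/sh"
overridden = shell_used(executable=other_shell)

print(default_shell, overridden)
```

Patching the default in Python changes the first case for every caller, whereas passing `executable=` only helps for call sites that can be modified.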