trz42 opened 3 months ago
Instance `eessi-bot-mc-aws` is configured to build:
- `x86_64/generic` for repo `eessi-hpc.org-2023.06-compat`
- `x86_64/generic` for repo `eessi-hpc.org-2023.06-software`
- `x86_64/generic` for repo `eessi.io-2023.06-compat`
- `x86_64/generic` for repo `eessi.io-2023.06-software`
- `x86_64/intel/haswell` for repo `eessi-hpc.org-2023.06-compat`
- `x86_64/intel/haswell` for repo `eessi-hpc.org-2023.06-software`
- `x86_64/intel/haswell` for repo `eessi.io-2023.06-compat`
- `x86_64/intel/haswell` for repo `eessi.io-2023.06-software`
- `x86_64/intel/skylake_avx512` for repo `eessi-hpc.org-2023.06-compat`
- `x86_64/intel/skylake_avx512` for repo `eessi-hpc.org-2023.06-software`
- `x86_64/intel/skylake_avx512` for repo `eessi.io-2023.06-compat`
- `x86_64/intel/skylake_avx512` for repo `eessi.io-2023.06-software`
- `x86_64/amd/zen2` for repo `eessi-hpc.org-2023.06-compat`
- `x86_64/amd/zen2` for repo `eessi-hpc.org-2023.06-software`
- `x86_64/amd/zen2` for repo `eessi.io-2023.06-compat`
- `x86_64/amd/zen2` for repo `eessi.io-2023.06-software`
- `x86_64/amd/zen3` for repo `eessi-hpc.org-2023.06-compat`
- `x86_64/amd/zen3` for repo `eessi-hpc.org-2023.06-software`
- `x86_64/amd/zen3` for repo `eessi.io-2023.06-compat`
- `x86_64/amd/zen3` for repo `eessi.io-2023.06-software`
- `aarch64/generic` for repo `eessi-hpc.org-2023.06-compat`
- `aarch64/generic` for repo `eessi-hpc.org-2023.06-software`
- `aarch64/generic` for repo `eessi.io-2023.06-compat`
- `aarch64/generic` for repo `eessi.io-2023.06-software`
- `aarch64/neoverse_n1` for repo `eessi-hpc.org-2023.06-compat`
- `aarch64/neoverse_n1` for repo `eessi-hpc.org-2023.06-software`
- `aarch64/neoverse_n1` for repo `eessi.io-2023.06-compat`
- `aarch64/neoverse_n1` for repo `eessi.io-2023.06-software`
- `aarch64/neoverse_v1` for repo `eessi-hpc.org-2023.06-compat`
- `aarch64/neoverse_v1` for repo `eessi-hpc.org-2023.06-software`
- `aarch64/neoverse_v1` for repo `eessi.io-2023.06-compat`
- `aarch64/neoverse_v1` for repo `eessi.io-2023.06-software`
Instance `eessi-bot-mc-azure` is configured to build:
- `x86_64/amd/zen4` for repo `eessi-hpc.org-2023.06-compat`
- `x86_64/amd/zen4` for repo `eessi-hpc.org-2023.06-software`
- `x86_64/amd/zen4` for repo `eessi.io-2023.06-compat`
- `x86_64/amd/zen4` for repo `eessi.io-2023.06-software`
Initially we'll build only for `zen2` and `aarch64/generic`...
bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
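Each `bot: build` command names an architecture and a repository, and only an instance configured for that pair picks it up; that is why only `eessi-bot-mc-aws` starts jobs for `zen2` and `aarch64/generic`. A minimal sketch of that routing, using the configuration listed above; the regex, names and structure are assumptions for illustration and not the bot's actual implementation:

```python
import re

# Architectures per bot instance, taken from the configuration listed above.
INSTANCE_ARCHS = {
    "eessi-bot-mc-aws": {
        "x86_64/generic", "x86_64/intel/haswell", "x86_64/intel/skylake_avx512",
        "x86_64/amd/zen2", "x86_64/amd/zen3",
        "aarch64/generic", "aarch64/neoverse_n1", "aarch64/neoverse_v1",
    },
    "eessi-bot-mc-azure": {"x86_64/amd/zen4"},
}
# Both instances list the same four repositories for every architecture.
REPOS = {
    "eessi-hpc.org-2023.06-compat", "eessi-hpc.org-2023.06-software",
    "eessi.io-2023.06-compat", "eessi.io-2023.06-software",
}
# Hypothetical pattern for one build command: 'bot: build arch:<arch> repo:<repo>'
BUILD_CMD_RE = re.compile(r"bot: build arch:(\S+) repo:(\S+)")


def route_build_commands(text):
    """Return (instance, arch, repo) for every build command an instance would act on."""
    routed = []
    for arch, repo in BUILD_CMD_RE.findall(text):
        for instance, archs in INSTANCE_ARCHS.items():
            if arch in archs and repo in REPOS:
                routed.append((instance, arch, repo))
    return routed
```

Feeding it the two commands above yields two jobs, both on `eessi-bot-mc-aws`; a `zen4` command would be routed to `eessi-bot-mc-azure` instead.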
New job on instance `eessi-bot-mc-aws` for architecture `x86_64-amd-zen2` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12607`
The sanity check of `librosa/0.10.1-foss-2023a` failed when running `python -c "import soundfile"`, with the log messages
```
== 2024-06-12 12:00:43,829 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: extensions sanity check failed for 1 extensions: soundfile
failing sanity check for 'soundfile' extension: command "python -c "import soundfile"" failed; output:
Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 161, in <module>
    import _soundfile_data  # ImportError if this doesn't exist
    ^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_soundfile_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 171, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "
```
- to work around this error we need a custom `ctypes` library
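As far as we can tell, when `_soundfile_data` is missing, `soundfile` falls back to `ctypes.util.find_library('sndfile')` to locate `libsndfile`. On Linux, that function relies on `ldconfig`, the compiler/linker and `LD_LIBRARY_PATH`, so a library installed in a non-standard prefix (as in EESSI, which does not rely on `LD_LIBRARY_PATH`) may not be found. A minimal sketch of the idea behind a custom `ctypes`, assuming it adds a fallback that scans the directories in `LIBRARY_PATH`; the helper name is hypothetical and this is not the actual `custom_ctypes` code:

```python
import glob
import os
from ctypes.util import find_library


def find_library_with_fallback(name, env_var="LIBRARY_PATH"):
    """Try ctypes.util.find_library first, then scan the directories listed in env_var.

    Hypothetical helper for illustration only.
    """
    found = find_library(name)
    if found is not None:
        return found
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        if not directory:
            continue
        # Match e.g. libsndfile.so, libsndfile.so.1, libsndfile.so.1.0.35
        candidates = sorted(glob.glob(os.path.join(directory, f"lib{name}.so*")))
        if candidates:
            return candidates[0]
    return None
```

With such a fallback, `find_library_with_fallback("sndfile")` would return a path to `libsndfile` as long as its `lib` directory is listed in `LIBRARY_PATH`.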
|date|job status|comment|
|----------|----------|------------------------|
|Jun 12 11:27:18 UTC 2024|submitted|job id `12607` awaits release by job manager|
|Jun 12 11:28:21 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 12 11:35:26 UTC 2024|running|job `12607` is running|
|Jun 12 12:08:26 UTC 2024|finished|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12607.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>FAILED: </code><br/>:x: found message matching <code> required modules missing:</code><br/>:x: no message matching <code>No missing installations</code><br/>:white_check_mark: found message matching <code>\.tar\.gz created!</code><br/></dd><dt>_Artefacts_</dt><dd><details><summary><code>eessi-2023.06-software-linux-x86_64-amd-zen2-1718193717.tar.gz</code></summary>size: 162 MiB (170635688 bytes)<br/>entries: 6322<br/>modules under _2023.06/software/linux/x86_64/amd/zen2/modules/all_<br/><pre><code>imageio/2.33.1-gfbf-2023a.lua</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua</code><br/><code>NLTK/3.8.1-foss-2023a.lua</code><br/><code>numba/0.58.1-foss-2023a.lua</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0.lua</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0.lua</code><br/><code>scikit-image/0.22.0-foss-2023a.lua</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0.lua</code><br/></pre>software under _2023.06/software/linux/x86_64/amd/zen2/software_<br/><pre><code>imageio/2.33.1-gfbf-2023a</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite</code><br/><code>NLTK/3.8.1-foss-2023a</code><br/><code>numba/0.58.1-foss-2023a</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0</code><br/><code>scikit-image/0.22.0-foss-2023a</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0</code><br/></pre>other under _2023.06/software/linux/x86_64/amd/zen2_<br/><pre><code>2023.06/init/easybuild/eb_hooks.py</code><br/></pre></details></dd></dl></details>|
|Jun 12 12:08:26 UTC 2024|test result|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Reason_</dt><dd>EESSI test suite produced failures.</dd><dt>_ReFrame Summary_</dt><dd>[ FAILED ] Ran 12/12 test case(s) from 12 check(s) (2 failure(s), 0 skipped, 0 aborted)</dd><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12607.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>\[\s\*FAILED\s\*\].\*Ran .\* test case</code><br/></dd></dl></details>|
New job on instance `eessi-bot-mc-aws` for architecture `aarch64-generic` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12608`
The sanity check of `librosa/0.10.1-foss-2023a` failed when running `python -c "import soundfile"`, with the log messages
```
== 2024-06-12 11:55:32,669 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: extensions sanity check failed for 1 extensions: soundfile
failing sanity check for 'soundfile' extension: command "python -c "import soundfile"" failed; output:
Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 161, in <module>
    import _soundfile_data  # ImportError if this doesn't exist
    ^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_soundfile_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 171, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "
```
- to work around this error we need a custom `ctypes` library
|date|job status|comment|
|----------|----------|------------------------|
|Jun 12 11:27:22 UTC 2024|submitted|job id `12608` awaits release by job manager|
|Jun 12 11:28:19 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 12 11:34:23 UTC 2024|running|job `12608` is running|
|Jun 12 12:04:20 UTC 2024|finished|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12608.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>FAILED: </code><br/>:x: found message matching <code> required modules missing:</code><br/>:x: no message matching <code>No missing installations</code><br/>:white_check_mark: found message matching <code>\.tar\.gz created!</code><br/></dd><dt>_Artefacts_</dt><dd><details><summary><code>eessi-2023.06-software-linux-aarch64-generic-1718193401.tar.gz</code></summary>size: 152 MiB (160274969 bytes)<br/>entries: 6322<br/>modules under _2023.06/software/linux/aarch64/generic/modules/all_<br/><pre><code>imageio/2.33.1-gfbf-2023a.lua</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua</code><br/><code>NLTK/3.8.1-foss-2023a.lua</code><br/><code>numba/0.58.1-foss-2023a.lua</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0.lua</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0.lua</code><br/><code>scikit-image/0.22.0-foss-2023a.lua</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0.lua</code><br/></pre>software under _2023.06/software/linux/aarch64/generic/software_<br/><pre><code>imageio/2.33.1-gfbf-2023a</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite</code><br/><code>NLTK/3.8.1-foss-2023a</code><br/><code>numba/0.58.1-foss-2023a</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0</code><br/><code>scikit-image/0.22.0-foss-2023a</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0</code><br/></pre>other under _2023.06/software/linux/aarch64/generic_<br/><pre><code>2023.06/init/easybuild/eb_hooks.py</code><br/></pre></details></dd></dl></details>|
|Jun 12 12:04:20 UTC 2024|test result|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Reason_</dt><dd>EESSI test suite produced failures.</dd><dt>_ReFrame Summary_</dt><dd>[ FAILED ] Ran 12/12 test case(s) from 12 check(s) (2 failure(s), 0 skipped, 0 aborted)</dd><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12608.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>\[\s\*FAILED\s\*\].\*Ran .\* test case</code><br/></dd></dl></details>|
The two jobs (12607 and 12608), which did not include any fixes, both failed in the sanity check for `librosa`. After enabling the fixes for that, i.e.,

- a custom `ctypes` library;
- a `parse_hook` to use the custom `ctypes` library in the sanity check; and
- a `pre_module_hook` that adds a setting to use this custom `ctypes` library when the module for `librosa` is loaded;

we repeat the building for the same architectures `zen2` and `aarch64/generic`...
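EasyBuild hooks are plain Python functions in a hooks file (here `eb_hooks.py`): `parse_hook(ec, *args, **kwargs)` runs after an easyconfig is parsed, and `pre_module_hook(self, *args, **kwargs)` runs before the module file is generated. A minimal sketch of how the two `librosa` fixes could be wired up, assuming plain dicts in place of EasyBuild's easyconfig and easyblock objects, a placeholder install path for the custom `ctypes`, and that it is made visible through EasyBuild's `EBPYTHONPREFIXES` mechanism; all hook bodies are hypothetical and not the code used in this PR:

```python
# Hypothetical install prefix of the custom ctypes package (placeholder, not the real path).
CUSTOM_CTYPES_PATH = "/path/to/software/custom_ctypes/1.2"


def parse_hook_librosa(ec):
    """Make the sanity check of the 'soundfile' extension pick up the custom ctypes (sketch)."""
    new_exts = []
    for name, version, options in ec["exts_list"]:
        if name == "soundfile":
            options = dict(options)  # copy, so the original entry is left untouched
            options["exts_filter"] = (
                f"EBPYTHONPREFIXES={CUSTOM_CTYPES_PATH} python -c 'import %(ext_name)s'",
                "",
            )
        new_exts.append((name, version, options))
    ec["exts_list"] = new_exts


def pre_module_hook_librosa(ec):
    """Append a Lua statement so that loading the librosa module also uses the custom ctypes."""
    line = f'prepend_path("EBPYTHONPREFIXES", "{CUSTOM_CTYPES_PATH}")\n'
    ec["modluafooter"] = ec.get("modluafooter", "") + line


# Dispatch on the software name, so each hook only affects the intended installation.
PARSE_HOOKS = {"librosa": parse_hook_librosa}
PRE_MODULE_HOOKS = {"librosa": pre_module_hook_librosa}


def parse_hook(ec, *args, **kwargs):
    if ec["name"] in PARSE_HOOKS:
        PARSE_HOOKS[ec["name"]](ec)


def pre_module_hook(ec, *args, **kwargs):
    # In EasyBuild this receives the easyblock ('self'); a dict stands in for it here.
    if ec["name"] in PRE_MODULE_HOOKS:
        PRE_MODULE_HOOKS[ec["name"]](ec)
```

Dispatching on the software name keeps a single `parse_hook`/`pre_module_hook` entry point while limiting each fix to the package that needs it.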
bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
New job on instance `eessi-bot-mc-aws` for architecture `x86_64-amd-zen2` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12808`
Tests for `torchvision` (part of `PyTorch-bundle`) failed with:
```
=================================== FAILURES ===================================
___ test_decode_jpeg[None-ImageReadMode.UNCHANGED-grace_hopper_517x606.jpg] ____
test/test_image.py:94: in test_decode_jpeg
    img_ljpeg = decode_image(data, mode=mode)
/tmp/eb-7t6okia0/eb-js7oqjgv/tmpjpww4km2/lib/python3.11/site-packages/torchvision/io/image.py:236: in decode_image
    output = torch.ops.image.decode_image(input, mode.value)
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/PyTorch/2.1.2-foss-2023a/lib/python3.11/site-packages/torch/_ops.py:692: in __call__
    return self._op(*args, **kwargs or {})
E   RuntimeError: decode_jpeg: torchvision not compiled with libjpeg support
```
Inspecting the job's individual build step logs (via `bot/inspect.sh --resume previous_tmp/build_step/eessi.io-2023.06-software-1718457554.tgz`, run in the job's working directory `/project/def-users/SHARED/jobs/2024.06/pr_603/12808` on the same type of node, e.g., via an interactive job submitted with `srun --partition x86-64-amd-zen2-node --time=60 --pty bash`), we find the following messages in `/tmp/eb-7t6okia0/eb-js7oqjgv/easybuild-run_cmd-9b5lqisq.log` (the log file for building the extension `torchvision`):
```
Compiling extensions with following flags:
  FORCE_CUDA: False
  FORCE_MPS: False
  DEBUG: False
  TORCHVISION_USE_PNG: True
  TORCHVISION_USE_JPEG: True
  TORCHVISION_USE_NVJPEG: True
  TORCHVISION_USE_FFMPEG: True
  TORCHVISION_USE_VIDEO_CODEC: True
  NVCC_FLAGS:
Compiling with debug mode OFF
Found PNG library
Building torchvision with PNG image support
  libpng version: 1.6.39
  libpng include path: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/libpng/1.6.39-GCCcore-12.3.0/include/libpng16
Running build on conda-build: False
Running build on conda: False
Building torchvision without JPEG image support
Building torchvision without NVJPEG image support
```
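The relevant lines follow a fixed pattern (`Building torchvision with/without ... support`), so checking a build log for disabled image backends can be scripted. A small sketch, assuming the log text is available as a string; the function name is hypothetical:

```python
import re

# Matches lines such as 'Building torchvision without JPEG image support'
FEATURE_RE = re.compile(r"^Building torchvision (with|without) (.+?) support", re.MULTILINE)


def torchvision_features(log_text):
    """Map each feature mentioned in the build log to True (enabled) or False (disabled)."""
    return {feature: mode == "with" for mode, feature in FEATURE_RE.findall(log_text)}
```

Applied to the log above, this reports `PNG image` as enabled and `JPEG image` and `NVJPEG image` as disabled, so a build that silently lost JPEG support can be caught before the tests run.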
So `torchvision` apparently does not find the `jpeg` library and hence builds without `JPEG` support. The `setup.py` in `/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchvision/vision-0.16.2`, which produces the above messages showing that `torchvision` is compiled without `JPEG` support, includes a function `find_library` with the following code:
```python
import os  # imported at module level in setup.py


def find_library(name, vision_include):
    this_dir = os.path.dirname(os.path.abspath(__file__))
    build_prefix = os.environ.get("BUILD_PREFIX", None)
    is_conda_build = build_prefix is not None

    library_found = False
    conda_installed = False
    lib_folder = None
    include_folder = None
    library_header = f"{name}.h"

    # Lookup in TORCHVISION_INCLUDE or in the package file
    package_path = [os.path.join(this_dir, "torchvision")]
    for folder in vision_include + package_path:
        candidate_path = os.path.join(folder, library_header)
        library_found = os.path.exists(candidate_path)
        if library_found:
            break
    # ... (remainder of find_library not shown)
```
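The excerpt only searches the directories in `vision_include` plus the package's own `torchvision` directory, so with an empty `vision_include` a header from a separate `libjpeg` installation is never found. A simplified model of that lookup, assuming the JPEG check looks for `jpeglib.h` and that `vision_include` is obtained by splitting `TORCHVISION_INCLUDE` on `os.pathsep`; both helper names are hypothetical:

```python
import os


def vision_include_from_env(environ):
    """Split TORCHVISION_INCLUDE into a list of directories; unset or empty gives []."""
    value = environ.get("TORCHVISION_INCLUDE", "")
    return [d for d in value.split(os.pathsep) if d]


def header_found(name, vision_include, package_dir):
    """Return True if '<name>.h' exists in one of the searched directories (simplified)."""
    library_header = f"{name}.h"
    for folder in vision_include + [package_dir]:
        if os.path.exists(os.path.join(folder, library_header)):
            return True
    return False
```

With `TORCHVISION_INCLUDE` unset, `header_found("jpeglib", [], package_dir)` is `False` even if `jpeglib.h` is installed elsewhere; pointing `TORCHVISION_INCLUDE` at the `include` directory of the `libjpeg` installation makes it `True`.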
Running `setup.py` manually in an "inspect" session revealed that the second parameter to `find_library` (`vision_include`) was an empty list `[]`.
So `TORCHVISION_INCLUDE` was not set, although it should have been if the easyblock for `torchvision` is used, see https://github.com/easybuilders/easybuild-easyblocks/blob/10e9a62d44d653e04f735962620a33bc22225477/easybuild/easyblocks/t/torchvision.py#L83-L85

|date|job status|comment|
|----------|----------|------------------------|
|Jun 15 12:04:28 UTC 2024|submitted|job id `12808` awaits release by job manager|
|Jun 15 12:04:32 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 15 12:10:36 UTC 2024|running|job `12808` is running|
|Jun 15 13:47:58 UTC 2024|finished|:cry: FAILURE|
|Jun 15 13:47:58 UTC 2024|test result|:grin: SUCCESS|
New job on instance `eessi-bot-mc-aws` for architecture `aarch64-generic` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12809`
The sanity check of `SentencePiece/0.2.0-GCC-12.3.0` failed with the following log messages
```
== 2024-06-15 12:40:44,834 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command python -c 'import sentencepiece' exited with code 1 (output: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
    from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
) (at easybuild/framework/easyblock.py:3669 in _sanity_check_step)
```
|date|job status|comment|
|----------|----------|------------------------|
|Jun 15 12:04:32 UTC 2024|submitted|job id `12809` awaits release by job manager|
|Jun 15 12:05:34 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 15 12:11:38 UTC 2024|running|job `12809` is running|
|Jun 15 13:04:14 UTC 2024|finished|:cry: FAILURE|
|Jun 15 13:04:14 UTC 2024|test result|:grin: SUCCESS|
The two jobs (12808 for `zen2` and 12809 for `aarch64/generic`) didn't fail for the earlier reason (the failing import of `soundfile`). However, they failed for different reasons (for details see above). We first fix the issue for `aarch64/generic`, because the build for that architecture failed earlier than the build for `zen2`. The fix disables the use of the `tcmalloc` library (from `gperftools`). Because the fix is made for `aarch64/generic` only, we also check whether builds for the other `aarch64` targets are affected by the issue.
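The log shows the import failing because `libtcmalloc_minimal.so.4` cannot get space in the static TLS block when it is loaded. A minimal sketch of an architecture-restricted workaround, assuming it is applied by appending SentencePiece's CMake option `SPM_ENABLE_TCMALLOC=OFF` to the configure options; the function name and the way the CPU target is passed in are hypothetical. Restricting the change to an explicit set of targets also shows why a fix made only for `aarch64/generic` leaves `neoverse_n1` and `neoverse_v1` unaffected:

```python
# CPU targets for which tcmalloc is disabled (assumption: initially only aarch64/generic)
TCMALLOC_OFF_TARGETS = {"aarch64/generic"}


def sentencepiece_configopts(cpu_target, configopts, targets=TCMALLOC_OFF_TARGETS):
    """Return configure options for SentencePiece, disabling tcmalloc on selected targets (sketch)."""
    flag = "-DSPM_ENABLE_TCMALLOC=OFF"
    if cpu_target in targets and flag not in configopts:
        configopts = f"{configopts} {flag}".strip()
    return configopts
```

Extending the fix to the other `aarch64` targets then amounts to adding `aarch64/neoverse_n1` and `aarch64/neoverse_v1` to the set of targets.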
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

bot: build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software

bot: build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software
New job on instance `eessi-bot-mc-aws` for architecture `aarch64-generic` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12813`
Running the `torchtext` tests failed with a segmentation fault:
```
== 2024-06-15 18:44:56,282 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-4o0di9ui/eb-qo9jlvzo/tmp0g004oib/lib/python3.11/site-packages:$PYTHONPATH && pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault

Current thread 0x000040002a9e5a00 (most recent call first):
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, gmpy2.gmpy2, simplejson._speedups (total: 22)
```
- it may be that we have seen this earlier when building for NESSI; we didn't have a fix for it there, so this requires more investigation
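EasyBuild reports the `pytest` command as exiting with code -11. For a process started via Python's `subprocess`, a negative return code means the process was terminated by that signal; assuming EasyBuild passes the value through unchanged, -11 corresponds to signal 11 (`SIGSEGV`), matching the `Fatal Python error: Segmentation fault` in the output. A small helper to decode such codes (illustrative, POSIX only; the function name is hypothetical):

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Translate a subprocess return code into a human-readable description (POSIX)."""
    if returncode < 0:
        # Negative values mean the process was terminated by signal -returncode
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Start a Python child process that sends SIGSEGV to itself
    child = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    proc = subprocess.run([sys.executable, "-c", child])
    print(describe_returncode(proc.returncode))  # on Linux: terminated by SIGSEGV
```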
|date|job status|comment|
|----------|----------|------------------------|
|Jun 15 18:07:39 UTC 2024|submitted|job id `12813` awaits release by job manager|
|Jun 15 18:08:23 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 15 18:13:30 UTC 2024|running|job `12813` is running|
|Jun 15 19:09:48 UTC 2024|finished|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12813.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>FAILED: </code><br/>:x: found message matching <code> required modules missing:</code><br/>:x: no message matching <code>No missing installations</code><br/>:white_check_mark: found message matching <code>\.tar\.gz created!</code><br/></dd><dt>_Artefacts_</dt><dd><details><summary><code>eessi-2023.06-software-linux-aarch64-generic-1718477177.tar.gz</code></summary>size: 271 MiB (284370882 bytes)<br/>entries: 9314<br/>modules under _2023.06/software/linux/aarch64/generic/modules/all_<br/><pre><code>custom_ctypes/1.2.lua</code><br/><code>gperftools/2.12-GCCcore-12.3.0.lua</code><br/><code>imageio/2.33.1-gfbf-2023a.lua</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0.lua</code><br/><code>librosa/0.10.1-foss-2023a.lua</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua</code><br/><code>NLTK/3.8.1-foss-2023a.lua</code><br/><code>numba/0.58.1-foss-2023a.lua</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0.lua</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0.lua</code><br/><code>scikit-image/0.22.0-foss-2023a.lua</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0.lua</code><br/><code>SoX/14.4.2-GCCcore-12.3.0.lua</code><br/><code>tensorboard/2.15.1-gfbf-2023a.lua</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0.lua</code><br/></pre>software under _2023.06/software/linux/aarch64/generic/software_<br/><pre><code>custom_ctypes/1.2</code><br/><code>gperftools/2.12-GCCcore-12.3.0</code><br/><code>imageio/2.33.1-gfbf-2023a</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0</code><br/><code>librosa/0.10.1-foss-2023a</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite</code><br/><code>NLTK/3.8.1-foss-2023a</code><br/><code>numba/0.58.1-foss-2023a</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0</code><br/><code>scikit-image/0.22.0-foss-2023a</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0</code><br/><code>SoX/14.4.2-GCCcore-12.3.0</code><br/><code>tensorboard/2.15.1-gfbf-2023a</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0</code><br/></pre>other under _2023.06/software/linux/aarch64/generic_<br/><pre><code>2023.06/init/easybuild/eb_hooks.py</code><br/></pre></details></dd></dl></details>|
|Jun 15 19:09:48 UTC 2024|test result|<details><summary>:grin: SUCCESS _(click triangle for details)_</summary><dl><dt>_ReFrame Summary_</dt><dd>[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)</dd><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12813.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:white_check_mark: no message matching <code>\[\s\*FAILED\s\*\].\*Ran .\* test case</code><br/></dd></dl></details>|
New job on instance `eessi-bot-mc-aws` for architecture `aarch64-neoverse_n1` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12814`
Same failure as for `aarch64/generic`:
```
== 2024-06-15 18:42:59,199 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command python -c 'import sentencepiece' exited with code 1 (output: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
    from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
```
|date|job status|comment|
|----------|----------|------------------------|
|Jun 15 18:07:43 UTC 2024|submitted|job id `12814` awaits release by job manager|
|Jun 15 18:08:25 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 15 18:14:32 UTC 2024|running|job `12814` is running|
|Jun 15 19:06:45 UTC 2024|finished|:cry: FAILURE|
|Jun 15 19:06:45 UTC 2024|test result|:grin: SUCCESS|
New job on instance `eessi-bot-mc-aws` for architecture `aarch64-neoverse_v1` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12815`
Same failure as for `aarch64/generic`:
```
== 2024-06-15 18:36:00,141 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command python -c 'import sentencepiece' exited with code 1 (output: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
    from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
```
|date|job status|comment|
|----------|----------|------------------------|
|Jun 15 18:07:47 UTC 2024|submitted|job id `12815` awaits release by job manager|
|Jun 15 18:08:27 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 15 18:14:34 UTC 2024|running|job `12815` is running|
|Jun 15 18:52:16 UTC 2024|finished|:cry: FAILURE|
|Jun 15 18:52:16 UTC 2024|test result|:grin: SUCCESS|
Rebuilding for `aarch64/neoverse_n1` and `aarch64/neoverse_v1` after the fix for `SentencePiece` has been extended to these architectures...
bot: build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software

bot: build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software
New job on instance `eessi-bot-mc-aws` for architecture `aarch64-neoverse_n1` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12816`
Same failure as for `aarch64/generic`:
```
== 2024-06-15 20:08:01,404 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-t17gza4h/eb-ul5a_hbb/tmpr1l71y06/lib/python3.11/site-packages:$PYTHONPATH && pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault

Current thread 0x000040003d3e5a80 (most recent call first):
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in
```
|date|job status|comment|
|----------|----------|------------------------|
|Jun 15 19:34:52 UTC 2024|submitted|job id `12816` awaits release by job manager|
|Jun 15 19:35:52 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 15 19:36:56 UTC 2024|running|job `12816` is running|
|Jun 15 20:35:33 UTC 2024|finished|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12816.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>FAILED: </code><br/>:x: found message matching <code> required modules missing:</code><br/>:x: no message matching <code>No missing installations</code><br/>:white_check_mark: found message matching <code>\.tar\.gz created!</code><br/></dd><dt>_Artefacts_</dt><dd><details><summary><code>eessi-2023.06-software-linux-aarch64-neoverse_n1-1718482255.tar.gz</code></summary>size: 271 MiB (284726536 bytes)<br/>entries: 9314<br/>modules under _2023.06/software/linux/aarch64/neoverse_n1/modules/all_<br/><pre><code>custom_ctypes/1.2.lua</code><br/><code>gperftools/2.12-GCCcore-12.3.0.lua</code><br/><code>imageio/2.33.1-gfbf-2023a.lua</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0.lua</code><br/><code>librosa/0.10.1-foss-2023a.lua</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua</code><br/><code>NLTK/3.8.1-foss-2023a.lua</code><br/><code>numba/0.58.1-foss-2023a.lua</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0.lua</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0.lua</code><br/><code>scikit-image/0.22.0-foss-2023a.lua</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0.lua</code><br/><code>SoX/14.4.2-GCCcore-12.3.0.lua</code><br/><code>tensorboard/2.15.1-gfbf-2023a.lua</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0.lua</code><br/></pre>software under _2023.06/software/linux/aarch64/neoverse_n1/software_<br/><pre><code>custom_ctypes/1.2</code><br/><code>gperftools/2.12-GCCcore-12.3.0</code><br/><code>imageio/2.33.1-gfbf-2023a</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0</code><br/><code>librosa/0.10.1-foss-2023a</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite</code><br/><code>NLTK/3.8.1-foss-2023a</code><br/><code>numba/0.58.1-foss-2023a</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0</code><br/><code>scikit-image/0.22.0-foss-2023a</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0</code><br/><code>SoX/14.4.2-GCCcore-12.3.0</code><br/><code>tensorboard/2.15.1-gfbf-2023a</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0</code><br/></pre>other under _2023.06/software/linux/aarch64/neoverse_n1_<br/><pre><code>2023.06/init/easybuild/eb_hooks.py</code><br/></pre></details></dd></dl></details>|
|Jun 15 20:35:33 UTC 2024|test result|<details><summary>:grin: SUCCESS _(click triangle for details)_</summary><dl><dt>_ReFrame Summary_</dt><dd>[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)</dd><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12816.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:white_check_mark: no message matching <code>\[\s\*FAILED\s\*\].\*Ran .\* test case</code><br/></dd></dl></details>|
New job on instance `eessi-bot-mc-aws` for architecture `aarch64-neoverse_v1` for repository `eessi.io-2023.06-software` in job dir `/project/def-users/SHARED/jobs/2024.06/pr_603/12817`
Same failure as for `aarch64/generic`:
```
== 2024-06-15 20:00:37,536 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-663ngo7q/eb-6zm49he7/tmph9cftg0x/lib/python3.11/site-packages:$PYTHONPATH && pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault

Current thread 0x000040003cc75a80 (most recent call first):
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in
```
|date|job status|comment|
|----------|----------|------------------------|
|Jun 15 19:34:56 UTC 2024|submitted|job id `12817` awaits release by job manager|
|Jun 15 19:35:54 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 15 19:36:58 UTC 2024|running|job `12817` is running|
|Jun 15 20:18:15 UTC 2024|finished|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12817.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>FAILED: </code><br/>:x: found message matching <code> required modules missing:</code><br/>:x: no message matching <code>No missing installations</code><br/>:white_check_mark: found message matching <code>\.tar\.gz created!</code><br/></dd><dt>_Artefacts_</dt><dd><details><summary><code>eessi-2023.06-software-linux-aarch64-neoverse_v1-1718481760.tar.gz</code></summary>size: 271 MiB (284470404 bytes)<br/>entries: 9314<br/>modules under _2023.06/software/linux/aarch64/neoverse_v1/modules/all_<br/><pre><code>custom_ctypes/1.2.lua</code><br/><code>gperftools/2.12-GCCcore-12.3.0.lua</code><br/><code>imageio/2.33.1-gfbf-2023a.lua</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0.lua</code><br/><code>librosa/0.10.1-foss-2023a.lua</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua</code><br/><code>NLTK/3.8.1-foss-2023a.lua</code><br/><code>numba/0.58.1-foss-2023a.lua</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0.lua</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0.lua</code><br/><code>scikit-image/0.22.0-foss-2023a.lua</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0.lua</code><br/><code>SoX/14.4.2-GCCcore-12.3.0.lua</code><br/><code>tensorboard/2.15.1-gfbf-2023a.lua</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0.lua</code><br/></pre>software under _2023.06/software/linux/aarch64/neoverse_v1/software_<br/><pre><code>custom_ctypes/1.2</code><br/><code>gperftools/2.12-GCCcore-12.3.0</code><br/><code>imageio/2.33.1-gfbf-2023a</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0</code><br/><code>librosa/0.10.1-foss-2023a</code><br/><code>LLVM/14.0.6-GCCcore-12.3.0-llvmlite</code><br/><code>NLTK/3.8.1-foss-2023a</code><br/><code>numba/0.58.1-foss-2023a</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0</code><br/><code>scikit-image/0.22.0-foss-2023a</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0</code><br/><code>SoX/14.4.2-GCCcore-12.3.0</code><br/><code>tensorboard/2.15.1-gfbf-2023a</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0</code><br/></pre>other under _2023.06/software/linux/aarch64/neoverse_v1_<br/><pre><code>2023.06/init/easybuild/eb_hooks.py</code><br/></pre></details></dd></dl></details>|
|Jun 15 20:18:15 UTC 2024|test result|<details><summary>:grin: SUCCESS _(click triangle for details)_</summary><dl><dt>_ReFrame Summary_</dt><dd>[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)</dd><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-12817.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:white_check_mark: no message matching <code>\[\s\*FAILED\s\*\].\*Ran .\* test case</code><br/></dd></dl></details>|
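The FAILURE/SUCCESS verdicts in this table are derived from patterns searched for in the job output file `slurm-<jobid>.out`. A minimal sketch of that classification, assuming the patterns shown in the *Details* column (with the markdown escaping of `*` removed). The decision rules and the names `classify_build` and `classify_tests` are hypothetical, not the bot's actual code; they are merely consistent with this table, where the test result is SUCCESS even though `ERROR: ` was matched.

```python
import re

# Patterns as displayed in the bot's "Details" column, with the markdown
# escaping of "*" removed. The decision rules below are an assumption that is
# merely consistent with the tables in this thread, not the bot's actual code.
BUILD_FAILURE_PATTERNS = [r"ERROR: ", r"FAILED: ", r" required modules missing:"]
BUILD_SUCCESS_PATTERNS = [r"No missing installations", r"\.tar\.gz created!"]
TEST_FAILURE_PATTERN = r"\[\s*FAILED\s*\].*Ran .* test case"


def classify_build(output: str) -> str:
    """SUCCESS only if no failure pattern and every success pattern matches."""
    failed = any(re.search(p, output) for p in BUILD_FAILURE_PATTERNS)
    complete = all(re.search(p, output) for p in BUILD_SUCCESS_PATTERNS)
    return "SUCCESS" if complete and not failed else "FAILURE"


def classify_tests(output: str) -> str:
    """FAILURE only if a failed ReFrame summary line is present."""
    return "FAILURE" if re.search(TEST_FAILURE_PATTERN, output) else "SUCCESS"


if __name__ == "__main__":
    # Made-up excerpt: a build error, but all ReFrame tests passed.
    sample = (
        "ERROR: Installation failed\n"
        "[ PASSED ] Ran 12/12 test case(s) from 12 check(s)\n"
    )
    print(classify_build(sample), classify_tests(sample))  # FAILURE SUCCESS
```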
Rebuilding for zen2
to verify whether a new easyblock for torchvision fixes the issue that libjpeg
couldn't be found...
bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/13549
The installation of PyTorch-bundle succeeded, so the updated easyblock for torchvision works! :tada:
However, the build failed when checking for missing installations with
1 out of 138 required modules missing:
grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)
That should be easy to fix, see https://github.com/NorESSI/software-layer/pull/408
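The missing-installations check reports how many required modules are absent and lists each one together with the easyconfig that would provide it. A minimal sketch of extracting that information from the check's output, assuming exactly the line format quoted above (one `module (easyconfig.eb)` entry per line after the header); `parse_missing_modules` is a hypothetical helper, not part of the bot or of EasyBuild.

```python
import re
from typing import List, Optional, Tuple

# Assumed format, copied from the report above:
#   "1 out of 138 required modules missing:"
#   "grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)"
HEADER = re.compile(r"(\d+) out of (\d+) required modules missing:")
ENTRY = re.compile(r"^\s*(\S+/\S+) \((\S+\.eb)\)\s*$")


def parse_missing_modules(text: str) -> Optional[Tuple[int, int, List[Tuple[str, str]]]]:
    """Return (missing, total, [(module, easyconfig), ...]) or None if no report is found."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        header = HEADER.search(line)
        if header is None:
            continue
        entries = []
        for entry_line in lines[i + 1:]:
            entry = ENTRY.match(entry_line)
            if entry is None:
                break  # assumption: the list ends at the first non-matching line
            entries.append((entry.group(1), entry.group(2)))
        return int(header.group(1)), int(header.group(2)), entries
    return None


report = """1 out of 138 required modules missing:
grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)"""
missing, total, entries = parse_missing_modules(report)
print(f"{missing}/{total} missing:", [ec for _, ec in entries])
```

Comparing `missing` with `len(entries)` is a cheap way to notice if the assumed line format ever changes.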
|date|job status|comment|
|----------|----------|------------------------|
|Jun 29 20:55:20 UTC 2024|submitted|job id `13549` awaits release by job manager|
|Jun 29 20:55:26 UTC 2024|released|job awaits launch by Slurm scheduler|
|Jun 29 21:00:28 UTC 2024|running|job `13549` is running|
|Jun 29 23:04:35 UTC 2024|finished|:cry: FAILURE|
|Jun 29 23:04:35 UTC 2024|test result|:grin: SUCCESS|
> Rebuilding for zen2 to verify whether a new easyblock for torchvision fixes the issue that libjpeg couldn't be found...
Maybe related to:
Rebuilding after #655 got merged to verify whether the `import soundfile` in librosa's sanity check succeeds...
bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_603/15500
Installation of PyTorch-bundle succeeded, but the check for missing installations then failed with
1 out of 138 required modules missing:
grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)
librosa has already been ingested (hence its sanity check wasn't run at all)
|date|job status|comment|
|----------|----------|------------------------|
|Aug 01 07:12:23 UTC 2024|submitted|job id `15500` awaits release by job manager|
|Aug 01 07:12:54 UTC 2024|released|job awaits launch by Slurm scheduler|
|Aug 01 07:18:58 UTC 2024|running|job `15500` is running|
|Aug 01 09:08:30 UTC 2024|finished|:cry: FAILURE|
|Aug 01 09:08:30 UTC 2024|test result|:grin: SUCCESS|
New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_603/15501
Segmentation fault
== 2024-08-01 07:37:12,561 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-en1c7x64/eb-intmrk91/tmphzi6yecp/lib/python3.11/site-packages:$PYTHONPATH
&& pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault
Current thread 0x000040003ebf5a00 (most recent call first):
File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in
- librosa has already been ingested (hence its sanity check wasn't run at all)
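Two details in the log above are easier to read with a quick check. The doubled quotes in the `-k` argument look broken, but POSIX shells concatenate adjacent quoted strings, so pytest should still receive one deselection expression. And an exit code of -11 follows Python's `subprocess` convention that a negative return code -N means the child was terminated by signal N, here SIGSEGV, which matches the "Segmentation fault" message. A minimal sketch, using Python's `shlex` (POSIX quoting rules) as a stand-in for the shell and assuming EasyBuild reports the raw `subprocess` return code; `describe_returncode` is a hypothetical helper.

```python
import shlex
import signal

# The pytest part of the logged command, copied verbatim (including the doubled quotes).
logged_cmd = (
    'pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file""'
    ' and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy""'
    ' and not test_download_charngram_vectors" '
)

# The same deselection expression built explicitly from the excluded test names.
excluded = [
    "test_vocab_from_raw_text_file",
    "test_get_tokenizer_moses",
    "test_get_tokenizer_spacy",
    "test_download_charngram_vectors",
]
expected_k = " and ".join(f"not {name}" for name in excluded)


def describe_returncode(returncode: int) -> str:
    """Interpret a subprocess return code; -N means 'terminated by signal N' on POSIX."""
    if returncode < 0:
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# shlex (POSIX mode) joins adjacent quoted strings, as a POSIX shell would.
print(shlex.split(logged_cmd)[2:] == ["-k", expected_k])  # True
print(describe_returncode(-11))  # terminated by SIGSEGV
```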
|date|job status|comment|
|----------|----------|------------------------|
|Aug 01 07:12:27 UTC 2024|submitted|job id `15501` awaits release by job manager|
|Aug 01 07:12:52 UTC 2024|released|job awaits launch by Slurm scheduler|
|Aug 01 07:18:56 UTC 2024|running|job `15501` is running|
|Aug 01 08:14:17 UTC 2024|finished|<details><summary>:cry: FAILURE _(click triangle for details)_</summary><dl><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-15501.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:x: found message matching <code>FAILED: </code><br/>:x: found message matching <code> required modules missing:</code><br/>:white_check_mark: found message(s) matching <code>No missing installations</code><br/>:white_check_mark: found message matching <code>\.tar\.gz created!</code><br/></dd><dt>_Artefacts_</dt><dd><details><summary><code>eessi-2023.06-software-linux-aarch64-generic-1722497905.tar.gz</code></summary>size: 142 MiB (149117531 bytes)<br/>entries: 4815<br/>modules under _2023.06/software/linux/aarch64/generic/modules/all_<br/><pre><code>gperftools/2.12-GCCcore-12.3.0.lua</code><br/><code>imageio/2.33.1-gfbf-2023a.lua</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0.lua</code><br/><code>NLTK/3.8.1-foss-2023a.lua</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0.lua</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0.lua</code><br/><code>scikit-image/0.22.0-foss-2023a.lua</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0.lua</code><br/><code>SoX/14.4.2-GCCcore-12.3.0.lua</code><br/><code>tensorboard/2.15.1-gfbf-2023a.lua</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0.lua</code><br/></pre>software under _2023.06/software/linux/aarch64/generic/software_<br/><pre><code>gperftools/2.12-GCCcore-12.3.0</code><br/><code>imageio/2.33.1-gfbf-2023a</code><br/><code>libmad/0.15.1b-GCCcore-12.3.0</code><br/><code>NLTK/3.8.1-foss-2023a</code><br/><code>parameterized/0.9.0-GCCcore-12.3.0</code><br/><code>Scalene/1.5.26-GCCcore-12.3.0</code><br/><code>scikit-image/0.22.0-foss-2023a</code><br/><code>SentencePiece/0.2.0-GCC-12.3.0</code><br/><code>SoX/14.4.2-GCCcore-12.3.0</code><br/><code>tensorboard/2.15.1-gfbf-2023a</code><br/><code>tqdm/4.66.1-GCCcore-12.3.0</code><br/></pre>other under _2023.06/software/linux/aarch64/generic_<br/><pre><code>2023.06/init/easybuild/eb_hooks.py</code><br/></pre></details></dd></dl></details>|
|Aug 01 08:14:17 UTC 2024|test result|<details><summary>:grin: SUCCESS _(click triangle for details)_</summary><dl><dt>_ReFrame Summary_</dt><dd>[ PASSED ] Ran 16/16 test case(s) from 16 check(s) (0 failure(s), 0 skipped, 0 aborted)</dd><dt>_Details_</dt><dd>:white_check_mark: job output file <code>slurm-15501.out</code><br/>:x: found message matching <code>ERROR: </code><br/>:white_check_mark: no message matching <code>\[\s\*FAILED\s\*\].\*Ran .\* test case</code><br/></dd></dl></details>|
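The artefact names appear to encode the EESSI version, the repository part, the OS, the CPU target and a trailing number. That number looks like a Unix timestamp: 1722497905 corresponds to Aug 01 07:38:25 UTC 2024, between the "running" (07:18:56) and "finished" (08:14:17) entries for job 15501, and 1718481760 (job 12817) corresponds to Jun 15 20:02:40 UTC 2024. A minimal sketch of parsing such names, assuming that naming scheme; the regular expression and `parse_artefact_name` are hypothetical and inferred only from the two names in this thread.

```python
import re
from datetime import datetime, timezone

# Naming scheme inferred from the two artefact names in this thread, e.g.
#   eessi-2023.06-software-linux-aarch64-generic-1722497905.tar.gz
# Assumption: the trailing number is a Unix timestamp (seconds since the epoch, UTC).
ARTEFACT = re.compile(
    r"^eessi-(?P<version>[\d.]+)-(?P<part>[a-z]+)-(?P<os>linux)-"
    r"(?P<target>.+)-(?P<stamp>\d+)\.tar\.gz$"
)


def parse_artefact_name(name: str) -> dict:
    """Split an artefact file name into its components (hypothetical helper)."""
    match = ARTEFACT.match(name)
    if match is None:
        raise ValueError(f"unexpected artefact name: {name}")
    info = match.groupdict()
    info["created"] = datetime.fromtimestamp(int(info.pop("stamp")), tz=timezone.utc)
    return info


info = parse_artefact_name("eessi-2023.06-software-linux-aarch64-generic-1722497905.tar.gz")
print(info["target"], info["created"].isoformat())  # aarch64-generic 2024-08-01T07:38:25+00:00
```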
Rebuilding after the changes have been minimised (only the hook for SentencePiece is kept for now) and #660 has been ingested...
bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_603/15895
|date|job status|comment|
|----------|----------|------------------------|
|Aug 08 10:29:43 UTC 2024|submitted|job id `15895` awaits release by job manager|
|Aug 08 10:30:06 UTC 2024|released|job awaits launch by Slurm scheduler|
|Aug 08 10:36:09 UTC 2024|running|job `15895` is running|
|Aug 08 12:26:42 UTC 2024|finished|:grin: SUCCESS|
|Aug 08 12:26:42 UTC 2024|test result|:grin: SUCCESS|
Revisit switching off TCMALLOC...
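One way to sidestep the `libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block` error seen on aarch64 when importing `sentencepiece` is to build SentencePiece without tcmalloc. A minimal sketch of an EasyBuild parse hook doing that, assuming SentencePiece's `SPM_ENABLE_TCMALLOC` CMake option and that the easyconfig passes `configopts` to CMake (an easyconfig that builds SentencePiece as a bundle component would need the option set on that component instead). This is not the SentencePiece hook kept in this PR, and checking `platform.machine()` is a simplification of selecting the aarch64 CPU targets.

```python
import platform

# Hypothetical hook body: turn off tcmalloc in SentencePiece's CMake build on
# aarch64 hosts. SPM_ENABLE_TCMALLOC is a SentencePiece CMake option; whether
# the easyconfig forwards "configopts" to CMake is an assumption.
TCMALLOC_OFF = "-DSPM_ENABLE_TCMALLOC=OFF"


def parse_hook(ec, *args, **kwargs):
    """EasyBuild calls parse_hook(ec) for each parsed easyconfig when --hooks is used."""
    if ec.name != "SentencePiece" or platform.machine() != "aarch64":
        return
    if TCMALLOC_OFF not in ec["configopts"]:
        # EasyConfig.update() appends to the existing value of the parameter.
        ec.update("configopts", TCMALLOC_OFF)
```

Guarding against an existing flag keeps the hook idempotent if the easyconfig is parsed more than once.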
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_603/17634
|date|job status|comment|
|----------|----------|------------------------|
|Sep 03 12:33:52 UTC 2024|submitted|job id `17634` awaits release by job manager|
|Sep 03 12:34:22 UTC 2024|released|job awaits launch by Slurm scheduler|
|Sep 03 12:40:25 UTC 2024|running|job `17634` is running|
|Sep 03 13:53:08 UTC 2024|finished|:cry: FAILURE|
|Sep 03 13:53:08 UTC 2024|test result|:grin: SUCCESS|
Maybe switch off the following
The main purpose of this PR is to facilitate debugging various issues when building PyTorch-bundle and demonstrating approaches that could solve the issues. It is expected that the fixes provided here are not final.
- `find_library` provided by `ctypes.util`, which prevented importing `soundfile`
- `aarch64/{generic,neoverse_n1,neoverse_v1}`, where importing `sentencepiece` led to the error `libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block`
- `torchvision`, where some library was not compiled with `jpeg` support, hence some tests failed

→ Initially we will disable all fixes, build for selected architectures and document the errors. We will then enable the fixes one by one and document the results (some errors fixed, some new errors, ...).
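For the `find_library` issue, one possible workaround is a lookup that falls back to the directories listed in the library search path variables when `ctypes.util.find_library` returns `None`. A minimal sketch, assuming the library (for example libsndfile, which `soundfile` looks up under the name `sndfile`) sits in a directory listed in `LD_LIBRARY_PATH` or `LIBRARY_PATH`; `find_library_with_fallback` is hypothetical and is not the `custom_ctypes` module built in this PR.

```python
import ctypes.util
import glob
import os
from typing import Optional


def find_library_with_fallback(name: str) -> Optional[str]:
    """Try ctypes.util.find_library first, then scan library search path variables.

    Hypothetical helper: the fallback assumes the library lives in a directory
    listed in LD_LIBRARY_PATH or LIBRARY_PATH.
    """
    found = ctypes.util.find_library(name)
    if found:
        return found  # usually a bare soname such as "libsndfile.so.1"
    for var in ("LD_LIBRARY_PATH", "LIBRARY_PATH"):
        for directory in os.environ.get(var, "").split(os.pathsep):
            if not directory:
                continue
            # sorted() puts the unversioned "lib<name>.so" first if it exists.
            candidates = sorted(glob.glob(os.path.join(directory, f"lib{name}.so*")))
            if candidates:
                return candidates[0]  # a full path, unlike find_library's result
    return None


lib = find_library_with_fallback("sndfile")
if lib is not None:
    ctypes.CDLL(lib)  # CDLL accepts both a bare soname and a full path
```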
Note: see the original PR for PyTorch-bundle (https://github.com/EESSI/software-layer/pull/585) for additional discussion of some of the issues listed above.