DeiC-HPC / cotainr

cotainr - a user space Apptainer/Singularity container builder.
European Union Public License 1.2

Still getting the error "packer failed to pack: copy Failed: symlink ..." #52

Closed jiemakel closed 3 months ago

jiemakel commented 5 months ago

Issue #48 does not seem to be completely solved. When trying the sfantao-pytorch recipe, a build with cotainr checked out from GitHub still fails with:

jiemakel@uan01:/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr> bin/cotainr build /scratch/project_462000347/jiemakel/ecco-image-processing/lumi-sfantao-pytorch-lumi-base.sif --base-image=/appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif --conda-env=examples/LUMI/conda_pytorch_rocm/py311_rocm542_pytorch.yml
Cotainr:-: Creating Singularity Sandbox
SingularitySandbox.err:-: WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
SingularitySandbox.err:-: WARNING: integrity: signature not found for object group 1
SingularitySandbox.err:-: WARNING: Bootstrap image could not be verified, but build will continue.
Cotainr:-: Installing Conda environment: /pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/examples/LUMI/conda_pytorch_rocm/py311_rocm542_pytorch.yml

Welcome to Miniforge3 23.11.0-0

In order to continue the installation process, please review the license
agreement.

Miniforge installer code uses BSD-3-Clause license as stated below.

Binary packages that come with it have their own licensing terms
and by installing miniforge you agree to the licensing terms of individual
packages as well. They include different OSI-approved licenses including
the GNU General Public License and can be found in pkgs/<pkg-name>/info/licenses
folders.

Miniforge installer comes with a boostrapping executable that is used
when installing miniforge and is deleted after miniforge is installed.
The bootstrapping executable uses micromamba, cli11, cpp-filesystem,
curl, c-ares, krb5, libarchive, libev, lz4, nghttp2, openssl, libsolv,
nlohmann-json, reproc and zstd which are licensed under BSD-3-Clause,
MIT and OpenSSL licenses. Licenses and copyright notices of these
projects can be found at the following URL.
https://github.com/conda-forge/micromamba-feedstock/tree/master/recipe.

=============================================================================

Copyright (c) 2019-2022, conda-forge
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Do you accept the license terms? [yes|no]
>>> yes
CondaInstall.out:-: WARNING:
Cotainr:-: Cleaning up unused Conda files
Cotainr:-: Finished installing conda environment: /pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/examples/LUMI/conda_pytorch_rocm/py311_rocm542_pytorch.yml
Cotainr:-: Adding metadata to container
Cotainr:-: Building container image
SingularitySandbox.err:-: WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
SingularitySandbox.err:-: FATAL:   While performing build: packer failed to pack: copy Failed: symlink GlobalSign_Root_R46.pem /tmp/build-temp-1028729988/rootfs/var/lib/ca-certificates/openssl/002c0b4f.0: permission denied
Traceback (most recent call last):
  File "/pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/bin/cotainr", line 14, in <module>
    sys.exit(main())
  File "/pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/cotainr/cli.py", line 594, in main
    cli.subcommand.execute()
  File "/pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/cotainr/cli.py", line 253, in execute
    sandbox.build_image(path=self.image_path)
  File "/pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/cotainr/container.py", line 184, in build_image
    self._subprocess_runner(
  File "/pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/cotainr/container.py", line 335, in _subprocess_runner
    return util.stream_subprocess(
  File "/pfs/lustrep4/projappl/project_462000347/jiemakel/ecco-image-processing/cotainr/cotainr/util.py", line 136, in stream_subprocess
    completed_process.check_returncode()
  File "/opt/cray/pe/python/3.10.10/lib/python3.10/subprocess.py", line 457, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['singularity', '-q', '--nocolor', 'build', '--force', PosixPath('/pfs/lustrep4/scratch/project_462000347/jiemakel/ecco-image-processing/lumi-sfantao-pytorch-lumi-base.sif'), PosixPath('/tmp/tmpl6k7jyas/singularity_sandbox')]' returned non-zero exit status 255.

Note however that no --fix-perms WARNINGs appear, so that flag does seem to be in effect, but still the packing fails.

Chroxvi commented 5 months ago

Thanks for reporting this issue.

I can reproduce it on LUMI with the current cotainr main branch (6d0f2ff0b692d38cfaa1d34a103e2dcd7f67a888):

schouoxv@uan02:~/cotainr_dev/cotainr> git branch --show
main
schouoxv@uan02:~/cotainr_dev/cotainr> git rev-parse HEAD
6d0f2ff0b692d38cfaa1d34a103e2dcd7f67a888
schouoxv@uan02:~/cotainr_dev/cotainr> singularity --version
singularity-ce version 3.11.4-1
schouoxv@uan02:~/cotainr_dev/cotainr> module load cray-python
schouoxv@uan02:~/cotainr_dev/cotainr> ./bin/cotainr build test.sif --base-image=/appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif
Cotainr:-: Creating Singularity Sandbox
SingularitySandbox.err:-: WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
SingularitySandbox.err:-: WARNING: integrity: signature not found for object group 1
SingularitySandbox.err:-: WARNING: Bootstrap image could not be verified, but build will continue.
Cotainr:-: Adding metadata to container
Cotainr:-: Building container image
SingularitySandbox.err:-: WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
SingularitySandbox.err:-: FATAL:   While performing build: packer failed to pack: copy Failed: symlink GlobalSign_Root_R46.pem /tmp/build-temp-453929880/rootfs/var/lib/ca-certificates/openssl/002c0b4f.0: permission denied
Traceback (most recent call last):
  File "/pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/./bin/cotainr", line 14, in <module>
    sys.exit(main())
  File "/pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/cotainr/cli.py", line 594, in main
    cli.subcommand.execute()
  File "/pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/cotainr/cli.py", line 253, in execute
    sandbox.build_image(path=self.image_path)
  File "/pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/cotainr/container.py", line 184, in build_image
    self._subprocess_runner(
  File "/pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/cotainr/container.py", line 335, in _subprocess_runner
    return util.stream_subprocess(
  File "/pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/cotainr/util.py", line 136, in stream_subprocess
    completed_process.check_returncode()
  File "/opt/cray/pe/python/3.10.10/lib/python3.10/subprocess.py", line 457, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['singularity', '-q', '--nocolor', 'build', '--force', PosixPath('/pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/test.sif'), PosixPath('/tmp/tmp0w6l4sjt/singularity_sandbox')]' returned non-zero exit status 255.
schouoxv@uan02:~/cotainr_dev/cotainr> ls -lh /appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif
lrwxrwxrwx 1 samantao project_462000394 117 Jan 24 17:48 /appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif -> /pfs/lustrep3/scratch/project_462000394/containers/tested-containers/lumi-rocm-rocm-5.5.3-dockerhash-e6d86d57bc53.sif
schouoxv@uan02:~/cotainr_dev/cotainr> sha256sum /appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif
0d4673d2aaea75ff63662e69b8c258aeacc06f02825aa36b6afd3b0dfe013a20  /appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif

However, on my local setup, running Apptainer 1.2.5, it builds without problems:

(py311_cotainr_test) ~/cotainr$ git branch --show-current 
main
(py311_cotainr_test) ~/cotainr$ git rev-parse HEAD
6d0f2ff0b692d38cfaa1d34a103e2dcd7f67a888
(py311_cotainr_test) ~/cotainr$ singularity --version
apptainer version 1.2.5
(py311_cotainr_test) ~/cotainr$ ./bin/cotainr build test.sif --base-image=lumi-rocm-rocm-5.5.3.sif
Cotainr:-: Creating Singularity Sandbox
Cotainr:-: Adding metadata to container
Cotainr:-: Building container image
Cotainr:-: Finished building ~/cotainr/test.sif in 00:04:16
(py311_cotainr_test) ~/cotainr$ sha256sum lumi-rocm-rocm-5.5.3.sif
0d4673d2aaea75ff63662e69b8c258aeacc06f02825aa36b6afd3b0dfe013a20  lumi-rocm-rocm-5.5.3.sif

I will look more into the cause of this issue, testing whether it is a problem with the current version of Singularity or mksquashfs on LUMI.
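Since the failure shows up with SingularityCE 3.11.4-1 but not with Apptainer 1.2.5, it can help to record which runtime the `singularity` command actually resolves to before comparing results between systems. A minimal sketch, assuming the version strings take one of the two forms shown above; the function name `runtime_flavour` is hypothetical and not part of cotainr:

```bash
# Classify the output of `singularity --version` by runtime flavour.
# Hypothetical helper: only the two string formats seen in this thread
# are recognised, anything else is reported as "unknown".
runtime_flavour() {
    case "$1" in
        "singularity-ce version "*) echo "singularityce ${1#singularity-ce version }" ;;
        "apptainer version "*)      echo "apptainer ${1#apptainer version }" ;;
        *)                          echo "unknown" ;;
    esac
}

# Usage on a system where the command is installed:
#   runtime_flavour "$(singularity --version)"
```

This only labels the runtime; it does not say anything about whether a given version is affected.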

Chroxvi commented 5 months ago

I believe I have narrowed this issue down to a problem with file permission handling when converting containers with the SingularityCE runtime available on LUMI. Specifically, the issue seems to occur if the base container image contains files that are read-only for root (or contains a symlink to such a file; I am not sure which of the two is causing the issue) and you do a specific three-step conversion of such an image, as you end up doing when using cotainr to build a container based on the lumi-rocm-rocm-5.5.3.sif image. The three conversions are:

  1. A docker archive of lumi-rocm-rocm-5.5.3 (these images are being built from Dockerfiles) is converted to the lumi-rocm-rocm-5.5.3.sif available on LUMI under /appl/local/containers/sif-images (https://github.com/sfantao/lumi-containers/blob/1691828bb89def3702b0caa2f0864f975bc8ddcf/build-lumi-containers.sh#L146)
  2. The lumi-rocm-rocm-5.5.3.sif is converted into a Singularity sandbox directory by cotainr. Cotainr then manipulates the sandbox to include the conda environment.
  3. The sandbox directory is converted into the final SIF container by cotainr.

The lumi-rocm-rocm-5.5.3.sif is based on a SUSE Base Container Image (https://github.com/sfantao/lumi-containers/blob/1691828bb89def3702b0caa2f0864f975bc8ddcf/common/Dockerfile.header#L4). Starting with this SUSE BCI, I can reproduce the error on LUMI by replicating the above 3 conversions using only Singularity:

schouoxv@uan01:~> singularity --version
singularity-ce version 3.11.4-1
schouoxv@uan01:~> singularity build suse-bci-base-15.4.27.14.99.sif docker-archive://suse-bci-base-15.4.27.14.99.tar 
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
2024/01/31 09:46:39  info unpack layer: sha256:2677047460bef4ab87a080b30bac937bc8660ca49143a3f3efd5839002727428
2024/01/31 09:46:39  warn xattr{etc/shadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2024/01/31 09:46:39  warn xattr{/tmp/build-temp-492090261/rootfs/etc/shadow} destination filesystem does not support xattrs, further warnings will be suppressed
INFO:    Creating SIF file...
INFO:    Build complete: suse-bci-base-15.4.27.14.99.sif
schouoxv@uan01:~> singularity build --sandbox --fix-perms suse-bci-base-15.4.27.14.99-cotainr-sandbox suse-bci-base-15.4.27.14.99.sif 
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
INFO:    Verifying bootstrap image suse-bci-base-15.4.27.14.99.sif
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
INFO:    Creating sandbox directory...
INFO:    Build complete: suse-bci-base-15.4.27.14.99-cotainr-sandbox
schouoxv@uan01:~> singularity build suse-bci-base-15.4.27.14.99-cotainr-sandbox.sif suse-bci-base-15.4.27.14.99-cotainr-sandbox/
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
ERRO[0005] Can't add file suse-bci-base-15.4.27.14.99-cotainr-sandbox/var/lib/ca-certificates/openssl/1001acf7.0 to tar: io: read/write on closed pipe 
ERRO[0005] Can't close tar writer: io: read/write on closed pipe 
FATAL:   While performing build: packer failed to pack: copy Failed: symlink GlobalSign_Root_R46.pem /tmp/build-temp-398858017/rootfs/var/lib/ca-certificates/openssl/002c0b4f.0: permission denied

In my test, I used a docker archive of the SUSE BCI created on local hardware and copied to LUMI because if I directly singularity pull the SUSE BCI, I hit this issue: https://github.com/sylabs/singularity/issues/697

Docker archive creation details

```bash
docker pull registry.suse.com/bci/bci-base:15.4.27.14.99
docker image save -o suse-bci-base-15.4.27.14.99.tar registry.suse.com/bci/bci-base:15.4.27.14.99
```

Checking the file permissions of the failing build, the problematic files end up being read-only and are not copied to the temporary rootfs directory used in the third and final conversion.

File permission and rootfs details

```bash
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99.sif ls -lh /var/lib/ca-certificates/openssl/002c0b4f.0
lrwxrwxrwx 1 root root 23 Sep 20 13:33 /var/lib/ca-certificates/openssl/002c0b4f.0 -> GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99.sif ls -lh /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
-r--r--r-- 1 root root 2.0K Sep 20 13:33 /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-cotainr-sandbox/ ls -lh /var/lib/ca-certificates/openssl/002c0b4f.0
lrwxrwxrwx 1 schouoxv pepr_schouoxv 23 Sep 20 13:33 /var/lib/ca-certificates/openssl/002c0b4f.0 -> GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-cotainr-sandbox ls -lh /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
-r--r--r-- 1 schouoxv pepr_schouoxv 2.0K Sep 20 13:33 /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity build --no-cleanup suse-bci-base-15.4.27.14.99-cotainr-sandbox.sif suse-bci-base-15.4.27.14.99-cotainr-sandbox/
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
INFO:    Build performed with no clean up option, build bundle(s) located at: [/tmp/build-temp-2489093219/rootfs /tmp/bundle-temp-1563960584]
FATAL:   While performing build: packer failed to pack: copy Failed: symlink GlobalSign_Root_R46.pem /tmp/build-temp-2489093219/rootfs/var/lib/ca-certificates/openssl/002c0b4f.0: permission denied
schouoxv@uan01:~> ls -lh /tmp/build-temp-2489093219/rootfs/var/lib/ca-certificates/openssl/
total 0
```
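To look for other files in the same state, one can list everything in a sandbox directory that its owner cannot write to. A minimal sketch using standard `find`; the function name `list_owner_readonly` is hypothetical, and whether missing owner write permission is the actual trigger is still only a suspicion at this point:

```bash
# List regular files and directories below the directory given as $1 that
# lack the owner write bit (octal 200). Symlinks are not reported, since
# their own mode bits are not meaningful on Linux.
# Hypothetical helper, not part of cotainr or Singularity.
list_owner_readonly() {
    find "$1" \( -type f -o -type d \) ! -perm -200 -print
}

# Usage, e.g. on the sandbox from the failing build:
#   list_owner_readonly suse-bci-base-15.4.27.14.99-cotainr-sandbox/var/lib/ca-certificates
```

The check looks at mode bits only, so it behaves the same regardless of which user runs it.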

Doing a direct conversion from the docker archive to a sandbox seems to work and the problematic files get read-write permissions.

Direct conversion from docker archive to sandbox details

```bash
schouoxv@uan01:~> singularity build --fix-perms --sandbox suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker docker-archive://suse-bci-base-15.4.27.14.99.tar
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
2024/01/31 09:59:35  info unpack layer: sha256:2677047460bef4ab87a080b30bac937bc8660ca49143a3f3efd5839002727428
2024/01/31 09:59:36  warn xattr{etc/shadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2024/01/31 09:59:36  warn xattr{/users/schouoxv/build-temp-3908632222/rootfs/etc/shadow} destination filesystem does not support xattrs, further warnings will be suppressed
WARNING: The --fix-perms option modifies the filesystem permissions on the resulting container.
INFO:    Creating sandbox directory...
INFO:    Build complete: suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker
schouoxv@uan01:~> singularity build suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker.sif suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
INFO:    Creating SIF file...
INFO:    Build complete: suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker.sif
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker/ ls -lh /var/lib/ca-certificates/openssl/002c0b4f.0
lrwxrwxrwx 1 schouoxv pepr_schouoxv 23 Sep 20 13:33 /var/lib/ca-certificates/openssl/002c0b4f.0 -> GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker/ ls -lh /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
-rw-r--r-- 1 schouoxv pepr_schouoxv 2.0K Sep 20 13:33 /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker.sif ls -lh /var/lib/ca-certificates/openssl/002c0b4f.0
lrwxrwxrwx 1 root root 23 Sep 20 13:33 /var/lib/ca-certificates/openssl/002c0b4f.0 -> GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-cotainr-sandbox-from-docker.sif ls -lh /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
-rw-r--r-- 1 root root 2.0K Sep 20 13:33 /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
```

Adding --fix-perms to the first conversion, from docker archive to SIF, also seems to work, and the problematic files get read-write permissions.

Add --fix-perms flag details

```bash
schouoxv@uan01:~> singularity build --fix-perms suse-bci-base-15.4.27.14.99-fix-perms.sif docker-archive://suse-bci-base-15.4.27.14.99.tar
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
2024/01/31 10:05:16  info unpack layer: sha256:2677047460bef4ab87a080b30bac937bc8660ca49143a3f3efd5839002727428
2024/01/31 10:05:16  warn xattr{etc/shadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2024/01/31 10:05:16  warn xattr{/tmp/build-temp-1566219982/rootfs/etc/shadow} destination filesystem does not support xattrs, further warnings will be suppressed
WARNING: The --fix-perms option modifies the filesystem permissions on the resulting container.
INFO:    Creating SIF file...
INFO:    Build complete: suse-bci-base-15.4.27.14.99-fix-perms.sif
schouoxv@uan01:~> singularity build --sandbox --fix-perms suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox suse-bci-base-15.4.27.14.99-fix-perms.sif
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
INFO:    Verifying bootstrap image suse-bci-base-15.4.27.14.99-fix-perms.sif
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
INFO:    Creating sandbox directory...
INFO:    Build complete: suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox
schouoxv@uan01:~> singularity build suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox.sif suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO:    Starting build...
INFO:    Creating SIF file...
INFO:    Build complete: suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox.sif
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-fix-perms.sif ls -lh /var/lib/ca-certificates/openssl/002c0b4f.0
lrwxrwxrwx 1 root root 23 Sep 20 13:33 /var/lib/ca-certificates/openssl/002c0b4f.0 -> GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-fix-perms.sif ls -lh /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
-rw-r--r-- 1 root root 2.0K Sep 20 13:33 /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox ls -lh /var/lib/ca-certificates/openssl/002c0b4f.0
lrwxrwxrwx 1 schouoxv pepr_schouoxv 23 Sep 20 13:33 /var/lib/ca-certificates/openssl/002c0b4f.0 -> GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox ls -lh /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
-rw-r--r-- 1 schouoxv pepr_schouoxv 2.0K Sep 20 13:33 /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox.sif ls -lh /var/lib/ca-certificates/openssl/002c0b4f.0
lrwxrwxrwx 1 root root 23 Sep 20 13:33 /var/lib/ca-certificates/openssl/002c0b4f.0 -> GlobalSign_Root_R46.pem
schouoxv@uan01:~> singularity exec suse-bci-base-15.4.27.14.99-fix-perms-cotainr-sandbox.sif ls -lh /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
-rw-r--r-- 1 root root 2.0K Sep 20 13:33 /var/lib/ca-certificates/openssl/GlobalSign_Root_R46.pem
```

This suggests a possible workaround: use the --fix-perms option when converting the docker containers to the SIF images available on LUMI under /appl/local/containers/sif-images. I will discuss this workaround with the LUMI container group.

I will also try to reproduce this error with the latest versions of SingularityCE and Apptainer. If it is an issue with these latest versions, I'll open an issue upstream about it.

jiemakel commented 5 months ago

Thanks for the digging. For now, this allows me to bypass the problem for myself by specifying --base-image=docker-archive:///appl/local/containers/docker-images/lumi-rocm-rocm-5.5.3.tar instead of the sif version.

Chroxvi commented 5 months ago

The fact that a direct conversion from a docker archive works suggests another workaround: use the docker archive directly as a base image with cotainr:

schouoxv@uan01:~/cotainr_dev/cotainr> module load cray-python
schouoxv@uan01:~/cotainr_dev/cotainr> ./bin/cotainr build test.sif --base-image=docker-archive:///appl/local/containers/tested-containers/lumi-rocm-rocm-5.5.3-dockerhash-e6d86d57bc53.tar --conda-env=./examples/LUMI/conda_pytorch_rocm/py311_rocm542_pytorch.yml
Cotainr:-: Creating Singularity Sandbox
SingularitySandbox.err:-: WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
SingularitySandbox.err:-: WARNING: The --fix-perms option modifies the filesystem permissions on the resulting container.
Cotainr:-: Installing Conda environment: /pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/examples/LUMI/conda_pytorch_rocm/py311_rocm542_pytorch.yml

Welcome to Miniforge3 23.11.0-0

Do you accept the license terms? [yes|no]
>>> yes
CondaInstall.out:-: WARNING:
Cotainr:-: Cleaning up unused Conda files
Cotainr:-: Finished installing conda environment: /pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/examples/LUMI/conda_pytorch_rocm/py311_rocm542_pytorch.yml
Cotainr:-: Adding metadata to container
Cotainr:-: Building container image
SingularitySandbox.err:-: WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
Cotainr:-: Finished building /pfs/lustrep2/users/schouoxv/cotainr_dev/cotainr/test.sif in 00:09:35

You may find the relevant docker archive based on the symlink of the SIF file:

schouoxv@uan01:~> ls -lh /appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif
lrwxrwxrwx 1 samantao project_462000394 117 Jan 24 17:48 /appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif -> /pfs/lustrep3/scratch/project_462000394/containers/tested-containers/lumi-rocm-rocm-5.5.3-dockerhash-e6d86d57bc53.si
schouoxv@uan01:~> ls -lh /appl/local/containers/tested-containers/lumi-rocm-rocm-5.5.3-dockerhash-e6d86d57bc53.*
-rwxr-xr-x 1 samantao project_462000394 2,8G Jan 24 17:45 /appl/local/containers/tested-containers/lumi-rocm-rocm-5.5.3-dockerhash-e6d86d57bc53.sif
-rw-rw-r-x 1 samantao project_462000394  12G Jan 24 17:46 /appl/local/containers/tested-containers/lumi-rocm-rocm-5.5.3-dockerhash-e6d86d57bc53.tar
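
The lookup above can be scripted. A minimal sketch, assuming the docker archive sits in a known directory and shares the SIF file's base name with a `.tar` extension, as in the listing above; the function name `docker_archive_for_sif` and its second argument are hypothetical:

```bash
# Print a docker-archive:// base image spec for the SIF symlink given as $1.
# $2 is the directory expected to hold the matching .tar archive.
# Hypothetical helper, not part of cotainr.
docker_archive_for_sif() {
    local target name archive
    target=$(readlink "$1") || return 1   # fails if $1 is not a symlink
    name=$(basename "$target" .sif)       # drop the directory and the .sif suffix
    archive="$2/$name.tar"
    [ -f "$archive" ] || return 1         # the archive must actually exist
    echo "docker-archive://$archive"
}

# Usage:
#   docker_archive_for_sif /appl/local/containers/sif-images/lumi-rocm-rocm-5.5.3.sif \
#       /appl/local/containers/tested-containers
```

The printed value can be passed to cotainr as `--base-image`. The function returns a non-zero status instead of guessing when the path is not a symlink or no matching archive is found.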

@jiemakel You might want to try this workaround for now.

Chroxvi commented 5 months ago

> Thanks for the digging. For now, this allows me to bypass the problem for myself by specifying --base-image=docker-archive:///appl/local/containers/docker-images/lumi-rocm-rocm-5.5.3.tar instead of the sif version.

Exactly the same thought :)

Chroxvi commented 3 months ago

The lumi-rocm-rocm-X.Y.Z.sif base images are now created with the --fix-perms flag when converting from a docker archive. This works around the problems in Singularity causing cotainr to fail. In other words, cotainr should now work out of the box with the official LUMI base images.