Closed rloewe closed 1 year ago
The error message is:
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO: Starting build...
INFO: Verifying bootstrap image pytorch_23.02-py3.sif
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
ERROR: unpackSIF failed: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
WARNING: Skipping mount /etc/hosts [binds]: /etc/hosts doesn't exist in container
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount proc [kernel]: /proc doesn't exist in container
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/tmp [tmp]: /tmp doesn't exist in container
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
: signal: killed
FATAL: While performing build: packer failed to pack: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
WARNING: Skipping mount /etc/hosts [binds]: /etc/hosts doesn't exist in container
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount proc [kernel]: /proc doesn't exist in container
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/tmp [tmp]: /tmp doesn't exist in container
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
: signal: killed
Traceback (most recent call last):
  File "/home/other-repo/cotainr/bin/cotainr", line 14, in <module>
    sys.exit(main())
  File "/home/other-repo/cotainr/cotainr/cli.py", line 390, in main
    cli.subcommand.execute()
  File "/home/other-repo/cotainr/cotainr/cli.py", line 141, in execute
    with container.SingularitySandbox(base_image=self.base_image) as sandbox:
  File "/home/other-repo/cotainr/cotainr/container.py", line 73, in __enter__
    self._subprocess_runner(
  File "/home/other-repo/cotainr/cotainr/container.py", line 225, in _subprocess_runner
    return util.stream_subprocess(args=args, **kwargs)
  File "/home/other-repo/cotainr/cotainr/util.py", line 113, in stream_subprocess
    completed_process.check_returncode()
  File "/home//miniconda3/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['singularity', 'build', '--force', '--sandbox', PosixPath('/tmp/tmphxw8alnl/singularity_sandbox'), 'pytorch_23.02-py3.sif']' returned non-zero exit status 255.
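For readers unfamiliar with the last frames of the traceback: the Python exception is only a wrapper around the failing `singularity build` call, whose exit status 255 is turned into a `CalledProcessError` by `check_returncode()`. The sketch below illustrates that mechanism with a stand-in child process. The helper name `run_checked` is hypothetical and this is not cotainr's actual `stream_subprocess` implementation.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command and raise CalledProcessError on a non-zero exit status.

    Hypothetical helper for illustration only; it mirrors the
    `check_returncode()` call seen in the traceback, nothing more.
    """
    completed = subprocess.run(args, capture_output=True, text=True)
    # Raises subprocess.CalledProcessError if completed.returncode != 0.
    completed.check_returncode()
    return completed


if __name__ == "__main__":
    # Stand-in for the failing `singularity build`: a child that exits with 255.
    failing = [sys.executable, "-c", "import sys; sys.exit(255)"]
    try:
        run_checked(failing)
    except subprocess.CalledProcessError as err:
        print(f"child failed with exit status {err.returncode}")
```

The point is that the root cause has to be found in the Singularity output above the traceback, not in the Python frames themselves.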
I am not able to reproduce this problem. I have tried on LUMI using cotainr/2023.01.1 and singularity-ce/3.11.1 and on my laptop using cotainr/main (https://github.com/DeiC-HPC/cotainr/commit/4c81aa53a8760c184a925038b34fe0be18ce4277). In both cases the container builds without problems.
Looking at the error message, I notice the two Singularity errors:
ERROR: unpackSIF failed: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't exist in container, not updating
FATAL: While performing build: packer failed to pack: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't exist in container, not updating
These look like problems in older versions of Singularity, e.g. https://github.com/apptainer/singularity/issues/5666 or https://github.com/apptainer/singularity/issues/5690.
@ThomasA Are you still able to reproduce this problem? If so, what versions of cotainr and apptainer/singularity are you running?
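If it helps when answering the version question, here is a minimal sketch for collecting the container runtime versions in one go. It assumes only that `singularity` and/or `apptainer` are on `PATH` and accept `--version`. The helper names are made up for this example, and the cotainr version still has to be noted separately (for instance the loaded module name or the git commit).

```python
import shutil
import subprocess


def tool_version(executable):
    """Return the output of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        # The tool is not installed or not on PATH.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    # Some tools print their version to stderr instead of stdout.
    return result.stdout.strip() or result.stderr.strip()


def container_runtime_versions(executables=("singularity", "apptainer")):
    """Map each candidate runtime name to its reported version (or None)."""
    return {name: tool_version(name) for name in executables}


if __name__ == "__main__":
    for name, version in container_runtime_versions().items():
        print(f"{name}: {version or 'not found on PATH'}")
```

Pasting that output alongside the cotainr version makes it easier to compare against the versions on which the build is known to work.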
I was trying with Cotainr 2023.01.0. I am now checking whether I can still reproduce it; afterwards I will try 2023.02.0. I suspect that the base image I am using somehow does not support what Cotainr/Singularity is trying to do with it.
I can actually build the container now with 2023.01.0. I cannot rule out entirely that I may have been using an earlier version of Cotainr initially. In any case, it does not seem to be a problem now.
Good to hear that it works for you now! I will close this issue.
Running this command
cotainr build --base-image docker://nvcr.io/nvidia/pytorch:23.02-py3 --conda-env accelerate.yml accelerate.sif
fails with an error.