From my experience, most of the time is usually taken by mksquashfs. You may also be getting rate limited by Docker Hub; you could try switching to the GitHub or GitLab container registry.
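For example, pulling via GitHub Container Registry instead looks something like this (the image reference is a placeholder, not a real image):

```bash
# Sketch: pull the same image from ghcr.io instead of Docker Hub to sidestep
# Docker Hub rate limits. OWNER/IMAGE:TAG is a placeholder.
singularity pull docker://ghcr.io/OWNER/IMAGE:TAG
```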
I have got mksquashfs to run a bit faster by downloading and building a newer version and making a wrapper script to set the amount of memory and the number of CPUs. mksquashfs's detection of memory and CPUs is particularly bad if you are running inside a SLURM allocation, since it will try to use the whole machine rather than the allocation and can e.g. end up swapping. See https://github.com/frankier/csc-tricks/#and-fixing-mksquashfs-too
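For reference, a minimal wrapper in that spirit could look like the following; the binary path and fallback values are assumptions, not the exact script from the link:

```bash
#!/bin/bash
# Sketch of a mksquashfs wrapper: forward to a newer locally built binary and
# cap memory/CPUs at the SLURM allocation rather than letting mksquashfs
# autodetect the whole machine. Path and fallback values are assumptions.
MEM_MB="${SLURM_MEM_PER_NODE:-4096}"   # memory granted by SLURM, in MB
CPUS="${SLURM_CPUS_ON_NODE:-4}"        # CPUs granted by SLURM
exec "$HOME/squashfs-tools/squashfs-tools/mksquashfs" "$@" \
  -mem "${MEM_MB}M" -processors "$CPUS"
```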
Just in case you are rebuilding your container to test every change: you can usually avoid this. See https://frankie.robertson.name/research/effective-cluster-computing/#use-binds
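A minimal sketch of the bind approach, assuming the image already contains your dependencies (the paths are placeholders for your own layout):

```bash
# Sketch: run changed code against the existing image instead of rebuilding.
# $PWD/src and /workspace are placeholder paths.
singularity exec --bind "$PWD/src:/workspace" image.sif python /workspace/train.py
```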
I'm experiencing this problem on all-flash file systems as well. Local /tmp is 3x faster.
mksquashfs isn't the bottleneck, as it uses all CPUs on the node. The step before, which seems to involve hashing, runs at about 20% of a single CPU core.
The step before is extracting the OCI layers, I think. You should absolutely set SINGULARITY_TMPDIR to fast local scratch storage if you would otherwise be using a network filesystem. The extraction creates many small files, which is very slow on typical HPC network file systems.
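For example, in an sbatch script, assuming /tmp is node-local on your cluster:

```bash
# Stage Singularity's temporary files on node-local disk so the many small
# files from OCI layer extraction never touch the network filesystem.
# /tmp/$USER is an assumption; use whatever local scratch your cluster offers.
export SINGULARITY_TMPDIR="/tmp/$USER/singularity-tmp"
mkdir -p "$SINGULARITY_TMPDIR"
singularity pull docker://ubuntu:22.04
```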
This issue has been automatically marked as stale because it has not had activity in over 60 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@frankier mksquashfs still uses only one CPU core for me on a LUMI compute node, despite building the latest squashfs-tools, editing my PATH, and setting a custom wrapper containing `exec $SCRATCH/squashfs-tools/squashfs-tools/mksquashfs $@ -mem 1G -processors 16`. Am I doing something wrong?
It has worked for me, but as ever with something a bit hacky like this, it could break. You need to make sure your wrapper script is on the PATH before the system mksquashfs. You can add e.g. `echo 'using wrapper script'` to the beginning of your wrapper script to verify what's happening. Another thing to try is echoing $PATH inside the sbatch script just before calling singularity pull. A more or less foolproof way is setting the PATH on the same line as calling singularity pull.
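For example (the wrapper directory $HOME/bin is an assumption):

```bash
# Prepend the wrapper's directory to PATH for just this command, so the
# wrapper shadows the system mksquashfs during the pull.
PATH="$HOME/bin:$PATH" singularity pull docker://ubuntu:22.04
```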
CSC is now using Apptainer, so I'm not sure whether there have been major architectural changes since the fork. I'd be interested to hear whether you can get it working. Perhaps this issue would also gain some traction on Apptainer, so the functionality could be added directly (assuming it hasn't been already).
@ifelsefi I'm experiencing the same bottleneck; the `storing signatures` step is the culprit. Did you ever find a workaround to speed it up?
This repository is closed. If you'd like a development team member to be involved, please run `singularity --version`: if it says `singularity-ce`, submit a new issue to https://github.com/sylabs/singularity; otherwise, submit a new issue to https://github.com/apptainer/apptainer.
Version of Singularity:
Expected behavior
Building a SIF file from a Docker image should take less time.
Actual behavior
It takes around 4 hours to build an image with 2 CPUs and 6 GB of RAM. The image is large (6 GB). The following warning is raised:
Steps to reproduce this behavior
Run:
What OS/distro are you running
How did you install Singularity
Provided by a Slurm cluster via Lmod.