Closed by vinayburugu 8 months ago
Can you try with --container-image ./hello-world.sqsh?
You have to start the path with / or ./ to use a squashfs image; otherwise it's treated as a Docker image from a registry.
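To make the rule above concrete, here is a minimal sketch of how an image argument would be classified. It only mirrors the rule as stated in this reply and is not taken from the pyxis source; image_kind is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: encodes the rule stated in the reply above,
# not pyxis's actual implementation. image_kind is a hypothetical name.
image_kind() {
    case "$1" in
        /*|./*) echo "squashfs" ;;  # path starting with / or ./
        *)      echo "registry" ;;  # anything else is looked up in a registry
    esac
}

image_kind hello-world.sqsh               # prints: registry
image_kind ./hello-world.sqsh             # prints: squashfs
image_kind /home/ubuntu/hello-world.sqsh  # prints: squashfs
```

This matches the original report: a bare hello-world.sqsh falls into the registry branch, which is why the log shows a docker image import attempt.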
I do have the /home/ubuntu/hello-world.sqsh file, but I'm still seeing a "No such file or directory" error.
/opt/slurm/bin/srun --container-image /home/ubuntu/hello-world.sqsh hostname
slurmstepd: error: pyxis: child 10261 failed with error code: 1
slurmstepd: error: pyxis: failed to create container filesystem
slurmstepd: error: pyxis: printing enroot log file:
slurmstepd: error: pyxis: [ERROR] No such file or directory: /home/ubuntu/hello-world.sqsh
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: error: compute-0: task 0: Exited with exit code 1
That should work. Is /home/ubuntu mounted on all nodes?
If you are running the srun from a Slurm login node, then perhaps the hello-world.sqsh file is only present on the login node and not on the other nodes?
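One way to check this is to test whether the file is readable on each host. The sketch below is an illustration, not something from the thread: check_image is a hypothetical helper, and the commented srun line assumes that plain srun (without --container-image) works on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current host can read the image file.
check_image() {
    if [ -r "$1" ]; then
        echo "$(uname -n): found $1"
    else
        echo "$(uname -n): MISSING $1"
    fi
}

# Check on the node where this script runs (for example the login node).
check_image /home/ubuntu/hello-world.sqsh

# Same check on a compute node, with pyxis not involved
# (assumes a plain srun step can run there):
#   srun sh -c 'test -r /home/ubuntu/hello-world.sqsh && echo found || echo MISSING'
```

If the login node reports "found" and the compute node reports "MISSING", the path is not on a filesystem shared between them.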
I moved the file to a shared file system. I'm still seeing the task_init() error. I tried restarting slurmctld and slurmd.
/opt/slurm/bin/srun --container-image /mnt/shared/hello-world.sqsh hostname
slurmstepd: error: pyxis: container start failed with error code: 1
slurmstepd: error: pyxis: printing enroot log file:
slurmstepd: error: pyxis: enroot-switchroot: failed to execute: /bin/sh: No such file or directory
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: error: compute-0: task 0: Exited with exit code 1
Is it the DockerHub hello-world image? https://hub.docker.com/_/hello-world
If so, it is a FROM scratch image, so it has just one binary inside it and no /bin/sh: https://github.com/docker-library/hello-world/blob/3fb6ebca4163bf5b9cc496ac3e8f11cb1e754aee/amd64/hello-world/Dockerfile
Try an ubuntu image instead.
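To confirm that an image lacks a shell, you can unpack it and look for /bin/sh in the resulting root filesystem. This is only a sketch: has_shell is a hypothetical helper, and the commented unpack step assumes squashfs-tools (unsquashfs) is installed and uses /tmp/rootfs as an arbitrary scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: does an unpacked root filesystem contain /bin/sh?
has_shell() {
    # -L also accepts a symlink (for example sh -> dash) whose target
    # may not resolve when viewed from outside the container root.
    if [ -e "$1/bin/sh" ] || [ -L "$1/bin/sh" ]; then
        echo "ok: $1 has /bin/sh"
    else
        echo "no /bin/sh in $1"
        return 1
    fi
}

# Demo against the host's own root filesystem.
has_shell /

# Against the image from this thread (assumes unsquashfs is available):
#   unsquashfs -d /tmp/rootfs /mnt/shared/hello-world.sqsh
#   has_shell /tmp/rootfs
```

An image built FROM scratch with a single binary would take the "no /bin/sh" branch, which is consistent with the enroot-switchroot error above.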
It worked, @flx42. Thank you.
Hi, I installed pyxis and enroot and am having trouble running srun with a *.sqsh image. Can anyone identify the cause?
/opt/slurm/bin/srun --container-image hello-world.sqsh hostname
pyxis: importing docker image: hello-world.sqsh
slurmstepd: error: pyxis: child 10028 failed with error code: 1
slurmstepd: error: pyxis: failed to import docker image
slurmstepd: error: pyxis: printing enroot log file:
slurmstepd: error: pyxis: [INFO] Querying registry for permission grant
slurmstepd: error: pyxis: [INFO] Authenticating with user:
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: error: compute-0: task 0: Exited with exit code 1
cat /opt/slurm/etc/plugstack.conf
include /opt/slurm/etc/plugstack.conf.d/*
cat /opt/slurm/etc/plugstack.conf.d/pyxis.conf
required /usr/local/lib/slurm/spank_pyxis.so