NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Task prolog script fails #83

Closed: staeglis closed this issue 2 years ago

staeglis commented 2 years ago

The task prolog script fails:

$ srun -p cpu-test --container-image=centos --pty bash
pyxis: importing docker image: centos
pyxis: imported docker image: centos
slurmstepd-kistest: error: Could not run slurm task_prolog [/opt/slurm/task-prolog.sh]: No such file or directory
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

I tried to bind-mount the directory via the enroot mount config, but that fails:

$ srun -p cpu-test --container-image=centos --pty bash
pyxis: importing docker image: centos
pyxis: imported docker image: centos
slurmstepd-kistest: error: pyxis: container start failed with error code: 1
slurmstepd-kistest: error: pyxis: printing enroot log file:
slurmstepd-kistest: error: pyxis:     enroot-mount: failed to mount: /opt/slurm at /home/user/.local/share/enroot/pyxis_117.0/opt/slurm: Not a directory
slurmstepd-kistest: error: pyxis: couldn't start container
slurmstepd-kistest: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd-kistest: error: Failed to invoke spank plugin stack
srun: error: kistest: task 0: Exited with exit code 1
staeglis commented 2 years ago

I've fixed this by mounting the file instead of the base directory:

/opt/slurm/task-prolog.sh       /opt/slurm/task-prolog.sh       none    x-create=file,bind,ro,nosuid,nodev,private                0   -1
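The entry above follows a fstab-style layout: source, target, filesystem type, comma-separated options, and two trailing numeric fields. Below is a minimal Python sketch, under stated assumptions, that parses such a line into named fields and builds a matching read-only bind-mount line for a given path. It picks `x-create=file` for a regular file, as in the fix, and `x-create=dir` for a directory. Treating `x-create=dir` as a valid enroot option value is an assumption here, and `parse_mount_entry` / `make_bind_entry` are hypothetical helpers, not part of pyxis or enroot. The meaning of the trailing `0 -1` fields is not interpreted; they are copied as-is from the entry in this thread.

```python
import os

# Names for the six whitespace-separated columns of an fstab(5)-style line.
# The last two are kept as raw strings; their meaning in enroot's mount
# configuration is not interpreted by this sketch.
FIELDS = ("source", "target", "fstype", "options", "freq", "passno")


def parse_mount_entry(line: str) -> dict:
    """Split one fstab-style mount line into named fields."""
    parts = line.split()  # tolerates runs of spaces or tabs between columns
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    entry = dict(zip(FIELDS, parts))
    entry["options"] = entry["options"].split(",")
    return entry


def make_bind_entry(path: str) -> str:
    """Hypothetical helper: build a read-only bind-mount line for `path`.

    Uses x-create=file for a regular file (as in the fix above) and
    x-create=dir for a directory (assumed enroot option value).
    """
    if os.path.isfile(path):
        create = "file"
    elif os.path.isdir(path):
        create = "dir"
    else:
        raise FileNotFoundError(path)
    opts = f"x-create={create},bind,ro,nosuid,nodev,private"
    return f"{path} {path} none {opts} 0 -1"


if __name__ == "__main__":
    line = ("/opt/slurm/task-prolog.sh /opt/slurm/task-prolog.sh none "
            "x-create=file,bind,ro,nosuid,nodev,private 0 -1")
    entry = parse_mount_entry(line)
    print(entry["target"], entry["options"][0])  # /opt/slurm/task-prolog.sh x-create=file
```

Generating the line from the path's actual type avoids pairing a directory mount with a file target (or vice versa), which is the kind of mismatch that produced the "Not a directory" error when mounting /opt/slurm.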
rvencu commented 2 years ago

Where does this line go? Into the container image's /etc/fstab?