NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

When running an srun job with pyxis, I get "failed to execute: /bin/bash" #61

Closed: JonShelley closed this issue 2 years ago

JonShelley commented 2 years ago

When running a 2-VM job using srun, enroot, and pyxis, I see the following error:

pyxis: importing docker image ...
pyxis: importing docker image ...
slurmstepd: error: pyxis: container start failed with error code: 1
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis: enroot-switchroot: failed to execute: /bin/bash: No such file or directory
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: pyxis: if the image has an unusual entrypoint, try using --no-container-entrypoint
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: error: hpc-pg0-2: task 1: Exited with exit code 1
slurmstepd: error: pyxis: container start failed with error code: 1
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis: enroot-switchroot: failed to execute: /bin/bash: No such file or directory
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: pyxis: if the image has an unusual entrypoint, try using --no-container-entrypoint
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: error: hpc-pg0-1: task 0: Exited with exit code 1

Any ideas what would cause this?

flx42 commented 2 years ago

Hello,

Could you test this container image without pyxis, i.e. with enroot import, enroot create, and enroot start?
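A minimal sketch of that check, written in Python for illustration only. It assembles the three enroot steps (import, create, start) and, by default, just prints them; pass --run to execute them. The image URI docker://ubuntu:22.04, the container name pyxis-debug, and the file pyxis-debug.sqsh are hypothetical placeholders, not values from this issue, and the sketch does not claim to reproduce exactly what pyxis runs internally.

```python
import shlex
import subprocess
import sys


def enroot_repro_commands(image_uri, name="pyxis-debug", sqsh="pyxis-debug.sqsh"):
    """Build the enroot import/create/start sequence for a manual test.

    name and sqsh are hypothetical placeholders chosen for this sketch.
    """
    return [
        # Pull the image and write it to a squashfs file.
        ["enroot", "import", "--output", sqsh, image_uri],
        # Unpack the squashfs file into a named container root filesystem.
        ["enroot", "create", "--name", name, sqsh],
        # Start the container and run /bin/bash explicitly; if bash is missing
        # from the image, this step should fail much like the pyxis log did.
        ["enroot", "start", name, "/bin/bash", "-c", "echo ok"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--run"]
    # Placeholder image; substitute the image given to srun --container-image.
    uri = args[0] if args else "docker://ubuntu:22.04"
    run(enroot_repro_commands(uri), dry_run="--run" not in sys.argv)
```

If the final enroot start step also reports that /bin/bash cannot be executed, that would point to the image or its import rather than to pyxis or Slurm.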

JonShelley commented 2 years ago

I have tried to reproduce the problem and have been unsuccessful. Things appear to be working now, so I am closing the issue.