NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

mkdir Permission denied #138

Open proshir opened 1 month ago

proshir commented 1 month ago

Hi, I have been using enroot and pyxis with Slurm, but my configuration got corrupted and I now get the following error. Can you help me, please?

 error: [job 1537] prolog failed status=126:0
 error: pyxis: child 767229 failed with error code: 1
 error: pyxis: couldn't execute enroot command
 error: pyxis: printing enroot log file:
 error: pyxis:     mkdir: cannot create directory '/raid': Permission denied
 error: pyxis:     mkdir: cannot create directory '/tmp/enroot-data': Permission denied
 error: pyxis:     mkdir: cannot create directory '/run/enroot': Permission denied
 error: pyxis: couldn't get list of existing containers
 error: pyxis: couldn't cleanup pyxis containers for job 1537
 error: spank: required plugin spank_pyxis.so: job_epilog() failed with rc=-1
 error: spank/epilog returned status 0x0100
 error: /etc/slurm/epilog.sh: exited with status 0x7e00
 error: [job 1537] epilog failed status=126:0

Also, my enroot.conf file is as follows:

ENROOT_RUNTIME_PATH /run/enroot/user-$(id -u)
ENROOT_CACHE_PATH /raid/enroot-cache/group-$(id -g)
ENROOT_DATA_PATH /tmp/enroot-data/user-$(id -u)

flx42 commented 1 month ago

Do these directories exist on the compute node? enroot/pyxis run unprivileged, so if you want to use folders like /raid and /run/enroot, you need to make sure that they are created at boot time or during the job prolog.

I'm not sure why /tmp/enroot-data is failing, though. Maybe it already exists but isn't accessible to the user?
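
For illustration, creating the directories during the job prolog might look roughly like the sketch below. It assumes the prolog runs as root and that Slurm exports SLURM_JOB_UID, SLURM_JOB_GID and SLURM_JOB_USER to it. The function name make_enroot_dirs, its prefix argument, and the permission modes are hypothetical choices made for this example, not something pyxis or enroot prescribes. The paths mirror the enroot.conf quoted earlier in the thread.

```shell
#!/bin/bash
# Hypothetical prolog fragment: pre-create the per-user enroot directories.
# make_enroot_dirs PREFIX UID GID
#   PREFIX is an assumption added so the function can be tried against a
#   scratch directory; a real prolog would pass an empty string.
make_enroot_dirs() (
    # Subshell body: the umask change below does not leak to the caller.
    # Force a sane umask so the parent directories stay traversable.
    umask 022
    prefix="$1"; uid="$2"; gid="$3"
    runtime="${prefix}/run/enroot/user-${uid}"
    cache="${prefix}/raid/enroot-cache/group-${gid}"
    data="${prefix}/tmp/enroot-data/user-${uid}"

    mkdir -p "$runtime" "$cache" "$data" || exit 1

    # Modes are an assumption: private per-user dirs, group-shared cache.
    chmod 0700 "$runtime" "$data" || exit 1
    chmod 0770 "$cache" || exit 1

    # Only root can hand the directories over to the job's user.
    if [ "$(id -u)" -eq 0 ]; then
        chown "${uid}:${gid}" "$runtime" "$data" || exit 1
        chown "root:${gid}" "$cache" || exit 1
    fi
)

# Run only when called from a Slurm prolog (SLURM_JOB_UID is set there).
if [ -n "${SLURM_JOB_UID:-}" ]; then
    make_enroot_dirs "" "$SLURM_JOB_UID" \
        "${SLURM_JOB_GID:-$(id -g "$SLURM_JOB_USER")}"
fi
```

Setting the umask explicitly is a deliberate choice here: if the prolog inherits a restrictive umask, mkdir -p would create /run/enroot or /raid/enroot-cache with a mode the job's user cannot traverse, and the per-user directory underneath would still be unreachable.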

jclinton830 commented 4 weeks ago

I am having the same problem. I have a prolog task that runs mkdir and chown, but I still get the same error.

jjustin@diana:/etc/enroot$ srun -w hades --container-image ubuntu cat /etc/os-release
pyxis: importing docker image: ubuntu
slurmstepd-hades: error: pyxis: child 611187 failed with error code: 1
slurmstepd-hades: error: pyxis: failed to import docker image
slurmstepd-hades: error: pyxis: printing enroot log file:
slurmstepd-hades: error: pyxis:     mkdir: cannot create directory ‘/raid’: Permission denied
slurmstepd-hades: error: pyxis:     mkdir: cannot create directory ‘/tmp/enroot-data’: Permission denied
slurmstepd-hades: error: pyxis:     mkdir: cannot create directory ‘/run/enroot’: Permission denied
slurmstepd-hades: error: pyxis: couldn't start container
slurmstepd-hades: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd-hades: error: Failed to invoke spank plugin stack
srun: error: hades: task 0: Exited with exit code 1
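
One way to narrow this down is to check, as the job's user on the compute node, which part of each path is actually blocking mkdir. The script below is only a sketch: the helper names first_existing_ancestor and report and the --check flag are made up for this example, and the paths are the ones from the enroot.conf posted above.

```shell
#!/bin/bash
# first_existing_ancestor PATH: print the deepest component of PATH that
# exists (or that we are able to stat).
first_existing_ancestor() {
    local p="$1"
    while [ ! -e "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        p=$(dirname "$p")
    done
    printf '%s\n' "$p"
}

# report PATH: say whether the current user could create PATH with mkdir -p,
# judged by write and search permission on the deepest existing ancestor.
report() {
    local target="$1" anc
    anc=$(first_existing_ancestor "$target")
    if [ -d "$anc" ] && [ -w "$anc" ] && [ -x "$anc" ]; then
        echo "OK   $target (writable ancestor: $anc)"
    else
        echo "FAIL $target (cannot write to: $anc)"
        return 1
    fi
}

# Hypothetical entry point: pass --check to test the three enroot paths.
if [ "${1:-}" = "--check" ]; then
    for d in "/run/enroot/user-$(id -u)" \
             "/raid/enroot-cache/group-$(id -g)" \
             "/tmp/enroot-data/user-$(id -u)"; do
        report "$d"
    done
fi
```

Saved on the node as, say, check-enroot.sh, it could be run through srun without --container-image so that pyxis is not involved. A FAIL line pointing at / or /run would suggest the prolog's mkdir never ran on that node, while one pointing at /run/enroot would suggest the parent exists but the per-user directory was not created or chowned.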