NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

SLURM enter into a running container using overlap and container-name, mounted path is empty #130

Closed itzsimpl closed 10 months ago

itzsimpl commented 10 months ago

I have the following scenario:

$ mkdir testmount
$ touch testmount/dummyfile
$ srun --container-image=ubuntu:22.04 --container-name=test --container-mounts=./testmount:/testmount --pty bash
root@node:/# ls /testmount/
dummyfile
root@node:/#

Let's assume the job ID is 960. If I open another shell and want to execute the same command in the same container (basically attach to it), I can run the following:

$ srun --overlap --jobid 960 --pty bash
user@node:~ [bash]$ enroot list -f
NAME            PID    COMM  STATE  STARTED  TIME   MNTNS       USERNS      COMMAND
pyxis_960_test  48755  bash  Ss+    10:47    12:18  4026538274  4026538273  /usr/bin/bash
user@node:~ [bash]$ enroot exec 48755 bash
root@node:/# ls testmount/
dummyfile
root@node:/#
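The two manual steps above (look up the PID with enroot list -f, then enroot exec into it) can be sketched as one small helper. This is only an illustration: the function name container_pid is hypothetical, and it assumes the listing keeps the column layout shown above (NAME first, PID second) and that pyxis names the container pyxis_<jobid>_<name>, as seen in this output.

```shell
# Hypothetical helper: print the PID of the container named "$1",
# reading `enroot list -f` style output (header line first) on stdin.
# Assumes NAME is the first column and PID the second, as in the listing above.
container_pid() {
    awk -v name="$1" 'NR > 1 && $1 == name { print $2; exit }'
}

# Intended use on the compute node (not executed here):
#   srun --overlap --jobid 960 --pty bash
#   enroot exec "$(enroot list -f | container_pid pyxis_960_test)" bash
```

If no container with that name is listed, the helper prints nothing, so the enroot exec call would receive an empty argument and should be guarded in real use.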

Pyxis provides the convenient --container-name option, which lets me reduce this to a single command and drop straight into the container. Unfortunately, this time the mounted path is empty.

$ srun --overlap --jobid 960 --container-name=test --pty bash
root@node:/# ls testmount/
root@node:/#

Opening a third terminal on the node and listing the current enroot containers shows only one. So what could cause the mounted path to be empty in the latter case?

$ enroot list -f
NAME            PID    COMM  STATE  STARTED  TIME   MNTNS       USERNS      COMMAND
pyxis_960_test  48755  bash  Ss+    10:47    18:45  4026538274  4026538273  /usr/bin/bash
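One possible way to narrow this down is to check whether the shell started with --container-name really lives in the same mount namespace as the listed container. The sketch below is an assumption on my part, not something pyxis or enroot provides: mntns_of is a hypothetical helper that reads the namespace inode from /proc on Linux, which can then be compared with the MNTNS column above (4026538274).

```shell
# Hypothetical helper: print the mount-namespace inode of process "$1"
# (a PID, or "self"), taken from the /proc/<pid>/ns/mnt symlink.
# The symlink target looks like "mnt:[4026538274]"; keep only the digits.
mntns_of() {
    readlink "/proc/$1/ns/mnt" | tr -cd '0-9'
    echo
}

# Intended use inside the shell started with --container-name (not executed here):
#   mntns_of self    # compare with the MNTNS column of `enroot list -f`
```

If the two numbers differ, the second shell is not in the original container's mount namespace, which would be consistent with the bind mount being missing there.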

flx42 commented 10 months ago

Which pyxis version are you using?

Also, are you perhaps using job_container/tmpfs? The problem might be the interaction between pyxis and this plugin.

itzsimpl commented 10 months ago

@flx42 No, I do not use job_container/tmpfs, i.e. it is not set in slurm.conf, although PrologFlags=Alloc,Serial,Contain is. Slurm was set up using deepops, with only minor modifications. Versions: pyxis 0.16.1, enroot 3.4.1, Slurm 23.02.6.
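For anyone who wants to double-check the same settings on a running cluster rather than in slurm.conf, a rough sketch follows. It assumes that scontrol show config prints one "Key = value" pair per line and that JobContainerType and PrologFlags appear among the keys; slurm_config_value is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of key "$1" from
# `scontrol show config` style "Key = value" lines read on stdin.
# Assumes one pair per line, with optional spaces around "=".
slurm_config_value() {
    awk -F' *= *' -v key="$1" '$1 == key { print $2; exit }'
}

# Intended use on a node with the Slurm client tools (not executed here):
#   scontrol show config | slurm_config_value JobContainerType
#   scontrol show config | slurm_config_value PrologFlags
```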

Since I can no longer reproduce what I described in the issue, I'll go ahead and close it. I'll reopen it if it occurs again.

Thank you for the help, and I apologise for the inconvenience.