NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Updated Slurm + Pyxis, now entrypoint is `/` instead of current location. #119

Open crinavar opened 1 year ago

crinavar commented 1 year ago

Hi community, we recently (yesterday) updated Slurm to 23.02 and Pyxis to the latest git release, 0.15, and we noticed that when running jobs with containers, the entrypoint is now switched to /. Like this:

➜  ~ pwd
/home/cnavarro
➜  ~ srun --container-name=cuda-11.4.2 --gpus=8 --pty bash
cnavarro@nodeGPU01:/$ pwd
/
cnavarro@nodeGPU01:/$ 

Before, running the same command would keep us in /home/cnavarro, or more generally in whatever location inside home we were executing from. This feature was very convenient for users (despite the warnings about doing this). Currently enroot is mounting the user's home as intended, and this is our current pyxis config:

➜  ~ cat /etc/slurm/plugstack.conf.d/pyxis.conf
required /usr/local/lib/slurm/spank_pyxis.so container_scope=global

We tried both execute_entrypoint=0 and execute_entrypoint=1, but we have not yet recovered the desired behavior. Any ideas on what we are missing? Many thanks.

flx42 commented 1 year ago

So this is unrelated to container entrypoints. What was your previous pyxis version? I think the default behavior changed a while ago: it is now always using the workdir specified in the container. The conflict is that some containers set the WORKDIR to a directory where the scripts to run are.

But I realize I did not document this, sorry about that.

I'll check if there is something I can do to make this better. For now you can still use something like:

$ srun --container-image ubuntu:22.04 --container-mount-home --no-container-remap-root pwd
pyxis: importing docker image: ubuntu:22.04
pyxis: imported docker image: ubuntu:22.04
/

$ srun --container-image ubuntu:22.04 --container-mount-home --no-container-remap-root --container-workdir=${PWD} pwd
pyxis: importing docker image: ubuntu:22.04
pyxis: imported docker image: ubuntu:22.04
/home/fabecassis/github/pyxis
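
The workaround above can be wrapped in a small shell function so that users do not have to type the flag on every launch. This is a minimal sketch and not part of pyxis: `srun_here` is a hypothetical name, and it assumes the launch directory also exists inside the container (for example because the home directory is mounted with `--container-mount-home`).

```shell
# Hypothetical helper, not provided by pyxis: run srun with the container
# working directory pinned to the directory the command is launched from.
srun_here() {
    # $PWD is expanded on the submitting host, before the job starts.
    srun --container-workdir="$PWD" "$@"
}

# Example use (same flags as the workaround above):
#   srun_here --container-image ubuntu:22.04 --container-mount-home pwd
```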

crinavar commented 1 year ago

Hi flx42, thanks for the correction; yes, it is about workdirs. I think our previous pyxis version was from around 2 years ago; we had not updated for a while. The workaround works great for now. Do you have in mind adding some new option or config argument for pyxis?

flx42 commented 1 year ago

> Do you have in mind adding some new option or config argument for pyxis?

I'm thinking that if the container workdir is just / then it's probably the default settings, and then I should try switching to the job workdir instead.
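
The heuristic described here can be sketched as a small shell function. This only illustrates the decision rule and is not the pyxis implementation: `pick_workdir` and its two arguments are hypothetical names chosen for the example.

```shell
# Hypothetical sketch of the proposed rule: if the container's workdir is
# just "/", treat it as an unset default and fall back to the job's cwd.
pick_workdir() {
    container_workdir=$1   # workdir recorded in the container image
    job_cwd=$2             # directory the job was submitted from

    if [ "$container_workdir" = "/" ]; then
        printf '%s\n' "$job_cwd"
    else
        # The image sets an explicit workdir, so keep it.
        printf '%s\n' "$container_workdir"
    fi
}
```

With this rule, an image whose workdir is `/` would start in the job's directory, while an image that sets a specific workdir (for example, one holding its scripts) would keep its own.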

crinavar commented 1 year ago

Sounds great.

flx42 commented 1 year ago

I did the work in https://github.com/NVIDIA/pyxis/tree/job-cwd, but I'm not sure yet if I'm going to merge it. By default enroot will create a HOME directory for the user even if you remap root, so the patch would land you in the wrong directory (the HOME that enroot created, not your real one) if you are currently in your HOME. For example:

$ cd ~

$ pwd
/home/fabecassis

$ srun --container-image ubuntu:22.04 --container-mount-home bash --norc -xc 'pwd ; ls -la .'
pyxis: importing docker image: ubuntu:22.04
pyxis: imported docker image: ubuntu:22.04
/home/fabecassis
+ pwd
+ ls -la .
total 20
drwxr-xr-x 2 root root 4096 Aug  3 19:03 .
drwxr-xr-x 3 root root 4096 Aug 17 19:15 ..
-rw-r--r-- 1 root root  220 Jan  6  2022 .bash_logout
-rw-r--r-- 1 root root 3771 Jan  6  2022 .bashrc
-rw-r--r-- 1 root root  807 Jan  6  2022 .profile

I will discuss with my colleagues whether we can have enroot avoid creating an empty HOME directory in this particular case.