NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

user supplemental groups do not show up in session, in container #152

Open EternalTB opened 1 day ago

EternalTB commented 1 day ago

Simple problem, likely complex solution. When instantiating a container:

hmeses@foo:~$ srun --nodelist=dc065b3c-123 --cpus-per-gpu 10 --gpus 1 --time 16:00:00 --mem-per-gpu 120G --partition priority --export ALL,ENV_COLOR=green --container-image /mnt/weka/foo/foo.sqsh --container-mounts /mnt/weka:/mnt/weka,/dev/infiniband:/dev/infiniband,/usr/bin/tclsh:/usr/bin/tclsh,/usr/lib/x86_64-linux-gnu/libtcl8.6.so:/usr/lib/x86_64-linux-gnu/libtcl8.6.so,/usr/share/tcltk/tcl8.6/init.tcl:/usr/share/tcltk/tcl8.6/init.tcl,/cm/local/apps/environment-modules:/cm/local/apps/environment-modules/,/cm/local/modulefiles:/cm/local/modulefiles,/cm/shared/modulefiles:/cm/shared/modulefiles,/cm/shared/apps/slurm/:/cm/shared/apps/slurm,/usr/lib/x86_64-linux-gnu/libmunge.so.2:/usr/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge:/var/run/munge --container-workdir /home/hmeses id
uid=243000521(hmeses) gid=243000521(hmeses) groups=243000521(hmeses),65534(nogroup)

hmeses@foo:~$ srun --nodelist=dc065b3c-123 --cpus-per-gpu 10 --gpus 1 --time 16:00:00 --mem-per-gpu 120G --partition priority --export ALL,ENV_COLOR=green --container-image /mnt/weka/foo/foo.sqsh --container-mounts /mnt/weka:/mnt/weka,/dev/infiniband:/dev/infiniband,/usr/bin/tclsh:/usr/bin/tclsh,/usr/lib/x86_64-linux-gnu/libtcl8.6.so:/usr/lib/x86_64-linux-gnu/libtcl8.6.so,/usr/share/tcltk/tcl8.6/init.tcl:/usr/share/tcltk/tcl8.6/init.tcl,/cm/local/apps/environment-modules:/cm/local/apps/environment-modules/,/cm/local/modulefiles:/cm/local/modulefiles,/cm/shared/modulefiles:/cm/shared/modulefiles,/cm/shared/apps/slurm/:/cm/shared/apps/slurm,/usr/lib/x86_64-linux-gnu/libmunge.so.2:/usr/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge:/var/run/munge --container-workdir /home/hmeses id hmeses
uid=243000521(hmeses) gid=243000521(hmeses) groups=243000521(hmeses),243000520(ug_forge_foo)

So, the supplemental groups are added into /etc/group and /etc/group- via an alteration in 10-shadow.sh, which is why the second query works. However, since the session itself doesn't have the supplemental group(s), group permissions do not work for commands.
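A quick way to compare the two views from inside the container (illustrative commands only):

$ id -G                          # numeric groups attached to the running process
$ grep Groups /proc/self/status  # the same list, as recorded by the kernel
$ getent group ug_forge_foo      # lookup against the /etc/group written by 10-shadow.sh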

A workaround is only acceptable if it actually solves this issue. In other words, "id" HAS to produce the supplemental groups. I am fine doing any sort of configuration in enroot, Slurm, or pyxis to make this happen, but I cannot find a way to do it. "su -" inside the container doesn't work (and yes, I did set the password so it would succeed), as experienced pyxis users will know, nor does using setpriv in the srun command. And I've checked a plain srun, without container instantiation, and it does return the supplemental groups:

hmeses@foo:~$ srun --nodelist=dc065b3c-123 --cpus-per-gpu 10 --gpus 1 --time 16:00:00 --mem-per-gpu 120G --partition priority --export ALL,ENV_COLOR=green id
uid=243000521(hmeses) gid=243000521(hmeses) groups=243000521(hmeses),243000520(ug_forge_foo)

I have tried a dozen methods, and all have failed. Clearly, I do not understand precisely how the groups are set in session, and I do not see anything in the docs about changing this behavior. Is there a way? Am I missing something?

flx42 commented 4 hours ago

As far as I know this is a consequence of the user namespace remapping, but it should have no functional impact: you should have the same permissions as outside the container.
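Groups whose GIDs have no mapping inside the user namespace are reported as the kernel's overflow GID, 65534 (nogroup), but the kernel still keeps them on the process. You can check the substituted value on the host (65534 is the default):

$ cat /proc/sys/kernel/overflowgid
65534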

Containerized process:

$ srun --no-container-remap-root --container-image=ubuntu:24.04 bash -c 'echo $$ ; sleep 120'
pyxis: importing docker image: ubuntu:24.04
pyxis: imported docker image: ubuntu:24.04
322964

$ grep Groups /proc/322964/status
Groups: 65541 65547 65548 65552 65555 65556 65557 65558 65563 65565 295394 2000000513 2000330968 2001026077 2001028010 2001028012 2001045249 2001045250 2001054298 2001054299 2001055536

Non-containerized process:

$ srun bash -c 'echo $$ ; sleep 120'
324840

$ grep Groups /proc/324840/status
Groups: 65541 65547 65548 65552 65555 65556 65557 65558 65563 65565 295394 2000000513 2000330968 2001026077 2001028010 2001028012 2001045249 2001045250 2001054298 2001054299 2001055536

Is there a specific example where this is causing a problem?
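For example, reading a file that is accessible only through the supplemental group should behave the same with and without the container (the path below is just a placeholder):

$ srun --partition priority cat /mnt/weka/foo/some-group-only-file
$ srun --partition priority --container-image /mnt/weka/foo/foo.sqsh --container-mounts /mnt/weka:/mnt/weka cat /mnt/weka/foo/some-group-only-file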