containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

feedback on new podman module capability #20000

Open lastephey opened 1 year ago

lastephey commented 1 year ago

Feature request description

Thanks podman devs, especially @vrothberg and @rhatdan, for your work on implementing the new module capability.

We did a little initial testing and just wanted to share some feedback and ask a few questions. cc @scanon @danfulton

Our goal would be to use this instead of the custom module functionality we developed for podman-hpc. I'm working with our gpu and mpich modules as examples.

gpu testing

First, working from our gpu module, I did the following test.

podman pull docker.io/nvidia/cuda:12.2.0-devel-ubuntu20.04
export CONTAINERS_CONF=/global/homes/s/stephey/.config/containers/containers.conf.modules
stephey@perlmutter:login32:~/.config/containers> cat containers.conf.modules 
[containers]
mounts=[ 
    "type=glob,source=/usr/lib64/libnv*,destination=/usr/lib64,ro",
    "type=glob,source=/usr/lib64/libcuda*,destination=/usr/lib64,ro",
    "type=glob,source=/opt/cray/pe/mpich/default/gtl/lib/libmpi_gtl_cuda*,destination=/usr/lib64,ro",
    "type=bind,source=/usr/bin/nvidia-smi,destination=/usr/bin/nvidia-smi,ro",
    "type=bind,source=/dev/nvidiactl,destination=/dev/nvidiactl,ro",
    "type=glob,source=/dev/nvidia*,destination=/dev,ro",
    "type=bind,source=/dev/nvidia-uvm,destination=/dev/nvidia-uvm,ro",
    "type=bind,source=/dev/nvidia-uvm-tools,destination=/dev/nvidia-uvm-tools,ro"
]
env=["LD_LIBRARY_PATH=/usr/lib64"]
env=["MODULE_TEST=hello"]
env=["NVIDIA_VISIBLE_DEVICES"]
stephey@perlmutter:login32:~/.config/containers> 

I've had trouble setting more than one environment variable. What is the right way to do this? I've tried

env=["LD_LIBRARY_PATH=/usr/lib64","MODULE_TEST=hello"]

and

env=["LD_LIBRARY_PATH=/usr/lib64"]
env=["MODULE_TEST=hello"]

but it seems like it only keeps the most recently set variable.

Feature request: make it possible to set more than one environment variable

Next, it's not clear to me how to copy in an environment variable that is set on the host into the container. For example, we'll want to copy in NVIDIA_VISIBLE_DEVICES from the host.

Feature request: make it possible to copy specific environment variables set on the host

Next, it's really great that mount has a glob option and I was pleased with how well it works, but to make sure the cuda driver works correctly, we also need to preserve the symlinks between the drivers. I don't know if that's possible with mount.

Feature request: add ability for mount to preserve symlinks.

Very basic gpu functionality seems to work while using the module:

stephey@perlmutter:login28:/global/common/shared/das/podman-4.7.0> podman run --rm -it docker.io/nvidia/cuda:12.2.0-devel-ubuntu20.04 bash
WARN[0000] Network file system detected as backing store.  Enforcing overlay option `force_mask="700"`.  Add it to storage.conf to silence this warning 
WARN[0000] Path "/etc/SUSEConnect" from "/etc/containers/mounts.conf" doesn't exist, skipping 
WARN[0000] Path "/etc/zypp/credentials.d/SCCcredentials" from "/etc/containers/mounts.conf" doesn't exist, skipping 

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

root@12047d54b6b8:/# nvidia-smi
Fri Sep 15 23:24:36 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:C3:00.0 Off |                    0 |
| N/A   35C    P0    38W / 250W |  38472MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@12047d54b6b8:/# 

I think the reason the CUDA Version is N/A is because we don't have the drivers properly symlinked. But it's not bad!

MPI thoughts

Next, working from our mpich module, I didn't test anything, but @scanon and I looked through containers.conf to see if it supports the capabilities we need.

Does containers.conf support setting the privileged flag? That is one of the capabilities we currently set in our mpich module. Apologies if it does and we missed it.

Feature request: support setting privileged flag

Is env-merge supported in containers.conf? We'd need that to be able to append to LD_LIBRARY_PATH.

Feature request: support env-merge

Finally, another difficulty is that we initialize our shared-run mode for MPI in our current mpich module. I'm not sure if this would be possible in the podman module configuration.

General feedback

It would be really nice to have a way to easily toggle the individual modules on and off. For example, users might want both the gpu and mpich modules, but they may not want the cvmfs module that we also offer. Right now it seems like they could stack containers.module.conf files, but it would be really nice if they could do it via the CLI or in some other more user-friendly way. For example, podman run --module=foo --module=bar.

Hopefully this initial feedback is helpful. Thanks for your work on this.

vrothberg commented 1 year ago

Thanks for the feedback, @lastephey !

I just returned from PTO but will go through the items before the meeting on Thursday.

vrothberg commented 1 year ago

Specifying Env Variables

I've had trouble setting more than one environment variable. What is the right way to do this? I've tried

[containers]                                          
env=["LD_LIBRARY_PATH=/usr/lib64","MODULE_TEST=hello"]

It can be specified only once as an array. A podman --module=$PATH run alpine printenv shows the variables on the latest main branch for me.

vrothberg commented 1 year ago

Inject Host Env Variables

Next, it's not clear to me how to copy in an environment variable that is set on the host into the container. For example, we'll want to copy in NVIDIA_VISIBLE_DEVICES from the host.

Currently, there is only the --env-host flag which injects all env variables from the host into the container, potentially overriding the ones from the container.

AFAIU, you are looking for a means to selectively inject host env variables. --env-host=FOO --env-host=BAR would inject FOO and BAR from the host?

vrothberg commented 1 year ago

NOTE: we may need to break this one big issue into multiple smaller ones to be able to track them correctly. Let's continue the conversation for now.

vrothberg commented 1 year ago

Preserve Symlinks in Mounts

Next, it's really great that mount has a glob option and I was pleased with how well it works, but to make sure the cuda driver works correctly, we also need to preserve the symlinks between the drivers. I don't know if that's possible with mount.

If the destination of a symlink is copied along with the mount, then symlinks are being preserved, see:

podman (main) $ ls -la /tmp/link; cat /tmp/foo; podman run --mount type=bind,src=/tmp/link,dst=/tmp/link --mount type=bind,src=/tmp/foo,dst=/tmp/foo -it --rm --privileged alpine cat /tmp/link
lrwxrwxrwx. 1 vrothberg vrothberg 8 Sep 18 10:46 /tmp/link -> /tmp/foo
fromhost
fromhost

@lastephey, can you elaborate more on the nature of the link?

vrothberg commented 1 year ago

--privileged field in containers.conf

Does containers.conf support setting the privileged flag? That is one of the capabilities we currently set in our mpich module. Apologies if it does and we missed it.

That does not exist at the moment, but we could add it. However, --privileged is a really really big hammer with respect to security. Can you elaborate on which parts of --privileged you need? If, say, you need all capabilities, this can be modeled with containers.conf.

vrothberg commented 1 year ago

--env-merge field in containers.conf

Is env-merge supported in containers.conf? We'd need that to be able to append to LD_LIBRARY_PATH.

No, that is not supported, so I think it's a good candidate to break it out. Can you elaborate on appending to LD_LIBRARY_PATH? How are you using --env-merge to append exactly? I am mostly curious but also want to make sure I understand the nuances.

vrothberg commented 1 year ago

It would be really nice to have a way to easily toggle the individual modules on and off. For example, users might want both the gpu and mpich modules, but they may not want the cvmfs module that we also offer. Right now it seems like they could stack containers.module.conf files, but it would be really nice if they could do it via the CLI or in some other more user-friendly way. For example, podman run --module=foo --module=bar.

That is exactly how we want it to be used.

One thing I just noticed is that podman run --module doesn't seem to load the modules correctly. Specifying --module before the run command loads it correctly, for instance:

podman (main) $ cat /tmp/vanilla.conf; cat /tmp/chocolate.conf; ./bin/podman --module=/tmp/chocolate.conf --module=/tmp/vanilla.conf run --rm alpine sh -c "/bin/printenv chocolate; grep vanilla /etc/resolv.conf"
[containers]
dns_searches=["vanilla.conf"]
[containers]
env=["chocolate=delicious"]
delicious
search vanilla.conf .

vrothberg commented 1 year ago

One more issue I found while playing: env_host=true doesn't seem to work.

vrothberg commented 1 year ago

One thing I just noticed is that podman run --module doesn't seem to load the modules correctly. Specifying --module before the run command loads it correctly, for instance:

Opened https://github.com/containers/podman/pull/20012 to address the issue.

I just noticed that this is issue #20000 :birthday:

vrothberg commented 1 year ago

One more issue I found while playing: env_host=true doesn't seem to work.

Opened https://github.com/containers/podman/pull/20014 to fix that.

lastephey commented 1 year ago

Thanks for all your comments @vrothberg. Sorry for the info dump; I'm happy to file separate feature requests to keep this more organized. I'll try to reply to your questions today or tomorrow.

lastephey commented 1 year ago

Inject Host Env Variables

AFAIU, you are looking for a means to selectively inject host env variables. --env-host=FOO --env-host=BAR would inject FOO and BAR from the host?

Yes, it would be nice to choose a specific host variable to inject (or a few variables, like you have demonstrated). Is that appropriate for a feature request?

lastephey commented 1 year ago

--env-merge field in containers.conf

Can you elaborate on appending to LD_LIBRARY_PATH? How are you using --env-merge to append exactly?

We would like to use it to prepend the path to an injected MPI library, for example. A user might already have a path LD_LIBRARY_PATH=/foo/bar set in their container, and we don't want to blow that away. We would rather prepend our MPI library location in front of their existing settings, something like

LD_LIBRARY_PATH=/path/to/mpi:/foo/bar
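The prepend semantics described above can be sketched in plain shell (the paths are illustrative; this only demonstrates the desired merge behavior, not Podman's implementation):

```shell
# Simulate a value the user already has set inside their container image.
LD_LIBRARY_PATH=/foo/bar

# Desired module behavior: prepend the injected MPI path, keep the rest intact.
LD_LIBRARY_PATH="/path/to/mpi:${LD_LIBRARY_PATH}"

echo "$LD_LIBRARY_PATH"   # → /path/to/mpi:/foo/bar
```

On the CLI this corresponds to podman run's --env-merge flag (e.g. --env-merge 'LD_LIBRARY_PATH=/path/to/mpi:${LD_LIBRARY_PATH}'); the request here is to be able to express the same thing from a containers.conf module.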

lastephey commented 1 year ago

--privileged field in containers.conf

However, --privileged is a really really big hammer with respect to security

We can figure out which settings we really need.

lastephey commented 1 year ago

Preserve Symlinks in Mounts

Here's an example.

I'm using this line in my module file to do the glob mount:

"type=glob,source=/usr/lib64/libcuda*,destination=/usr/lib64,ro",

Here are our target libraries on the host system:

stephey@perlmutter:login24:/usr/lib64> ls -la | grep libcuda
lrwxrwxrwx  1 root root        29 Mar 28 20:44 libcudadebugger.so.1 -> libcudadebugger.so.525.105.17
-rwxr-xr-x  1 root root  10490248 Mar 28 20:44 libcudadebugger.so.525.105.17
lrwxrwxrwx  1 root root        12 Mar 28 20:44 libcuda.so -> libcuda.so.1
lrwxrwxrwx  1 root root        21 Mar 28 20:44 libcuda.so.1 -> libcuda.so.525.105.17
-rwxr-xr-x  1 root root  29867944 Mar 28 20:44 libcuda.so.525.105.17
stephey@perlmutter:login24:/usr/lib64> 

Here are the mounted libraries inside the container running with my gpu module:

root@8dd9e54b936a:/usr/lib64# ls -la | grep libcuda
-rwxr-xr-x  1 nobody nogroup  29867944 Mar 29 03:44 libcuda.so
-rwxr-xr-x  1 nobody nogroup  29867944 Mar 29 03:44 libcuda.so.1
-rwxr-xr-x  1 nobody nogroup  29867944 Mar 29 03:44 libcuda.so.525.105.17
-rwxr-xr-x  1 nobody nogroup  10490248 Mar 29 03:44 libcudadebugger.so.1
-rwxr-xr-x  1 nobody nogroup  10490248 Mar 29 03:44 libcudadebugger.so.525.105.17
root@8dd9e54b936a:/usr/lib64# 

We'd like the existing symlinks to be preserved.

lastephey commented 1 year ago

@scanon has some additional feedback about stacking modules and making sure they don't overwrite each other (i.e. environment variable settings). He can give you more details tomorrow.

Thanks again for looking into this so far @vrothberg! 🙏

rhatdan commented 1 year ago

How about something like:

LD_LIBRARY_PATH=/path/to/mpi:$LD_LIBRARY_PATH

rhatdan commented 1 year ago

I think having a Privileged=true flag makes sense.

vrothberg commented 1 year ago

OK, I can reproduce when using --type=glob as well. Using --type=bind and mounting the entire directory preserves the links.

@giuseppe is this something the runtimes are doing?
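A possible interim workaround along those lines, as a sketch only: bind-mount the whole directory instead of globbing individual files, which keeps the symlink entries intact at the cost of exposing everything in the directory. The destination path and env value here are illustrative, not a tested configuration:

```toml
[containers]
mounts=["type=bind,source=/usr/lib64,destination=/host/lib64,ro"]
env=["LD_LIBRARY_PATH=/host/lib64"]
```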

giuseppe commented 1 year ago

I had a look at it and I don't see a way to achieve it: the kernel always follows symlinks for the mount source, so there is no way to bind mount a symlink.

What we could do is to "copy" these symlinks and create them manually in the container rootfs
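That "copy the symlinks" idea can be sketched in shell. This is illustrative only, not Podman's implementation; temp directories stand in for the host path and the container rootfs, and the library names mirror the listing earlier in the thread:

```shell
set -eu

# Stand-ins for the host directory and the container rootfs destination.
src=$(mktemp -d)
dst=$(mktemp -d)

# Simulate the host layout: one real library plus its version-chain symlinks.
touch "$src/libcuda.so.525.105.17"
ln -s libcuda.so.525.105.17 "$src/libcuda.so.1"
ln -s libcuda.so.1 "$src/libcuda.so"

for f in "$src"/libcuda*; do
  name=$(basename "$f")
  if [ -L "$f" ]; then
    # Recreate the link itself instead of following it (which is what
    # a bind mount of the link would do).
    ln -s "$(readlink "$f")" "$dst/$name"
  else
    # Regular file: copy as a stand-in for the actual bind mount.
    cp "$f" "$dst/$name"
  fi
done

ls -la "$dst" | grep libcuda
```

Because the recreated links use relative targets, they resolve against the mounted libraries inside the rootfs just as they did on the host.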

vrothberg commented 1 year ago

I had a look at it and I don't see a way to achieve it: the kernel always follows symlinks for the mount source, so there is no way to bind mount a symlink.

What we could do is to "copy" these symlinks and create them manually in the container rootfs

I created a separate issue for the symlink issue: https://github.com/containers/podman/issues/20098

vrothberg commented 1 year ago

@lastephey @rhatdan @Luap99 I created a POC allowing for appending arrays in containers.conf: https://github.com/containers/common/pull/1675

vrothberg commented 1 year ago

As discussed yesterday, injecting host variables works already (I had no idea):

~ $ FOO=123 podman run --rm --env FOO alpine printenv FOO
123

vrothberg commented 1 year ago

https://github.com/containers/podman/pull/20252 added support for the privileged field in containers.conf (and Podman).

vrothberg commented 1 year ago

https://github.com/containers/podman/pull/20463 enabled the append logic for env, volumes and mounts.

vrothberg commented 1 year ago

https://github.com/containers/podman/pull/20483 completes the append logic for all string arrays in containers.conf.