RFE: System site policy defaults

SEJeff commented 5 years ago

/kind feature

Description

This ticket requests the ability for system administrators to set site policy defaults. This functionality would improve the workflow for HPC research using podman. Here are some use cases that come to mind which it could cover.

- Default bind mounts into a container. containers-mounts(5) is a nice and elegant solution for small files, but it also copies the files into the container as a security feature. In many cases the data is large (many GB) and copying it for the lifecycle of an ephemeral compute job is not ideal. Perhaps a flag could be added to mounts.conf to perform a read-only bind mount into the container?
  - For applications that need to use the GPU, the host NVIDIA libraries should be bind mounted into the container by default.
  - For applications that need to use raw InfiniBand libverbs, the host Mellanox and IB libraries should be bind mounted into the container.
- An ability to blacklist paths which should never be mounted, even if requested via podman run -v:
  - autofs paths
  - /var/run/nscd.socket
- Certain environment variables that are set on the host systems also need to be set for all containerized workloads. Much of this is for internal CA certificates. (podman-container-runlabel(1) might help with this a bit, but we'd love a default we can use for everything here. Users might build their own OCI images and might not run podman container runlabel.)
  - NODE_EXTRA_CA_CERTS=/etc/pki/tls/cert.pem
  - NODE_OPTIONS=--use-openssl-ca
  - PIP_CERT=/etc/pki/tls/cert.pem
  - REQUESTS_CA_BUNDLE=/etc/pki/tls/cert.pem
  - SSL_CERT_FILE=/etc/pki/tls/cert.pem
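
To make the request concrete, here is a minimal sketch of the kind of wrapper a site would otherwise have to maintain. It only assembles a `podman run` command line from site defaults, using the existing `--env` and `--volume ...:ro` flags. The function name `site_podman_run`, the `/data/shared` path, and the image name are assumptions made for illustration; only the environment variables come from the list above.

```python
import shlex

# Environment defaults taken from the list in this issue.
SITE_ENV = {
    "NODE_EXTRA_CA_CERTS": "/etc/pki/tls/cert.pem",
    "NODE_OPTIONS": "--use-openssl-ca",
    "PIP_CERT": "/etc/pki/tls/cert.pem",
    "REQUESTS_CA_BUNDLE": "/etc/pki/tls/cert.pem",
    "SSL_CERT_FILE": "/etc/pki/tls/cert.pem",
}

# Hypothetical read-only bind mounts (host path -> container path).
SITE_MOUNTS = {"/data/shared": "/data/shared"}


def site_podman_run(image, command=(), env=None, mounts=None):
    """Return an argv list for `podman run` with site defaults applied."""
    env = SITE_ENV if env is None else env
    mounts = SITE_MOUNTS if mounts is None else mounts
    argv = ["podman", "run", "--rm"]
    for key, value in env.items():
        argv += ["--env", f"{key}={value}"]
    for src, dst in mounts.items():
        # ":ro" asks for a read-only bind mount instead of a copy.
        argv += ["--volume", f"{src}:{dst}:ro"]
    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(site_podman_run("example-image", ["python3", "job.py"])))
```

The point of the RFE is that these defaults would live in Podman's own configuration, so that a wrapper like this is not needed.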

The overall goal of this is to make podman run better out of the box for HPC workloads where there are a whole bunch of defaults that have to be added to run successfully.

Users running podman for their jobs might not always know the correct flags or environment variables necessary to run their jobs. This RFE is for making podman excellent for their use case.

mheon commented 5 years ago

For bind mounts, I'd prefer to just extend mounts.conf to allow flipping a switch to bind-mount instead of copy.

For blacklisting: my initial concern would be - how much effort do we want to put in here? Is it sufficient to block explicit mounts with the /var/run/nscd.socket source, or do we need to scan all mounts for parents of that and mask the path? We can do that (the OCI runtime lets us mask paths), but detecting cases where we need to could be complex and potentially expensive.
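
The difference between the two cases can be sketched as a purely lexical path check. This is an illustration of the problem, not Podman code: the function name `mount_conflict` and the `BLOCKED` list are hypothetical, and the check deliberately ignores symlinks (on many systems /var/run is a symlink to /run), which is part of why a real implementation would be harder.

```python
from pathlib import PurePosixPath

# Hypothetical site blacklist; the socket path comes from the original request.
BLOCKED = [PurePosixPath("/var/run/nscd.socket")]


def mount_conflict(source, blocked=BLOCKED):
    """Classify a requested mount source against the blocked paths.

    Returns "exact" if the source is a blocked path, "inside" if it lies
    under a blocked path, "parent" if it is a parent directory of a blocked
    path (the case that would need masking), or None if there is no conflict.
    The comparison is lexical only and does not resolve symlinks.
    """
    src = PurePosixPath(source)
    for path in blocked:
        if src == path:
            return "exact"
        if path in src.parents:
            return "inside"
        if src in path.parents:
            return "parent"
    return None
```

Blocking only the "exact" case is cheap. The "parent" case is the expensive one described above, because every requested mount has to be compared against every blocked path and the blocked path then has to be masked inside the container.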

Default environment variables: my initial reaction would be to load those into a file and use runlabels with --env-file, but I'm not really opposed to adding a way of setting default environment variables. The only question is where - I don't really know if this seems like a libpod.conf thing and I'm hesitant to introduce another config file...
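
As a rough illustration of the --env-file route, the sketch below renders a dictionary of defaults as KEY=VALUE lines, which is the format `podman run --env-file` reads. The helper name `render_env_file` is an assumption, and where a site would store the resulting file is left open.

```python
def render_env_file(env):
    """Render a mapping as KEY=VALUE lines suitable for --env-file."""
    lines = []
    for key, value in env.items():
        # Reject entries that cannot be expressed as a single KEY=VALUE line.
        if not key or "=" in key or "\n" in key or "\n" in value:
            raise ValueError(f"cannot write {key!r} to an env file")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_env_file({
        "PIP_CERT": "/etc/pki/tls/cert.pem",
        "REQUESTS_CA_BUNDLE": "/etc/pki/tls/cert.pem",
    }), end="")
```

Users would still have to pass --env-file themselves (or go through a runlabel), which is the gap the original request points out.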

Maybe we can add an option to base a container's spec off a given OCI spec? We use the runtime-tools generator right now which includes a default spec, but there's no reason we can't source that default from elsewhere.

SEJeff commented 5 years ago

@mheon Happy to go with whatever you think makes sense to do this. I'd also be OK with a way to simply not allow users to do any mounts not already provided in mounts.conf. Perhaps that would alleviate the difficulty of scanning all mounts for parents?

We have defaults we'd like to provide, and we have things we'd like to not be possible. How that is achieved is up for discussion. We thought reaching out to upstream made more sense than writing and maintaining a huge wrapper. We spoke with Scott and Dan Walsh about this via VC (and are Red Hat customers).

mheon commented 5 years ago

I'll just caution that, with rootless, some restrictions become hard to enforce. Everything rootless Podman does can be done by the user running the command; we're not SUID, we have no special file capabilities. As such, everything Podman itself touches is owned by the user, can be directly modified by the user, etc... If they decide to edit the OCI spec that Podman generates to include extra mounts, for example, we can't stop that. I can certainly make that a lot harder to do, and prevent any accidental instances of mounting these things in, but a determined attacker can easily sidestep my precautions (for example, just invoking runc manually - pretty trivial to do, absolute control of the spec, etc).

SEJeff commented 5 years ago

@mheon Yeah for our use case I think that's acceptable. Is that acceptable for you?

We want to make running specific research workloads in podman trivial. This mostly is a bunch of defaults with likely some mounts.conf love along with maybe some podman container runlabel docs.

mheon commented 5 years ago

I'm fine with making it possible, as long as we make it explicit that the security isn't particularly strong - we're just providing ease-of-use defaults.

SEJeff commented 5 years ago

That's precisely what we're going for. How would you envision this being implemented then?

mheon commented 5 years ago

I'm thinking that our first step is implementing the ability to set a "base config" to build containers from. That gets us environment variables and mounts.

From there, we can figure out what excluding mounts should look like. That will probably be more challenging.

SEJeff commented 5 years ago

@mheon Would it be possible to do this without building containers?

One of our use cases that caused us to reach out to Red Hat (via our TAM) was that we'd love to see a way to use podman to run the upstream TensorFlow nightly from Docker Hub. If we could set some of these defaults via config files (like mounts.conf), it would allow us to potentially run the upstream TensorFlow image unmodified using podman's excellent rootless support.

Is that too ambitious without us having a pipeline where we modify images and then store them locally with mods? That's an option as well.

mheon commented 5 years ago

This will probably take the form of a "config file" of sorts.

To run containers, Podman creates an OCI spec, which we then pass to an OCI runtime to run the container. The spec defines things like mounts, environment variables, etc. We start with a default spec provided by https://github.com/opencontainers/runtime-tools/ and then modify it based on the command line options from create/run - I'm thinking that we'll allow changing out that default spec to a user-provided one, which we can ship as a sort of configuration file.
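
A minimal sketch of that idea, assuming the spec is handled as plain JSON-style data: start from a base spec, then layer command-line environment variables and mounts on top. The `process.env` and `mounts` fields follow the OCI runtime spec layout; the function name `apply_cli_overrides` and the merge rules (later values win, mounts keyed by destination) are assumptions for illustration and are not how Podman's generator works internally.

```python
import copy


def apply_cli_overrides(base_spec, env=None, mounts=None):
    """Return a new spec with CLI env vars and mounts layered over a base spec."""
    spec = copy.deepcopy(base_spec)  # never mutate the site-provided default

    # OCI stores the environment as a list of "KEY=VALUE" strings.
    process = spec.setdefault("process", {})
    merged = {}
    for entry in process.get("env", []):
        key, _, value = entry.partition("=")
        merged[key] = value
    merged.update(env or {})
    process["env"] = [f"{key}={value}" for key, value in merged.items()]

    # A later mount with the same destination replaces the default one.
    by_destination = {m["destination"]: m for m in spec.get("mounts", [])}
    for mount in mounts or []:
        by_destination[mount["destination"]] = mount
    spec["mounts"] = list(by_destination.values())
    return spec
```

With this shape, a site-provided base spec could carry the CA-certificate variables and default bind mounts, while anything passed on the command line still takes precedence.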

SEJeff commented 5 years ago

Yes that sounds perfect.

mheon commented 5 years ago

I'm going to self-assign this. Hoping to squeeze in the default spec work sometime next week.

rhatdan commented 5 years ago

@mheon any progress on this?

mheon commented 5 years ago

Priority on this slipped a bit. It's sitting somewhere after IPv6 support and NFS volumes in my list. If it should be higher, I can bump it up before one of those two.

github-actions[bot] commented 5 years ago

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

mheon commented 5 years ago

Most of this is going to land with the containers.conf work, I think

vrothberg commented 5 years ago

> An ability to blacklist paths which should not be mounted ever, even if requested via podman run -v:

@rhatdan, @QiWang19, is this something you want to work on/combine with the containers.conf work? The request makes sense to me but we need to carefully review how to do the plumbing into buildah, podman and CRI-O (Cc @mrunalp).

rhatdan commented 5 years ago

Yes. I am adding a label for containers.conf.

adrianreber commented 4 years ago

Not sure if the following is also part of this issue. When running Podman in an HPC-like use case, as described in https://podman.io/blogs/2019/09/26/podman-in-hpc.html, it would be good to also be able to control things like which namespaces Podman should use by default. For my Open MPI based examples I always have to specify --userns=keep-id --net=host --pid=host --ipc=host. If that could be set globally, that would also help a lot.
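
Until such a global setting exists, the workaround amounts to a small wrapper like the sketch below, which prepends the four flags mentioned above unless the user already passed them. The helper name `with_namespace_defaults` is hypothetical, and the detection is intentionally naive: it only recognises the exact `--flag=value` spellings listed here, not aliases or space-separated values.

```python
# The namespace flags from the Open MPI example above.
DEFAULT_FLAGS = {
    "--userns": "keep-id",
    "--net": "host",
    "--pid": "host",
    "--ipc": "host",
}


def with_namespace_defaults(user_args, defaults=DEFAULT_FLAGS):
    """Build a `podman run` argv, adding default flags the user did not set."""
    given = set()
    for arg in user_args:
        if arg.startswith("--"):
            # "--net=none" counts as the user having set "--net".
            given.add(arg.split("=", 1)[0])
    extra = [f"{flag}={value}" for flag, value in defaults.items() if flag not in given]
    return ["podman", "run"] + extra + list(user_args)
```

A user-supplied `--net=none` therefore suppresses the `--net=host` default, while the remaining three defaults are still added.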

rhatdan commented 4 years ago

Still waiting on containers.conf

vrothberg commented 4 years ago

@rhatdan, I guess we can close, right?

rhatdan commented 4 years ago

containers.conf is now implemented, which should fix this issue.

containers / podman

RFE: System site policy defaults #3587