containers / podman

Volume for container with custom user is not writeable #3990

Closed abitrolly closed 4 years ago

abitrolly commented 5 years ago

/kind bug

Description

When running a container with an explicitly set user, such as https://github.com/schemaspy/schemaspy/blob/a28c9fc932cc6f85c7780050a678b3a3d7f595e9/Dockerfile#L44, the volume mounted by podman is not writeable.

Steps to reproduce the issue:

  1. Get some PostgreSQL host (192.168.4.1) and port (5432)

  2. Create dir to mount as a volume

    $ mkdir html

  3. Run podman

    $ podman run -it -v "$PWD"/html:/output:Z schemaspy/schemaspy:snapshot -u postgres -t pgsql11 -host 192.168.4.1 -port 5432 -db anitya
    ...
    INFO  - Starting Main v6.1.0-SNAPSHOT on 9090a61652af with PID 1 (/schemaspy-6.1.0-SNAPSHOT.jar started by java in /)
    INFO  - The following profiles are active: default
    INFO  - Started Main in 2.594 seconds (JVM running for 3.598)
    INFO  - Starting schema analysis
    ERROR - IOException
    Unable to create directory /output/tables
    INFO  - StackTraces have been omitted, use `-debug` when executing SchemaSpy to see them

Describe the results you received:

Container is unable to write to /output, most likely because it is running with java user.

Describe the results you expected:

Volumes work rw regardless of user settings inside of container.

Output of podman version:

✗ podman version
Version:            1.5.1
RemoteAPI Version:  1
Go Version:         go1.12.7
OS/Arch:            linux/amd64

mheon commented 5 years ago

I assume the -u postgres in here means that your app in the container isn't running as root?

As such, it's running as a non-0 user in the container, which is mapped to a user on the host through /etc/subuid (root in a rootless container is the user that started the container, all higher UIDs and GIDs are mapped to a block on the host given by /etc/subuid and /etc/subgid). The volume looks like it's somewhere that's owned by the user starting the container - but you're running the app in the container as a different user, which means you run into permissions errors.
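The mapping described above can be sketched as a small shell function. This is only an illustration, not podman's own code: container_to_host_uid is a hypothetical helper, and it assumes the default rootless mapping with a single /etc/subuid range for the user.

```shell
# Hypothetical helper: translate a container UID to the host UID it maps to
# in a default rootless user namespace (assumes one /etc/subuid range).
#   $1 = UID inside the container
#   $2 = UID of the host user running podman
#   $3 = first subordinate UID from that user's /etc/subuid entry
#   $4 = number of subordinate UIDs in that entry (default 65536)
container_to_host_uid() {
    cuid=$1 host_uid=$2 sub_start=$3 sub_count=${4:-65536}
    if [ "$cuid" -eq 0 ]; then
        # Container root is the user who started the container.
        echo "$host_uid"
    elif [ "$cuid" -le "$sub_count" ]; then
        # Container UIDs 1..count map onto the subordinate block in order.
        echo $((sub_start + cuid - 1))
    else
        echo "UID $cuid is outside the mapped range" >&2
        return 1
    fi
}
```

For example, assuming host UID 1000 and a subordinate range starting at 100000, container_to_host_uid 1001 1000 100000 prints 101000. The real range can be read with grep "^$(id -un):" /etc/subuid, and the mapping actually in effect with podman unshare cat /proc/self/uid_map.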

You may want to just remove -u postgres and run as root if you need to access volumes owned by your user. Running as root in a rootless container is already very secure (the container has no added privileges that your user does not), so the only security benefit to swapping to another user in the container is preventing the container from accessing files owned by your user - which, in this case, you need (to talk to the volume).

abitrolly commented 5 years ago

@mheon -u is a parameter for schemaspy, the container itself defines USER java, and podman is run unprivileged without sudo. I'd expect podman to handle mounted volumes transparently without leaking low-level details about UID and filesystem mappings. Otherwise all scripts will need to contain -u root, which doesn't look very secure. )

mheon commented 5 years ago

We really can't handle this ourselves - these are separate users from the perspective of the kernel, and normal filesystem permissions apply.

abitrolly commented 5 years ago

Is it possible to configure the filesystem layer to ignore host permissions? If a container is already isolated through the filesystem path, why impose additional UID restrictions?

Maybe it is possible to implement two-layer writes? The first layer enforces permissions, so that the container can't escape the defined path, but the final writes to disk ignore the permissions. If a container needs a separate user with volume mapping, maybe podman could switch to the double-layer concept automatically.

rhatdan commented 5 years ago

You can change the ownership of the volume with podman unshare chown UID PATH

abitrolly commented 5 years ago

@rhatdan PATH is path in my directory, not /var/..., right? How do I know UID? What will happen to filesystem permissions after I quit this modified user namespace?

I also checked man podman-unshare and the description sounds too low level. Maybe it is possible to modify it for people who are not familiar with cgroups yet.

podman-unshare - Run a command inside of a modified user namespace.

abitrolly commented 5 years ago

I've got a thought overnight.

(root in a rootless container is the user that started the container, all higher UIDs and GIDs are mapped to a block on the host given by /etc/subuid and /etc/subgid)

When I run a container with a custom USER as non-privileged, then there is no root inside anymore - is that right? If so, why not map that custom USER, instead of root, to my UID and GID?

rhatdan commented 5 years ago

By default, podman started by a non-root user runs processes as root within the container. This means the processes in the container have full namespaced capabilities. This also means that if a container process escaped the container, it would have full access to files in your home directory (based on UID; SELinux would still block it, but I have heard that some people disable SELinux). If you run the processes within the container as a different non-root UID, then those processes will run as that UID, and if they escape they would only have world access to content in your home directory.

rhatdan commented 5 years ago

@abitrolly I am writing a blog based on these issues. Send me your email and I can expose an early copy to you. dwalsh@redhat.com

Just needs to be reviewed and then I can get it published.

abitrolly commented 5 years ago

@rhatdan people disable SELinux because not all build scripts add the :Z suffix to volume mounts, without which volumes on SELinux do not work, and podman doesn't add this suffix automatically.

My email is anatoli@rainforce.org. I sent email titled "Early copy" from this address.

Given that even with podman's unprivileged model an escaped container can steal my private SSH keys, I don't think that podman is more secure than docker anymore. Private keys are more valuable than root-level access to the OS (whose main risk is, again, stealing private keys from more boxes). Now I think that double-level filesystem access control is a must-have feature for any non-privileged process containers.

rhatdan commented 5 years ago

People running with containers not separated by SELinux are taking a big risk, since it is the main tool to protect their file system from containers.

An escape from Docker allows access to all keys; an escape from rootless podman allows access only to the user's UID. Running rootless containers in a different user namespace would give you more protection.

rhatdan commented 5 years ago

@abitrolly But bottom line, I tell people to always run their containers as non-root, even in a rootless container. One thing we could consider would be to add a :U to volumes which would chown the directory to match the primary user of the container. Might be something to consider for podman.

thatchristoph commented 5 years ago

Not sure if this is a "true" solution or more of a workaround, but would not --userns handle at least some of the situations desired to mount with non-root user permissions?

For example:

podman run --rm --userns=keep-id -v /home/hostUserName/tmp:/home/containerUserName/tmp:Z -it image_name /bin/bash

This mounts tmp inside the container at /home/containerUserName/tmp with the same UID:GID inside the container as it possesses on the host.

Perhaps --userns=ns:my_namespace could be used to mount a volume with the UID:GID corresponding to the user named my_namespace?

Note: you cannot use --user myUserName and --userns=... in the same podman run .... command, as I understand it.

github-actions[bot] commented 4 years ago

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

vrothberg commented 4 years ago

One thing we could consider would be to add a :U to volumes which would chown the directory to match the primary user of the container.

@rhatdan, what's your take on this issue? Shall we pursue your proposal above?

rhatdan commented 4 years ago

I don't think so, I am hesitant to make this more complicated. I think it is up to the user to set up the permissions correctly on the volume.

abitrolly commented 4 years ago

I still don't understand what unshare does. How is that different from su <user>? How does unshare know the UID to run inside?

What is the proposed solution? Is the following correct?

  1. Try to figure out what the UID of the rootless container will be (how?) - let's call it RLUID
  2. Run podman unshare chown -R RLUID /host/path
  3. Run container with podman run -v /host/path:/guest/path - /guest/path is now writable
  4. Exit container and run chown -R UID to get permissions back

Is that right?

ChristianCiach commented 4 years ago

Wow, this stuff is way too complicated. I've the same issue as @abitrolly (running podman as non-root, having a user inside the container that is not "root" and I cannot write to the mounted directory). I've read every comment here and I still don't have an idea how to make this work.

ChristianCiach commented 4 years ago

So, it seems like I can make it "work" by "chown"ing (on the host) the shared directory to the user ID that the non-root container user (in my case called "jenkins", because I'm using the jenkins:jenkins image from Docker Hub) is mapped to on the host system. In my case, this jenkins user from inside the container has the UID 559751 on the host system. (Btw, what is the easiest way to find this out?) So, doing sudo chown 559751 builds on the host makes the directory writable to the user inside the container. But this has two big issues:

I need to share this image with my coworkers (who don't have root privileges), so both of these points are unacceptable.

ChristianCiach commented 4 years ago

I don't think so, I am hesitant to make this more complicated. I think it is up to the user to set up the permissions correctly on the volume.

Okay, but how?

mheon commented 4 years ago

Root should not be necessary - podman unshare sticks you into the same user namespace that the rootless container uses, which gives you access to every UID/GID that the container does. Within a podman unshare shell you should be able to chown folders/files owned by your user to the UID/GID used by Jenkins. You will need to know what IDs are in use inside the container, because podman unshare is a shell on the host (though you can mount the container with podman mount and inspect its /etc/passwd to get those). This can also potentially allow you to identify the user we're mapped to on the host (su to the right UID in the podman unshare shell and touch a file in your /home - the UID there should be the one in use).
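As a rough sketch of the /etc/passwd lookup mentioned above: passwd_ids is a hypothetical helper, and the podman commands in the trailing comment are an assumed usage with IMAGE as a placeholder, not a verified recipe for any particular image.

```shell
# Hypothetical helper: print "UID:GID" for a user name, given the path to a
# passwd file (for example one read from a mounted container filesystem).
# Returns non-zero if the user is not listed.
passwd_ids() {
    awk -F: -v user="$1" '
        $1 == user { print $3 ":" $4; found = 1 }
        END { exit !found }
    ' "$2"
}

# Assumed usage, run inside a "podman unshare" shell so that rootless
# "podman mount" is permitted (IMAGE is a placeholder):
#   ctr=$(podman create IMAGE)
#   mnt=$(podman mount "$ctr")
#   passwd_ids jenkins "$mnt/etc/passwd"
#   podman umount "$ctr" && podman rm "$ctr"
```

The printed pair is the UID:GID as seen inside the container, which is the form podman unshare chown expects.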

For the second issue... That is definitely a concern, and one I don't think we have an easy solution to as yet. There is talk of adding UID/GID mappings to LDAP for use across multiple systems, but they will still be unique to the user running the container for security reasons, so not portable between users.

Running rootless containers as non-root and mounting in volumes is proving to be quite complicated. I think a review of how things are right now and a discussion of how we can improve (maybe a blog?) is definitely warranted here.

ChristianCiach commented 4 years ago

@mheon Thank you very much for your detailed comment!

Root should not be necessary

I definitely needed "sudo" to execute sudo chown 559751 builds on the host. This may be because the user accounts are centrally managed and there may be something wrong with my /etc/subuid..? I remember that it was necessary for me to create this file manually some months ago. But this may be because my workstation is old and my Fedora installation has been upgraded many times over the last six years or so.

Within a podman unshare shell you should be able to chown folders/files owned by your user to the UID/GID used by Jenkins.

Well, this seems portable in the sense that I should be able to write a simple shell script to automate this process for my coworkers every time they want to use my image.

unshare seems to be a strange name for this subcommand, but I probably just do not understand the deeper meaning of this. When browsing the podman subcommands in an attempt to fix my issue, I would've disregarded this subcommand immediately just because of its name. ("How does unsharing something help me with those permission issues?")

I will tinker around with this some more tomorrow.

mheon commented 4 years ago

I agree that unshare is a terrible name; we named it after an existing utility that enters user namespaces (doing something very similar to what we do, but not doing many of the things we do to make sure that it matches what other podman commands are doing).

abitrolly commented 4 years ago

@ChristianCiach have you been able to come up with a tutorial for your colleagues?

ChristianCiach commented 4 years ago

@abitrolly Sorry for the late reply. Yes, creating a simple wrapper that calls "podman unshare" before calling "podman run" works as expected. This is good enough for my use case.
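A wrapper of the kind described here might look roughly like the following. This is a guess at the shape, not the actual script: run_with_volume and its argument order are hypothetical, and the UID:GID has to be the one the image uses internally.

```shell
# Hypothetical wrapper: make a host directory writable for the container's
# user, then start the container with that directory mounted.
#   $1 = UID:GID used inside the container (e.g. 1001:1001)
#   $2 = host directory
#   $3 = mount point inside the container
#   remaining arguments = image name and its arguments
run_with_volume() {
    ids=$1 host_dir=$2 guest_dir=$3
    shift 3
    # chown runs inside the rootless user namespace, so no sudo is needed
    # as long as the directory belongs to the calling user.
    podman unshare chown -R "$ids" "$host_dir" || return 1
    # :Z relabels the directory for SELinux, as in the original report.
    podman run --rm -v "$host_dir:$guest_dir:Z" "$@"
}
```

It would be called as run_with_volume 1001:1001 "$PWD/builds" /builds IMAGE, where IMAGE and the paths are placeholders.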

zakkak commented 4 years ago

Now how do you chown the directory back to the host user though?

$ mkdir tmp 
$ podman unshare chown 1001:1001 tmp 
$ ls -la tmp 
total 0
drwxrwxr-x.  2 101000 101000   40 Jul 30 17:20 ./
drwxrwxrwt. 54 root   root   2000 Jul 30 17:20 ../
/tmp
$ chown $(id -u):$(id -g) tmp    
chown: changing ownership of 'tmp': Operation not permitted
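
One way back, as a sketch: inside the user namespace that podman unshare enters, UID 0 and GID 0 correspond to the host user who invoked it, so chowning to 0:0 there should hand the directory back. reclaim_dir is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: return a directory to the host user after it was
# chowned to a container UID. Inside "podman unshare", 0:0 is the host user.
reclaim_dir() {
    podman unshare chown -R 0:0 "$1"
}
```

After reclaim_dir tmp, ls -ld tmp should list the invoking user as owner again.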

zakkak commented 4 years ago

It might not work in all use cases, but another workaround is to run the command in the container with the host's user ID and GID by using --userns=keep-id --user=$(id -ur):$(id -gr), e.g.:

$ mkdir project 
$ podman run -it --rm -v $PWD/project:/project:z --userns=keep-id --user=$(id -ur):$(id -gr) --entrypoint=/bin/bash quay.io/quarkus/ubi-quarkus-mandrel:20.1.0.1.Alpha2-java11 -c 'id; touch /project/lala'

uid=1000(1000) gid=1000 groups=1000

while without it it fails:

$ mkdir project 
$ podman run -it --rm -v $PWD/project:/project:z --entrypoint=/bin/bash quay.io/quarkus/ubi-quarkus-mandrel:20.1.0.1.Alpha2-java11 -c 'id; touch /project/lala'

uid=1001(quarkus) gid=1001(quarkus) groups=1001(quarkus)
touch: cannot touch '/project/lala': Permission denied

abitrolly commented 4 years ago

I wonder if giving the container the host user ID and GID makes the container unprivileged?

rhatdan commented 4 years ago

Containers are unprivileged by default (depending on your definition of unprivileged). Running with --userns=keep-id just changes the way the user namespace is set up; it does not change the security controls on the container. The only difference is that, instead of the user's UID being root inside of the container, the user's UID is the same UID inside of the container, and the first UID listed for the user in /etc/subuid is mapped to UID 0 inside of the container.
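The keep-id layout described above can be sketched as arithmetic. This is an assumption about the mapping rather than something confirmed in this thread: keep_id_host_uid is a hypothetical helper, it assumes a single /etc/subuid range, and the result is worth checking against the uid_map of a real container.

```shell
# Hypothetical helper: host UID for a container UID under --userns=keep-id
# (assumed layout; assumes one /etc/subuid range for the user).
#   $1 = UID inside the container
#   $2 = UID of the host user running podman
#   $3 = first subordinate UID from /etc/subuid
keep_id_host_uid() {
    cuid=$1 host_uid=$2 sub_start=$3
    if [ "$cuid" -eq "$host_uid" ]; then
        # The user's own UID is kept as-is inside the container.
        echo "$host_uid"
    elif [ "$cuid" -lt "$host_uid" ]; then
        # Container UIDs below it, starting at 0, take the start of the block.
        echo $((sub_start + cuid))
    else
        # Container UIDs above it continue in the block, shifted by one.
        echo $((sub_start + cuid - 1))
    fi
}
```

With host UID 1000 and a range starting at 100000, container UID 0 would land on host UID 100000 and container UID 1000 on host UID 1000. To compare with reality, run podman run --rm --userns=keep-id IMAGE cat /proc/self/uid_map (IMAGE is a placeholder).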

anishp55 commented 4 years ago

i'm battling this same thing. i am using a bitnami image of postgresql from docker hub. it has a baked in user id of 1001. on my arch linux system my uid is 1000. I would like to make a directory in my home directory for postgres to persist its data to, and be able to poke around in that directory without having to chown it all the time when i want to. @rhatdan what is your suggestion for people using vendor provided images that already have a uid baked in?

rhatdan commented 4 years ago

Well for now you can do

$ podman unshare chown 1001:1001 PATHTODIR

We could add something to the volume command to do this, but I am not sure how ugly the syntax would be.

kushaldas commented 3 years ago

Well for now you can do

$ podman unshare chown 1001:1001 PATHTODIR

We could add something to the volume command to do this, but I am not sure how ugly the syntax would be.

I would love to have this feature, even if it looks ugly :)

danielwsmithee commented 3 years ago

Thanks for writing this up, it really helped me to understand what is going on here.

Unfortunately the container I am using and want to deploy and regularly update has 43 different users, and their associated group relationships. So if I understand the situation, I would need to parse out all 43 entries from /etc/passwd using podman mount, then create a wrapper script that calls podman unshare with each of those. Then when the container gets updated to add a new user, I'm broken and need to go update the script.

I know it would be complex from an implementation perspective, but it would be great if podman could inspect /etc/passwd itself from within the image and pull out the appropriate non-root users all within the --volume command option without a need for further user options.

rhatdan commented 3 years ago

Why not mount /home into the container then?

abitrolly commented 3 years ago

I would appreciate it if podman could just use the network for mounting local volumes, without all the complexity of multilevel filesystem->OS->SELinux->container->OS->filesystem permissions.

dzintars commented 3 years ago

How hard could it be to write a nice blog post series for simple users who follow the Podman mantra of not running rootful containers, but just want to mount some volumes? For the most common use cases. Like:

1) I, as a simple user, want to run a rootless PHP container and keep working on my PHP code.
2) I, as a simple user, want to share a volume between separate Nginx and PHP containers and keep working on the codebase.
3) I, as a simple user, want to spin up some MariaDB instance with the database being persisted on the filesystem.
4) Etc, etc...

IMO there are not that many different flavors of setups in use by mainstream users. Not everyone works on Go/Rust single binaries, which most often do not require volumes. Not everybody is working in a CI/CD environment that constantly does full baking. Not everyone wants to run podman build/run on every single line added to the code.

This volumes question is like the number one asked question, and yet on the Podman website (Podman "coined" the rootless idioms) no article for "simple users" can be found. Every article I saw is like: "Guys, in order to run your quick container idea in rootless, please first learn the whole of SELinux labeling and type enforcement, then learn Linux namespacing, and then feel free to read this article and run your container in rootless."

I understand that this is a fairly complex topic. The tooling is tailored to match pretty low-level requirements, and thus it is kind of flexible and complex at the same time. So... there is no need for high-level, user-friendly API implementations which could take ages to implement. All it takes is to write a canonical blog series about the most common setups. I am not expert enough to write 100% accurate articles, but there are people who are. And those people are wasting their valuable time answering individual issues on GitHub instead of writing a single canonical user reference which could be updated as the API changes.

pohlt commented 2 years ago

Thanks @rhatdan for the excellent article series where you explain things like podman unshare (https://www.redhat.com/sysadmin/rootless-podman-makes-sense).

This works fine for a single container trying to access a host folder, but how about two containers which potentially use two different users both trying to access the same host folder (e.g. one writing, the other reading files)?

rhatdan commented 2 years ago

Well, they should be in the same group, or one should have root ownership and the other group read access, using --group-add keep-groups.

tjb36 commented 1 year ago

I am having a similar problem that is described here, but the solutions proposed do not work for my use case, because I am running two rootless containers in a pod. I hope I am not doing something silly, but this is the minimum example to reproduce the problem I see:

Note: Both containers are run by non-root user tom on the host (UID=1005). Container 1 (influxdb) runs its process in the container by default as root(UID=0). Container 2 (telegraf) runs its process in the container by default as non-root user telegraf (UID=999).

1) Create a named volume:

podman volume create influxdb_volume

2) Create a new pod:

podman pod create --name monitoring_pod --publish 8086:8086

3) Create a container in the pod and mount in the named volume:

podman run -d --rm \
--name influxdb_container \
--pod monitoring_pod \
--mount type=volume,source=influxdb_volume,destination=/var/lib/influxdb \
influxdb:1.8

4) On the host, I can see that a directory in the volume is owned by the non-root host user:

ls -ltr /home/tom/.local/share/containers/storage/volumes/influxdb_volume/_data/data/
drwx------ 4 tom tom 36 Sep 13 12:14 _internal

5) Create the second container, and bind mount in one of the volume's directories:

podman run -d --rm \
--name telegraf_container \
--pod monitoring_pod \
--mount type=bind,src=/home/tom/telegraf.conf,dst=/etc/telegraf/telegraf.conf \
--mount type=bind,src=/home/tom/.local/share/containers/storage/volumes/influxdb_volume/_data/data,dst=/home/influxdb_data \
telegraf

6) I can now see that the bind-mounted directory inside the container is owned by root (because the owner non-root host user is mapped to root inside the container):

podman exec --user telegraf telegraf_container ls -ltr /home/influxdb_data
drwx------ 4 root root 36 Sep 13 11:14 _internal 

7) Trying to access the directory from inside the container fails, because container runs as non-root user (telegraf, UID=999), and directory owned by container root:

podman exec --user telegraf telegraf_container du -s /home/influxdb_data
du: cannot read directory '/home/influxdb_data/_internal': Permission denied

What would be a good way to solve this?

I cannot use unshare to change the owner on the host, because this would mess up the influxdb container (which sometimes performs some actions as a different user, influxdb, not always root).

I also cannot use --userns=keep-id at the container level (it gives an error "--userns and --pod cannot be set together"), and setting it at the pod level messes up the behaviour of the first container.

rhatdan commented 1 year ago

Does --userns=keep-id when creating the pod work?

podman pod create --userns=keep-id --name monitoring_pod --publish 8086:8086

tjb36 commented 1 year ago

@rhatdan No unfortunately not. If I pass the keep-id flag into the pod creation at step 2 like you suggested, then in step 3 the container fails to initialize, giving the following error:

run: open server: open tsdb store: mkdir /var/lib/influxdb/data/telegraf/_series: permission denied

Like I said before, setting it at the pod level messes up the behaviour of the first container.

rhatdan commented 1 year ago

Please open a new issue for this and we can get others to comment, we are trying to prevent people from discussing issues on older issues.