Open quentin9696 opened 2 years ago
Because those group IDs are not mapped inside the container's user namespace.
If you run podman top CID hgroups, you will see the GIDs leaked into the container.
The user namespace shows every UID and GID that is not mapped into it as 65534 (nobody).
Hi @rhatdan
I'm a bit confused. When I run podman top CID hgroups, I get:
HGROUPS
558749,558749,2001
Do I need to create group1, group2 inside the container?
Thanks
No. Did you run your container with --group-add keep-groups?
$ podman run -it --rm --userns=keep-id --annotation run.oci.keep_original_groups=1 docker.io/library/bash
bash-5.1$ id
uid=2001(test) gid=2001(test) groups=65534(nobody),65534(nobody),2001(test)
Now, in a different terminal, run podman top CID hgroups and it should show all 5 groups.
Yes, that's what I did:
$ podman run -it --rm --userns=keep-id --annotation run.oci.keep_original_groups=1 docker.io/library/bash
bash-5.1$ id
uid=2001(test) gid=2001(xxxxxxx) groups=65534(nobody),65534(nobody),2001(test)
$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
070992ab2903 docker.io/library/bash:latest bash 5 seconds ago Up 5 seconds ago quirky_nas
$ podman top 070992ab2903 hgroups
HGROUPS
427677,427677,2001
That looks correct, although podman top might have a bug here, since it printed out the first leaked group twice. @vrothberg PTAL, it looks like we might have a bug in podman top.
@giuseppe PTAL I am not sure we are leaking groups in podman 4.0
$ podman -v
podman version 4.0.0-dev
$ groups
dwalsh wheel users
$ podman run -d --group-add keep-groups alpine top
16fe1fbbdd9ebc0c49760b54c62ef81e5ad480e694492d05223e6f43ccb84a34
$ podman top -l hgroups
HGROUPS
165533,165533,3267
$ podman top -l groups
GROUPS
nobody,nobody,root
it seems to work for me.
What groups do you have on the host?
Can you check grep ^Groups /proc/$CONTAINER_PID/status?
Just to make sure I understand what is happening:
if I run podman rootless and add the --group-add keep-groups flag, I should have the same groups in the container as on the host. In my case, should I see my 2 other group IDs?
The Linux kernel maps gids that are not part of the user namespace mapping to the overflow gid.
Yes, an example would be:
~ $ groups; podman unshare groups
vrothberg wheel
root nobody
What solutions are there for also mapping GIDs that are not part of the user namespace?
In my case:
grep ^Groups /proc/2410/status
Groups: 1001 2000 2001
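As a rough illustration of the check requested above, the supplementary groups can be pulled out of the status text of a process. This is a minimal sketch: the function name parse_groups and the sample text are assumptions made for illustration, and the values a real reader gets depend on the user namespace it reads from.

```python
def parse_groups(status_text):
    """Return the supplementary GIDs listed on the 'Groups:' line."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # Everything after the 'Groups:' label is a whitespace-separated GID list.
            return [int(gid) for gid in line.split()[1:]]
    return []


# Sample text shaped like /proc/PID/status (illustrative values only).
sample_status = "Name:\tbash\nGid:\t2001\t2001\t2001\t2001\nGroups:\t1001 2000 2001\n"
groups = parse_groups(sample_status)
print(groups)
```

For a live process, the same function could be fed the contents of /proc/PID/status read from the host side.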
podman top needs to be inside podman's user NS in order to join the container's PID NS.
So I think we would have to find a way to "leak" the host process' groups (e.g., export HOSTS_GROUPS=$(groups)) into podman's user namespace. @giuseppe WDYT?
Is the HOSTS_GROUPS available inside of the container, or just to podman top?
Is the HOSTS_GROUPS available inside of the container, or just to podman top?
It does not exist yet, but I would leak it before re-execing into Podman's User NS. groups(1) would not be sufficient though, since we'd need the ID and the name. I don't think we should leak it into the container for security reasons; any info about the host could theoretically be exploited.
Right, I think you could set this in the user namespace by default, then top could find it. I think the GIDs are all you need, since the user namespace still has access to the /etc/group on the host.
$ grep Group /proc/self/status
Groups: 10 100 3267
$ podman unshare grep Group /proc/self/status
Groups: 65534 65534 0
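The translation in the output above can be modelled with a small sketch of how IDs are looked up in a gid_map. The map entries here are assumptions: primary GID 3267 mapped to 0, plus a subgid range starting at 100000, which is consistent with the 165533 that podman top printed earlier in the thread but is not confirmed by it.

```python
OVERFLOW_GID = 65534  # default value of /proc/sys/kernel/overflowgid

# Each entry mirrors one line of /proc/PID/gid_map:
# (first ID inside the namespace, first ID outside, length of the range).
# Assumed values: primary GID 3267 and a subgid range starting at 100000.
GID_MAP = [(0, 3267, 1), (1, 100000, 65536)]


def host_to_ns(host_gid, gid_map=GID_MAP):
    """Translate a host GID to what a reader inside the namespace sees."""
    for inside, outside, length in gid_map:
        if outside <= host_gid < outside + length:
            return inside + (host_gid - outside)
    return OVERFLOW_GID  # unmapped IDs show up as the overflow GID


def ns_to_host(ns_gid, gid_map=GID_MAP):
    """Translate a namespace GID back to the host, or None if unmapped."""
    for inside, outside, length in gid_map:
        if inside <= ns_gid < inside + length:
            return outside + (ns_gid - inside)
    return None


host_groups = [10, 100, 3267]
seen_inside = [host_to_ns(g) for g in host_groups]
seen_from_host = [ns_to_host(g) for g in seen_inside]
print(seen_inside)     # [65534, 65534, 0]
print(seen_from_host)  # [165533, 165533, 3267]
```

Under these assumptions, translating the overflow GID back through the map yields the same host ID for every unmapped group, so the original host GIDs cannot be recovered from inside the namespace.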
@giuseppe should we always leak this to the user namespace, or only when running top? We could force this to happen in rootless.c.
would that work though?
We are injecting the groups of the current process, but we should read the /proc/$CONTAINER_PID/status file instead since, in theory, they could be different (the user is added to a new group and runs newgrp).
Yes, this kind of sucks. Is there a way to first look at the process outside of the user namespace and then enter the user namespace to continue into the PID namespace?
Yes, this kind of sucks. Is there a way to first look at the process outside of the user namespace and then enter the user namespace to continue into the PID namespace?
I am still looking into whether we can leak /proc somehow, but the IDs are always converted depending on the reader:
$ podman run --rm -v /proc:/proc-host --uidmap 0:1000:10000 alpine grep ^[UG]id /proc-host/1/status
The only way so far seems to be doing it in two steps: do not join the user namespace directly and read this information from the host, then re-exec a helper process to read everything else.
It looks like a corner case though. Is it even worth supporting in podman top? Could we just mark these IDs so that it is clear they are injected from the host?
It looks like a corner case though. Is it even worth supporting in podman top?
I agree. It looks like a substantial massaging of the code for a corner case.
Could we just mark these IDs so that it is clear they are injected from the host?
Can you elaborate on what you mean by "marking"?
Just convert the overflow id to something clearer like "Not Mapped" or something people can understand more easily
Well, that is the issue: everyone who has hit this error is already complaining about seeing the nobody groups:
$ podman run --group-add=keep-groups alpine groups
root nobody nobody
Couldn't we just leak in a list of groups via an environment variable on podman top, and then substitute the nobody entries with IDs on the list other than the primary group? If there are no matches for the nobody group, then we just drop it, assuming that there is no leak.
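One possible reading of this proposal, as a minimal sketch: take the host group list handed to podman top, drop the primary group, and substitute the remaining IDs for the overflow entries only when the counts line up. The function name, the positional matching, and the "host:"/"ns:" labels are all assumptions made for illustration, not podman's behavior.

```python
OVERFLOW_GID = 65534


def substitute_overflow(ns_gids, host_gids, primary_host_gid):
    """Label each group, replacing overflow entries with leaked host GIDs.

    Substitution happens only if the number of overflow entries equals the
    number of non-primary host groups; otherwise the view is left as-is.
    """
    candidates = [g for g in host_gids if g != primary_host_gid]
    overflow_count = sum(1 for g in ns_gids if g == OVERFLOW_GID)
    if overflow_count == 0 or overflow_count != len(candidates):
        return [f"ns:{g}" for g in ns_gids]  # no confident match: assume no leak
    remaining = iter(candidates)
    return [
        f"host:{next(remaining)}" if g == OVERFLOW_GID else f"ns:{g}"
        for g in ns_gids
    ]


labelled = substitute_overflow([65534, 65534, 0], [10, 100, 3267], 3267)
print(labelled)
```

The sketch inherits the weakness raised in the thread: the injected list may differ from the groups the container process actually holds.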
A friendly reminder that this issue had no activity for 30 days.
@vrothberg @giuseppe Let's talk about this at Watercooler tomorrow.
Couldn't we just leak in a list of groups via an environment variable on podman top, and then substitute the nobody entries with IDs on the list other than the primary group? If there are no matches for the nobody group, then we just drop it, assuming that there is no leak.
@rhatdan what would that env variable look like? Wouldn't we need to inject the entire mapping? That would make me nervous for security reasons.
It could be the output of grep ^Groups /proc/self/status.
The problem I see is that this information may be different from what the container process is using. It rarely changes, but if it does, it is going to be difficult to find out what happened and why podman top returns the wrong information.
Well Podman top returns the wrong information now.
The issue is that we cannot get the actual GIDs of the leaked groups. If we just leaked the groups in as the current list and found a matching list of nobody entries, we would be 99% sure that they are the leaked groups.
Actually, I think we would need to record the output of grep ^Groups /proc/self/status in the container info, so we would have a record that these were leaked. Then podman top could look this information up when it sees multiple nobody groups in /etc/group.
Apparently what I want is called rootless id mounts and it is not supported yet in the kernel due to security concerns in the design.
My "solution" here is a proposal for (1) a permission system for rootless id mounts and (2) an idea of not only mapping "container uids to high uids at the host" (/etc/subgid) but also the opposite, mapping "low uids at the host to high uids inside the container". With both the permission system and the gid inversion (low -> high and high -> low), rootless mapping of secondary groups should not be a problem.
However I guess the following applies:
If it was that easy it would have been done already.
Thanks anyway for reading. And apologies for probably wasting your time, I'm learning.
When using rootless containers, for instance with podman, podman creates a user namespace following settings defined at /etc/subuid and /etc/subgid.
These settings allow mapping users and groups in the user namespace (inside the container) to a reserved range (if done correctly, the range is unique for each user) in the host/parent namespace.
This correspondence is used so we can create files with different user/group ownership inside the namespace that do not collide with any other user in the host namespace. Specifically, UID 0 and GID 0 in the user namespace are mapped to the user's own UID and primary GID on the host, so it's easy for processes in the user namespace to make files owned by the parent user: just assign them to root inside the user namespace.
I do not know of an easy way to configure the opposite: I would like to map groups in the host to a reserved range inside the namespace (you have called this "group leaking"). For instance if I have an "engineering" group in my host system, e.g. with gid 1000, as system administrator, I would like for the default user namespaces in rootless podman to see mounted host files belonging to the "engineering" group (and ideally not other random files in the container) as belonging to the "engineering" group inside the user namespace as well.
I believe it would make sense to have a /etc/revsubgid file specifying a list of groups that should leak into the user namespaces by default.
This list could be given in the following format:
<gid_host>:<uids_filter>:<gids_filter>
The first field is the group name or group id in the host that should be leaked into the namespace.
The second and third fields are a comma-separated list of user names or UIDs and a comma-separated list of group names or GIDs, respectively. If both are empty, that means "everyone". When creating a user namespace for a given user, gid_host is leaked only if the user is listed in those users or belongs to one of those groups.
For instance:
engineering::engineering
This would automatically map the engineering group (first field) for all users in the engineering group (as given by the last field).
This would be convenient for rootless containers that are expected to access directories mounted as volumes owned by secondary groups.
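To make the proposed format concrete, here is a minimal sketch of how such entries might be evaluated. /etc/revsubgid does not exist; the function names and the matching rule (both filters empty means everyone, otherwise the user must match either filter) are assumptions based on the description above.

```python
def parse_revsubgid_line(line):
    """Split '<gid_host>:<uids_filter>:<gids_filter>' into its three fields."""
    gid_host, uids_filter, gids_filter = line.strip().split(":")
    users = {u for u in uids_filter.split(",") if u}
    groups = {g for g in gids_filter.split(",") if g}
    return gid_host, users, groups


def leaked_groups(lines, user, user_groups):
    """Return the host groups that would leak into this user's namespace."""
    leaked = []
    for line in lines:
        gid_host, users, groups = parse_revsubgid_line(line)
        everyone = not users and not groups
        if everyone or user in users or groups & set(user_groups):
            leaked.append(gid_host)
    return leaked


# Hypothetical users and group memberships, for illustration only.
entries = ["engineering::engineering"]
print(leaked_groups(entries, "alice", ["alice", "engineering"]))  # ['engineering']
print(leaked_groups(entries, "bob", ["bob", "sales"]))            # []
```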
podman (via crun) can now use --group-add keep-groups to preserve group access. However (correct me otherwise), I understand the kernel maps those groups to overflow IDs. Seeing all those nobody entries is unintuitive to me.
Besides leaking the groups into the namespace, podman could additionally append the leaked groups to the container's /etc/group file and modify the /etc/passwd file in the container, adding the root user to the leaked groups, so the root user in the container would have transparent access to the leaked groups and the groups would appear with the same names as on the host.
If that's already doable with some setting and I have missed it, I apologize.
I would appreciate your feedback. I am not sure if I can contribute to this, since this is far from my field of knowledge, but for sure I'd love to use this feature.
Thank you for your time reading this and your work in podman.
Adding my +1 here as an upstream glibc developer.
Developers are using distrobox and toolbox to develop glibc, and one of the limitations they run into is that the glibc testsuite uses secondary groups for testing the POSIX identity management APIs. Often we require just one additional supplementary group, and we need to be able to validly find the group via getgrouplist and then use fchown.
Having a straightforward way to map at least some host groups into the container would be useful.
We work around this today by marking a subset of tests as unsupported in the container configurations that lack the requisite setup. This isn't new; there are some tests we can't run in containers at all (like tests which use namespace isolation themselves to test things).
Developers are using distrobox and toolbox to develop glibc, and one of the limitations they run into is that the glibc testsuite uses secondary groups for testing the POSIX identity management APIs. Often we require just one additional supplementary group, and we need to be able to validly find the group via getgrouplist and then use fchown.
This won't work even if we solve the issue above. A group will show as the overflow ID inside the user namespace; the kernel controls that and we have no way to change this behavior. I think that for your use case, there needs to be a correct mapping for the groups, so that setgroups works fine inside the container without the keep-id workaround.
For a rootless user, you need to make sure these additional GIDs are added through /etc/subgid and then run podman system migrate to recreate the user namespace.
It would be great if user groups could be added to the new user namespace via newgidmap, but I guess the risk is that DAC_OVERRIDE might allow users to modify group files.
/kind bug
Description
User group mappings are not kept when using --annotation run.oci.keep_original_groups=1
On the host:
When I run the container:
I'm not sure I understand why my group1 and group2 are mapped to nobody.
Steps to reproduce the issue:
Create a user
Create 2 groups and add the user to them
Run a container with userns keep-id and with the annotation run.oci.keep_original_groups=1, and check what your groups are. They should be mapped as on your host
Describe the results you received:
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
I use the Fedora CoreOS AWS AMI
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
Run on AWS fedora coreos official image