Closed ubergeek77 closed 2 years ago
Looks like the cap is set to me.
$ podman run --cap-add SYS_ADMIN fedora capsh --print | grep sys_admin
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_sys_admin,cap_setfcap=eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_sys_admin,cap_setfcap
Current IAB: cap_chown,cap_dac_override,!cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,!cap_linux_immutable,cap_net_bind_service,!cap_net_broadcast,!cap_net_admin,!cap_net_raw,!cap_ipc_lock,!cap_ipc_owner,!cap_sys_module,!cap_sys_rawio,cap_sys_chroot,!cap_sys_ptrace,!cap_sys_pacct,cap_sys_admin,!cap_sys_boot,!cap_sys_nice,!cap_sys_resource,!cap_sys_time,!cap_sys_tty_config,!cap_mknod,!cap_lease,!cap_audit_write,!cap_audit_control,cap_setfcap,!cap_mac_override,!cap_mac_admin,!cap_syslog,!cap_wake_alarm,!cap_block_suspend,!cap_audit_read,!cap_perfmon,!cap_bpf,!cap_checkpoint_restore
I have a feeling masking or some other security feature like SELinux is blocking the access?
--cap-add SYS_ADMIN was never an issue. That works fine; see step 3 of my issue report.
Can you try your test command again using --privileged? That's where I'm having problems.
@rhatdan You don't have a --user in there.
I recall this being related to Docker compat - Docker does not grant certain capabilities to containers when a non-root user is set, even if the container is privileged. I'm on vacation so I can't chase down specific bugs related to this, but I'm 90% sure this was changed to make our behavior closer to Docker's (and because defaulting to giving less caps is generally more secure, which is itself a strong argument).
You are right, we don't give all caps to the default user. We just add them to the bounding set.
$ podman run --privileged --user 1 fedora capsh --print
Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
Ambient set =
Current IAB:
Securebits: 00/0x0/1'b0 (no-new-privs=0)
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=1(bin) euid=1(bin)
gid=1(bin)
groups=
Guessed mode: UNCERTAIN (0)
# docker run --privileged --user 1 fedora capsh --print
Unable to find image 'fedora:latest' locally
latest: Pulling from library/fedora
edad61c68e67: Pull complete
Digest: sha256:40ba585f0e25c096a08c30ab2f70ef3820b8ea5a4bdd16da0edbfc0a6952fa57
Status: Downloaded newer image for fedora:latest
Current: =i cap_perfmon,cap_bpf,cap_checkpoint_restore-i
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Ambient set =
Current IAB: cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,!cap_perfmon,!cap_bpf,!cap_checkpoint_restore
Securebits: 00/0x0/1'b0 (no-new-privs=0)
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=1(bin) euid=1(bin)
gid=1(bin)
groups=
Guessed mode: UNCERTAIN (0)
Docker is slightly different but the user definitely does not get CAP_SYS_ADMIN.
If running with cap-add, Docker and Podman also differ.
# docker run --cap-add SYS_ADMIN --user 1 fedora capsh --print | grep Current:
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap=i
$ podman run --cap-add SYS_ADMIN --user 1 fedora capsh --print | grep Current:
Current: cap_sys_admin=eip
From what I can see, Docker doesn't have any special handling for uid != 0 and --privileged. I've looked into the generated OCI configuration file and they configure all the capabilities but they do not set Ambient capabilities, so on the exec into the container all the caps are effectively lost.
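The loss of capabilities described above follows from the kernel's execve() rules in capabilities(7). The following is a minimal Python sketch of those rules, not code from Podman or Docker: the function name `caps_after_execve` and the three-entry `CAPS` stand-in are made up for illustration, and the model deliberately ignores the special cases for UID 0 and for set-user-ID binaries.

```python
def caps_after_execve(inheritable, bounding, ambient,
                      file_inheritable=frozenset(),
                      file_permitted=frozenset(),
                      file_effective=False):
    """Return (permitted, effective, ambient) of a process after execve().

    Simplified from capabilities(7): ignores the special cases for UID 0
    and for set-user-ID/set-group-ID binaries.
    """
    inheritable, bounding, ambient = set(inheritable), set(bounding), set(ambient)
    # A file that carries capabilities is "privileged" and clears the ambient set.
    file_is_privileged = bool(file_inheritable or file_permitted)
    new_ambient = set() if file_is_privileged else ambient
    permitted = ((inheritable & set(file_inheritable))
                 | (set(file_permitted) & bounding)
                 | new_ambient)
    effective = set(permitted) if file_effective else set(new_ambient)
    return permitted, effective, new_ambient


# Small stand-in for the full capability list (illustrative only).
CAPS = {"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_SYS_ADMIN"}

# Non-root process running an ordinary binary: inheritable and bounding sets
# are full, but no ambient set was configured, so nothing survives the exec.
print(caps_after_execve(CAPS, CAPS, set())[1])         # set()

# Same process with the ambient set populated: the capabilities survive.
print(sorted(caps_after_execve(CAPS, CAPS, CAPS)[1]))
```

With an ordinary binary (no file capabilities), the only term that can contribute to the new permitted and effective sets is the ambient set, which is why leaving it empty drops everything for a non-root user.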
It seems that a lot of code in our capabilities handling is just there to emulate the issue in Docker. We should do the right thing and set all the caps without any special handling for uid != 0.
I suggest we do something like:
diff --git a/pkg/specgen/generate/security.go b/pkg/specgen/generate/security.go
index 988c29832..c643fde92 100644
--- a/pkg/specgen/generate/security.go
+++ b/pkg/specgen/generate/security.go
@@ -124,7 +124,7 @@ func securityConfigureGenerator(s *specgen.SpecGenerator, g *generate.Generator,
 			capsRequiredRequested = strings.Split(val, ",")
 		}
 	}
-	if !s.Privileged && len(capsRequiredRequested) > 0 {
+	if len(capsRequiredRequested) > 0 {
 		// Pass capRequiredRequested in CapAdd field to normalize capabilities names
 		capsRequired, err := capabilities.MergeCapabilities(nil, capsRequiredRequested, nil)
 		if err != nil {
@@ -158,9 +158,14 @@ func securityConfigureGenerator(s *specgen.SpecGenerator, g *generate.Generator,
 		configSpec.Process.Capabilities.Effective = caplist
 		configSpec.Process.Capabilities.Permitted = caplist
 	} else {
-		mergedCaps, err := capabilities.MergeCapabilities(nil, s.CapAdd, nil)
+		var startingCaps []string
+		if s.Privileged {
+			startingCaps = caplist
+		}
+
+		mergedCaps, err := capabilities.MergeCapabilities(startingCaps, s.CapAdd, s.CapDrop)
 		if err != nil {
-			return errors.Wrapf(err, "capabilities requested by user are not valid: %q", strings.Join(s.CapAdd, ","))
+			return err
 		}
 		boundingSet, err := capabilities.BoundingSet()
 		if err != nil {
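To make the effect of the suggested diff easier to see, here is a rough Python model of the non-root branch before and after the change. It is only a sketch under stated assumptions: `merge_capabilities` is a simplified stand-in for the Go `capabilities.MergeCapabilities` call (it normalizes names and applies adds and drops, but does not handle `ALL`), and `non_root_caps` and its `patched` flag are hypothetical names used for illustration.

```python
def normalize(name):
    """Upper-case a capability name and add the CAP_ prefix if it is missing."""
    name = name.upper()
    return name if name.startswith("CAP_") else "CAP_" + name


def merge_capabilities(base, adds, drops):
    """Simplified stand-in: start from base, add adds, then remove drops."""
    merged = {normalize(c) for c in base} | {normalize(c) for c in adds}
    merged -= {normalize(c) for c in drops}
    return sorted(merged)


def non_root_caps(privileged, cap_add, cap_drop, caplist, patched):
    """Capabilities computed for a non-root user, with or without the diff."""
    if not patched:
        # Current code path in the diff: only --cap-add is considered.
        return merge_capabilities([], cap_add, [])
    # Proposed code path: --privileged seeds the set with the full caplist,
    # and --cap-drop is honoured as well.
    starting = caplist if privileged else []
    return merge_capabilities(starting, cap_add, cap_drop)


caplist = ["CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_SYS_ADMIN"]  # illustrative subset

print(non_root_caps(True, [], [], caplist, patched=False))        # []
print(non_root_caps(True, [], [], caplist, patched=True))
print(non_root_caps(True, [], ["chown"], caplist, patched=True))
```

In this model, the unpatched branch gives a privileged non-root container nothing beyond what `--cap-add` names, while the patched branch starts from the full list.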
I believe the first time this came up it was considered a CVE because we were granting "excess capabilities" - so I'm not opposed, but we should be cautious given potential security implications here.
Will try and find the CVE once I'm done reviewing issues and PRs
That would be done only with --privileged. Isn't that the expectation when you use that flag?
Evidently, it is not. I found the CVE:
https://access.redhat.com/security/cve/CVE-2021-20188
It's not specifically SYS_ADMIN (I believe it's DAC_OVERRIDE) but it's definitely a too-many-caps issue.
Maybe it is too late to fix it, but I disagree with that CVE and the analysis. The current behavior is quite confusing as it looks like there is a separation between the "container capabilities" and the "PID 1 capabilities". To me, they are the same thing.
The command line I specify, IMO, should apply to the process that is launched. Instead, it seems that --privileged affects future exec sessions when they are running as root.
I think we just depend on a buggy behavior from Docker: --privileged sets all the capabilities, but they forget to set Ambient capabilities, so the final result is that they are lost once the kernel execs the container process.
I would prefer we figure out a way for users to specify it, perhaps in containers.conf. I like the separation between the root user having the caps and rootless requiring a setuid app to get them.
So is --privileged used only to enable all capabilities in the bounding set? If it is to affect exec sessions, there is always the possibility to specify --privileged for the exec itself:
$ podman run --rm -d --user 100 fedora sleep 100
b063e53126adcfc33dad5d3a1dd88c9a69a3f6e44d48c91d65848b1567193590
$ podman exec -l --user 0 --privileged grep ^Cap /proc/self/status
CapInh: 000001ffffffffff
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 000001ffffffffff
The bounding set should also be usable by setuid binaries, right?
So with --privileged and non-0 uid the main process can't use the capabilities, but any setuid helpers it calls can. This allows restricting how the process can use the capabilities. Basically a --privileged container behaves exactly as the host system in this regard, which I think is a useful use-case.
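The setuid point can be illustrated with the root special case from capabilities(7): when the effective UID is 0 after execve(), the file inheritable and permitted sets are treated as all ones and the file effective bit as set, so the new permitted set is bounded only by the bounding set. The Python below is a simplified sketch under those assumptions (binaries without file capabilities, default securebits); `caps_after_exec` is a made-up name, and `FULL` is the CapBnd mask that appears in this thread.

```python
FULL = 0x1FFFFFFFFFF  # CapBnd value shown in this thread: capabilities 0..40


def caps_after_exec(euid, inheritable, bounding, ambient, setuid_root=False):
    """Return (permitted, effective) bitmasks after execve().

    Models only binaries without file capabilities and default securebits.
    """
    if setuid_root:
        euid = 0      # a set-user-ID-root binary runs with effective UID 0
        ambient = 0   # set-user-ID binaries clear the ambient set
    if euid == 0:
        # Root special case: file sets count as all ones, effective bit set.
        f_inh, f_prm, f_eff = FULL, FULL, True
    else:
        f_inh, f_prm, f_eff = 0, 0, False
    permitted = (inheritable & f_inh) | (f_prm & bounding) | ambient
    effective = permitted if f_eff else ambient
    return permitted, effective


# Main process as UID 1 in a --privileged container: full bounding set,
# but empty inheritable and ambient sets, so it holds no capabilities.
print([hex(m) for m in caps_after_exec(1, 0, FULL, 0)])                    # ['0x0', '0x0']

# A setuid-root helper started by that process gets the whole bounding set.
print([hex(m) for m in caps_after_exec(1, 0, FULL, 0, setuid_root=True)])  # ['0x1ffffffffff', '0x1ffffffffff']
```

If the bounding set were narrower, the helper would be limited to it, which is the restriction mechanism the comment above describes.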
Thanks for the feedback. Since it is working as expected, I am closing the issue.
Sorry, I'm not sure I see how this is expected behavior?
How am I supposed to give an arbitrary process, and any process it calls/forks, any permissions they need using only --privileged?
You can add them individually with --cap-add. Unfortunately there is a check in podman now that prevents it, but I've opened a PR to make it possible:
https://github.com/containers/podman/pull/13744
$ bin/podman run --user 100 --cap-add=DAC_OVERRIDE --privileged --rm fedora grep ^Cap /proc/self/status
CapInh: 0000000000000002
CapPrm: 0000000000000002
CapEff: 0000000000000002
CapBnd: 000001ffffffffff
CapAmb: 0000000000000002
$ bin/podman run --user 100 --cap-add=ALL --privileged --rm fedora grep ^Cap /proc/self/status
CapInh: 000001ffffffffff
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 000001ffffffffff
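The hexadecimal masks in /proc/self/status are bitmaps with one bit per capability number. As a reading aid, here is a small Python sketch that decodes them. The `decode_caps` helper is hypothetical; the name table follows the order of the bounding set printed by capsh earlier in this thread, which is assumed to match the kernel's capability numbering up to cap_checkpoint_restore (bit 40).

```python
# Capability names in bit order (bit 0 = cap_chown ... bit 40 = cap_checkpoint_restore).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]


def decode_caps(hex_mask):
    """Turn a CapInh/CapPrm/CapEff/CapBnd/CapAmb hex string into capability names."""
    mask = int(hex_mask, 16)
    # Bits beyond the table (capabilities added by newer kernels) are ignored here.
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


print(decode_caps("0000000000000002"))       # ['cap_dac_override']
print(len(decode_caps("000001ffffffffff")))  # 41
```

So the `--cap-add=DAC_OVERRIDE` run above carries exactly one capability in its inheritable, permitted, effective and ambient sets, while `--cap-add=ALL` carries all 41. The `capsh --decode=<mask>` option performs the same conversion on a system with libcap installed.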
Perfect! Thanks for making that PR. That will solve the main problem that led me to make this issue in the first place.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
In rootless mode, --privileged does not grant SYS_ADMIN to the container unless the container is running as UID 0.
Steps to reproduce the issue:
1. Containerfile and podman build -t issue-demo .
2. Use --privileged and try to mount test.img as a non-root user with FUSE (which requires SYS_ADMIN). Observe that this fails:
3. Use --cap-add SYS_ADMIN and try to mount test.img as a non-root user with FUSE. Observe that this is successful:
4. Use --privileged once more, but specify -u 0 to run the container as "root". Try to mount the image; observe that this is successful:
Describe the results you received:
Using the --privileged flag does not grant SYS_ADMIN to non-root container users. It only grants SYS_ADMIN to UID 0.
Using --cap-add SYS_ADMIN properly grants SYS_ADMIN to any container user, regardless of UID.
Describe the results you expected:
I expected the --privileged flag to grant SYS_ADMIN to all container users, regardless of UID.
Additional information you deem important (e.g. issue happens only occasionally):
I am running podman in rootless mode. Unfortunately I am not equipped to test this in root mode. This behavior I described also happens with --userns=keep-id.
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Yes - this is the latest version available to my distribution. I have checked the troubleshooting guide, and a maintainer commented in another issue suggesting I file this issue.
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical headless server