bus1 / dbus-broker

Linux D-Bus Message Broker
https://github.com/bus1/dbus-broker/wiki
Apache License 2.0
675 stars 78 forks source link

Support old kernel without PR_CAP_AMBIENT #255

Closed ntkme closed 3 years ago

ntkme commented 3 years ago

In summary, this PR in combination with https://github.com/stevegrubb/libcap-ng/pull/29, would make dbus-broker work with <4.3 kernel.

I have a device with old 4.1 kernel that I cannot upgrade, and I was trying to run dbus_broker with --audit in fedora:33 container.

At the beginning the error looks like this:

ERROR util_audit_drop_permissions @ ../src/util/audit.c +81: Operation not permitted

I traced it down to libcap-ng and patched it to support older kernel: https://github.com/stevegrubb/libcap-ng/pull/29, then I get this:

ERROR util_audit_drop_permissions @ ../src/util/audit.c +86: Invalid argument

It turns out to be a single call to prctl(PR_CAP_AMBIENT, ...) in dbus-broker, which would fail with EINVAL for the same reason as libcap-ng failed previously.

This PR adds a special case to ignore EINVAL which happens on old kernels. In combination with other PR I made for libcap-ng, I was able to get dbus-broker to work on fc33 with 4.1.x kernel.

[root@UDM-Pro build]# uname -r
4.1.37-v1.8.6.2969-6a09c7c

[root@UDM-Pro build]# systemctl status dbus.service dbus.socket
● dbus-broker.service - D-Bus System Message Bus
     Loaded: loaded (/usr/lib/systemd/system/dbus-broker.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-02-03 05:37:36 UTC; 13s ago
TriggeredBy: ● dbus.socket
       Docs: man:dbus-broker-launch(1)
   Main PID: 22346 (dbus-broker-lau)
     Memory: 1.3M
     CGroup: /libpod_parent/libpod-8adfccb893792578fbf459b928068b26207ed78081a08ca9ca8cd54207befe95/system.slice/dbus-broker.service
             ├─22346 /usr/local/bin/dbus-broker-launch --scope system --audit
             └─22347 dbus-broker --log 4 --controller 9 --machine-id 5d7a0b05919e45828224a9d15d5604cc --max-bytes 536870912 --max-fds 4096 --max-match>

Feb 03 05:37:35 UDM-Pro systemd[1]: Starting D-Bus System Message Bus...
Feb 03 05:37:36 UDM-Pro systemd[1]: Started D-Bus System Message Bus.
Feb 03 05:37:36 UDM-Pro dbus-broker[22347]: Falling back to racy auxiliary groupsresolution using nss.
Feb 03 05:37:36 UDM-Pro dbus-broker-lau[22346]: Ready

● dbus.socket - D-Bus System Message Bus Socket
     Loaded: loaded (/usr/lib/systemd/system/dbus.socket; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-02-03 05:37:23 UTC; 26s ago
   Triggers: ● dbus-broker.service
     Listen: /run/dbus/system_bus_socket (Stream)
     CGroup: /libpod_parent/libpod-8adfccb893792578fbf459b928068b26207ed78081a08ca9ca8cd54207befe95/system.slice/dbus.socket

Feb 03 05:37:23 UDM-Pro systemd[1]: Listening on D-Bus System Message Bus Socket.
dvdhrm commented 3 years ago

Can you elaborate why you cannot upgrade your kernel but you can upgrade dbus-broker? What is the reason you need to support a kernel version that is >5 years old?

Also, wouldn't this mean we lack CAP_AUDIT_WRITE once we change UID? Without ambient-caps set, they are cleared when we drop privileges, right? (Or was that only during execve? I don't remember the exact behavior of the old kernels).

ntkme commented 3 years ago

It's a network device made by Ubiquiti called udm-pro. It's has a locked bootloader and I cannot change kernel because whole / is mounted as ram-overlay. Any change I make in / will be lost up on reboot. The software on it is well maintained and the device is still being sold as new today, but unfortunately, the company seems to have no plan to upgrade its kernel anytime soon. The kernel version is still 4.1.37 with latest firmware released just a few days ago to patch the dnsmasq vulnerability.

This device has podman installed. The containers are persisted in a separately mounted data partition. My goal is to run a fedora based systemd container, and have nested podman containers inside. I know it's a old kernel, so I already tried old versions of fedora. Unfortunately, old version of systemd and podman / runc / crun just have too much problems running inside a container, which are much harder to deal with.

Now move to fc33, nested podman containers worked and everything in systemd worked expect for dbus, and that's try I'm trying to fix it.


I'm not an expert in capability, but it appears to be retained. I tried to call r = capng_have_capability(CAPNG_PERMITTED, CAP_AUDIT_WRITE); after capng_change_id, and I got r = 1.

diff --git a/src/util/audit.c b/src/util/audit.c
index 1edbb28..3a5844c 100644
--- a/src/util/audit.c
+++ b/src/util/audit.c
@@ -80,9 +80,19 @@ int util_audit_drop_permissions(uint32_t uid, uint32_t gid) {
                 if (r)
                         return error_origin(-EPERM);

+                r = capng_have_capability(CAPNG_PERMITTED, CAP_AUDIT_WRITE);
+                if (r == CAPNG_FAIL)
+                        return error_origin(-EIO);
+                else if (r == 1)
+                        have_audit_write = true;
+                else
+                        have_audit_write = false;
+
+                fprintf(stderr, "still have audit write? %i\n", have_audit_write);
+
                 if (have_audit_write) {
                         r = prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, CAP_AUDIT_WRITE, 0, 0);
-                        if (r < 0)
+                        if (r < 0 && errno != EINVAL)
                                 return error_origin(-errno);
                 }
         }
Feb 03 17:14:26 UDM-Pro systemd[1]: Starting D-Bus System Message Bus...
Feb 03 17:14:26 UDM-Pro systemd[1]: Started D-Bus System Message Bus.
Feb 03 17:14:26 UDM-Pro dbus-broker-launch[5671]: still have audit write? 1
Feb 03 17:14:26 UDM-Pro dbus-broker[5671]: Falling back to racy auxiliary groupsresolution using nss.
Feb 03 17:14:26 UDM-Pro dbus-broker-lau[5670]: Ready
dvdhrm commented 3 years ago

It's a network device made by Ubiquiti called udm-pro. It's has a locked bootloader and I cannot change kernel because whole / is mounted as ram-overlay. Any change I make in / will be lost up on reboot. The software on it is well maintained and the device is still being sold as new today, but unfortunately, the company seems to have no plan to upgrade its kernel anytime soon. The kernel version is still 4.1.37 with latest firmware released just a few days ago to patch the dnsmasq vulnerability.

I admire the effort to make those old devices work. But we cannot keep supporting all combinations of old and new software. We are quite explicit on that, and dbus-broker relies on a lot of modern linux kernel features (even though the dependencies are not so clear). We keep submitting bugfixes to the linux kernel, to make sure dbus-broker runs fine. Without all those fixes, you will likely run into other issues.

The reason we don't want to keep compatibility to old kernel releases is to reduce the number of combinations that can happen on new machines. We don't want to accidentally have one of those fallback-paths taken on modern machines, because it would make it really hard to argue about what is going on. Hence, we have hard-dependencies which we are rather reluctant to loosen.

This device has podman installed. The containers are persisted in a separately mounted data partition. My goal is to run a fedora based systemd container, and have nested podman containers inside. I know it's a old kernel, so I already tried old versions of fedora. Unfortunately, old version of systemd and podman / runc / crun just have too much problems running inside a container, which are much harder to deal with.

Now move to fc33, nested podman containers worked and everything in systemd worked expect for dbus, and that's try I'm trying to fix it.

Did you try installing dbus-daemon on fc33, and then using dbus-daemon instead of dbus-broker? You can simply uninstall dbus-broker in that case, and it should automatically enable dbus-daemon. If not, just run systemctl enable dbus-daemon and systemctl --global enable dbus-daemon.

I'm not an expert in capability, but it appears to be retained. I tried to call r = capng_have_capability(CAPNG_PERMITTED, CAP_AUDIT_WRITE); after capng_change_id, and I got r = 1.

diff --git a/src/util/audit.c b/src/util/audit.c
index 1edbb28..3a5844c 100644
--- a/src/util/audit.c
+++ b/src/util/audit.c
@@ -80,9 +80,19 @@ int util_audit_drop_permissions(uint32_t uid, uint32_t gid) {
                 if (r)
                         return error_origin(-EPERM);

+                r = capng_have_capability(CAPNG_PERMITTED, CAP_AUDIT_WRITE);
+                if (r == CAPNG_FAIL)
+                        return error_origin(-EIO);
+                else if (r == 1)
+                        have_audit_write = true;
+                else
+                        have_audit_write = false;
+
+                fprintf(stderr, "still have audit write? %i\n", have_audit_write);
+
                 if (have_audit_write) {
                         r = prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, CAP_AUDIT_WRITE, 0, 0);
-                        if (r < 0)
+                        if (r < 0 && errno != EINVAL)
                                 return error_origin(-errno);
                 }
         }
Feb 03 17:14:26 UDM-Pro systemd[1]: Starting D-Bus System Message Bus...
Feb 03 17:14:26 UDM-Pro systemd[1]: Started D-Bus System Message Bus.
Feb 03 17:14:26 UDM-Pro dbus-broker-launch[5671]: still have audit write? 1
Feb 03 17:14:26 UDM-Pro dbus-broker[5671]: Falling back to racy auxiliary groupsresolution using nss.
Feb 03 17:14:26 UDM-Pro dbus-broker-lau[5670]: Ready

Can you check what happens after exec? That is, the dbus-broker process (rather than the launcher) would very likely loose the capability.

ntkme commented 3 years ago

You're right, there is no audit write after exec. However,

https://github.com/bus1/dbus-broker/blob/83ca4ee8c2d27ade0a0d733b5bcfd8c6944adeb5/src/util/audit.c#L139-L150

It looks like the existing logic is silently treat no audit write as unsupported. So, this patch prevents a hard crash but its behavior may be up for a debate.

I have build a patched version and bundled it in my container so it solved my own issue. Please feel free to close this if the behavior of silently disable audit is not appropriate.

ntkme commented 3 years ago

By the way, dbus-daemon also has a similar problem as it also links with libcap-ng.so.0, the difference being only libcap-ng need to be patched in dbus-daemon's case.

dvdhrm commented 3 years ago

It looks like the existing logic is silently treat no audit write as unsupported. So, this patch prevents a hard crash but its behavior may be up for a debate.

I have build a patched version and bundled it in my container so it solved my own issue. Please feel free to close this if the behavior of silently disable audit is not appropriate.

Good point. To be honest, I would prefer changing the behavior to treat missing capabilities as a hard-failure, rather than the current soft-failure.

Btw., you can try dropping the --audit argument to dbus-broker, by replacing the dbus-broker.service unit in /etc. This should disable audit-reports.

I will go ahead and close this as it is not something I like to support and keep testing in the future. Sorry!

ntkme commented 3 years ago

Btw., you can try dropping the --audit argument to dbus-broker, by replacing the dbus-broker.service unit in /etc. This should disable audit-reports.

Unfortunately, it will still crash. Because util_audit_drop_permissions() is always invoked, it will crash at exactly where this PR was trying to patch.