archzfs / archzfs-ci

Automated testing and deployment for archzfs using buildbot
http://ci.archzfs.com
GNU General Public License v2.0

Workaround for `Failed to open system bus: No such file or directory` #14

Closed: UweSauter closed this issue 10 months ago

UweSauter commented 11 months ago

Again for archzfs/archzfs/issues/521.

As described in this Arch Linux bug tracker thread, one workaround is to bind-mount `/run/dbus/system_bus_socket` into the container. To do that, the worker definition inside `docker-compose.yml` needs to be extended to include

        volumes:
            - /run/dbus/system_bus_socket:/run/dbus/system_bus_socket
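
In context, a minimal worker service definition might look like the following sketch (the service name `worker` and the image name are assumptions for illustration; only the `volumes` entry comes from the workaround above):

```yaml
services:
    worker:
        image: archzfs-worker   # hypothetical image name
        volumes:
            # Expose the host's D-Bus system bus socket to the container.
            - /run/dbus/system_bus_socket:/run/dbus/system_bus_socket
```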

This gets rid of the `Failed to open system bus: No such file or directory` message but brings up a new one:

cd "/worker/all/build/packages/_utils/zfs-utils" && ccm64 s 
Output: 
----> Attempting to build package...
==> Synchronizing chroot copy [/scratch/.buildroot/root] -> [buildbot]...done
Failed to create /../../devtools.slice/devtools-buildbot.slice/arch-nspawn-2501.scope/payload subcgroup: Not a directory
==> Making package: zfs-utils 2.2.2-1 (Tue Jan  2 18:11:41 2024)
==> Retrieving sources...
  -> Downloading zfs-2.2.2.tar.gz...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0 32.2M    0  2747    0     0   3400      0  2:45:46 --:--:--  2:45:46  3400
100 32.2M  100 32.2M    0     0  22.7M      0  0:00:01  0:00:01 --:--:-- 52.6M
  -> Found zfs-utils.initcpio.install
  -> Found zfs-utils.initcpio.hook
  -> Found zfs-utils.initcpio.zfsencryptssh.install
==> Validating source files with sha256sums...
    zfs-2.2.2.tar.gz ... Passed
    zfs-utils.initcpio.install ... Passed
    zfs-utils.initcpio.hook ... Passed
    zfs-utils.initcpio.zfsencryptssh.install ... Passed
Failed to create /../../devtools.slice/devtools-buildbot.slice/arch-nspawn-3334.scope/payload subcgroup: Not a directory
==> ERROR: Build failed, check /scratch/.buildroot/buildbot/build
Command returned: 1

Searching the web turns up several hits of varying age:

https://github.com/FFY00/build-arch-package/issues/8
https://github.com/systemd/systemd/issues/14247
https://github.com/moby/moby/issues/44402
https://serverfault.com/questions/1053187/systemd-fails-to-run-in-a-docker-container-when-using-cgroupv2-cgroupns-priva

If I configure Docker to use user namespaces and remapping, yet another error occurs…

I'll try to continue tomorrow…

minextu commented 11 months ago

Thank you! That is already further than I got when I last checked

techmunk commented 11 months ago

This is my current understanding of how this all works (or does not work). I might be incorrect on some points, but I do have a solution at the end.

Bind-mounting /run/dbus/system_bus_socket:/run/dbus/system_bus_socket will likely never work, as this is the host's D-Bus socket. Units requested over that socket are created on the host, so /sys/fs/cgroup on the host gets the devtools.slice, but inside the container this slice does not exist.
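
A quick way to see the mismatch (a hedged diagnostic sketch; the slice path is illustrative) is to compare what the container can actually see:

```shell
# Inside the container: show which cgroup this process belongs to.
cat /proc/self/cgroup

# The scope created via the host's D-Bus lands on the *host's* cgroup
# hierarchy, e.g. under devtools.slice -- typically not visible in here:
ls /sys/fs/cgroup/devtools.slice 2>/dev/null || echo "devtools.slice not visible"
```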

There are a few options here.

  1. Mount the host's cgroup hierarchy into the container. This is less than ideal and not a path I'd recommend; it feels insecure, as the container could mess up the host.
  2. Build a container that boots systemd (as in the whole init). This can work, but seems a bit heavy.
  3. Force systemd-nspawn to work in the same "slice" we're already running in. This is the best option, as we're already running in a private cgroup. (Both Docker and Podman do this by default, I believe.)

If we create a wrapper systemd-nspawn script with the contents below and put it in the PATH before the default one, everything just "works".

#!/bin/bash

# Pass --keep-unit so systemd-nspawn stays in the service/scope unit it
# was invoked in, instead of asking D-Bus to create a new transient scope
# (which fails here, because the host bus knows nothing about our cgroup).
exec /usr/bin/systemd-nspawn --keep-unit "$@"

I've done my testing in Podman, but this should apply equally to Docker. My setup can be seen at https://gist.github.com/techmunk/26e75c44745baf343b6c1d5b8e3c1576

start.sh kicks it all off. systemd-nspawn and build are in a directory called scripts next to start.sh.
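
To make the wrapper take effect, it just has to shadow the real binary on the PATH. A minimal sketch (the wrapper directory is an arbitrary choice here; a temporary directory is used only for illustration):

```shell
# Create the wrapper in a dedicated directory and put that directory
# ahead of /usr/bin on the PATH.
wrapdir=$(mktemp -d)
cat > "$wrapdir/systemd-nspawn" <<'EOF'
#!/bin/bash
exec /usr/bin/systemd-nspawn --keep-unit "$@"
EOF
chmod +x "$wrapdir/systemd-nspawn"
export PATH="$wrapdir:$PATH"

# Sanity check: our wrapper is now found first.
command -v systemd-nspawn
```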

techmunk commented 11 months ago

This devtools issue from 2017 is relevant. Took me a while to find the issue again. https://bugs.archlinux.org/task/55082

UweSauter commented 11 months ago

> This devtools issue from 2017 is relevant. Took me a while to find the issue again. https://bugs.archlinux.org/task/55082

This issue is what led me to bind-mount /run/dbus/system_bus_socket into the container.

I think your second point (booting the container with systemd instead of just running a process inside it) is the cleanest approach. I'm still trying to figure out the Buildbot configuration in #15, but my setup with three "booted" systemd-nspawn Arch Linux containers shows no issues regarding cgroups.

Regarding your point that this approach is on the heavy side: I'm not entirely sure what you mean by that, but then again, using Docker is on the heavy side as well compared to just building systemd-nspawn containers.

(And I get a PostgreSQL 16 container instead of the old PostgreSQL 9.6 Debian container that is currently used.)

techmunk commented 11 months ago

> This devtools issue from 2017 is relevant. Took me a while to find the issue again. https://bugs.archlinux.org/task/55082

> This issue is what led me to bind-mount /run/dbus/system_bus_socket into the container.

The OP there still had issues when using the host-bus bind mount and suggested the fix might be as simple as adding the --keep-unit argument, which does in fact seem to work. I'll accept that that issue was fixing a different error, one that now seems to be resolved.

I feel systemd is heavy for a container: in general, a full init system is not really designed to run inside a container, and if there's a clean way to get things working without it, I think that would be preferable from both a maintenance and a resource-usage perspective. My opinion, of course.

While playing around with this repo, I was unable to get that to work correctly. When the worker started, my host would be hosed (because of systemd running inside the container), and I'd have to reboot to regain control (there might be another way; I did not look into it). I suspect this is because of how a Docker container shares certain cgroup/bus resources with the host. Either way, I tried tricks I had used in the past, such as setting the container environment variable to something like docker, but I could not get it to work. I suspect I'd have to edit the build.sh script in the main repo.
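
For reference, the "container environment variable" trick mentioned above relies on systemd's container detection: among other heuristics, systemd inspects the `container` environment variable of PID 1. A hedged sketch of the idea (whether this actually helps depends on how the container is started):

```shell
# systemd detects containerization partly via the "container" environment
# variable of PID 1. Starting the container with it set, e.g.
#   docker run -e container=docker ...
# makes tools like systemd-detect-virt report "docker".
export container=docker
echo "$container"
```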

I've made a pull request at https://github.com/archzfs/archzfs-ci/pull/16 which, at least on my system, runs a build to completion.

If running a full systemd init is desired, then a different approach would have to be taken.

UweSauter commented 10 months ago

As the build environment is working again, I'll close this one.