using this on debian 11 aka. bullseye is resulting in a non-systemd

zerwes commented 2 years ago

System

Debian 11 aka. bullseye with the debian docker.io packages. (more details later)

Description

While trying to use the image directly in docker or via molecule, the image starts, but it seems it is not systemd enabled, resulting in failed test runs.

A self-built Docker image based on the official debian:bullseye works as expected instead. But to be honest, Docker is really not my area of expertise...

The issue seems to occur not only on the debian11 image; others like geerlingguy/docker-centos8-ansible, geerlingguy/docker-ubuntu2004-ansible, etc. seem affected too.

Steps to reproduce

$ docker run --detach --privileged --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro geerlingguy/docker-debian11-ansible:latest
0c103204a41a3dd1487ab70813ac5fd4480f3f9e904f70cbe0c8a2b02443d986
$ docker ps
CONTAINER ID   IMAGE                                        COMMAND                  CREATED          STATUS          PORTS     NAMES
0c103204a41a   geerlingguy/docker-debian11-ansible:latest   "/lib/systemd/systemd"   39 seconds ago   Up 38 seconds             jovial_wilson
$ docker exec --tty 0c103204a41a /bin/systemctl status
Failed to connect to bus: No such file or directory
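
A quick way to tell whether systemd inside a container is actually reachable is to look at the state printed by systemctl is-system-running. The following is only a minimal sketch: the function name systemd_usable is hypothetical, and treating "degraded" as usable is an assumption that may not fit every test setup.

```shell
#!/bin/sh
# Decide whether a state string from "systemctl is-system-running" means
# that the systemd manager is up and answering requests.
# "running" and "degraded" both mean the manager is reachable; "degraded"
# only says that at least one unit has failed. Anything else (including an
# empty string) is treated as not usable.
systemd_usable() {
    case "$1" in
        running|degraded) return 0 ;;
        *)                return 1 ;;
    esac
}

# Example against a real container (not executed here; the ID is a placeholder):
#   state=$(docker exec 0c103204a41a systemctl is-system-running)
#   systemd_usable "$state" && echo "systemd is up" || echo "systemd is not usable: $state"
```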

Test with my own amateur build

$ cat Dockerfile 

FROM debian:bullseye

ENV container docker
ENV LC_ALL C
ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update \
    && apt-get install -y python3 sudo bash ca-certificates iproute2 python3-apt aptitude systemd systemd-sysv \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN rm -f /lib/systemd/system/multi-user.target.wants/* \
    /etc/systemd/system/*.wants/* \
    /lib/systemd/system/local-fs.target.wants/* \
    /lib/systemd/system/sockets.target.wants/*udev* \
    /lib/systemd/system/sockets.target.wants/*initctl* \
    /lib/systemd/system/sysinit.target.wants/systemd-tmpfiles-setup* \
    /lib/systemd/system/systemd-update-utmp*

RUN systemctl set-default multi-user.target

#VOLUME [ "/sys/fs/cgroup" ]

CMD [ "/lib/systemd/systemd", "log-level=info", "unit=sysinit.target" ]

$ docker build .
Sending build context to Docker daemon  3.072kB
Step 1/8 : FROM debian:bullseye
...
Successfully built eb8ff56c63ab

$ docker tag eb8ff56c63ab test-deb11-systemd

$ docker  run --detach --privileged  --name test-deb11-systemd test-deb11-systemd
7b0afaa24585c10a5ddcab18c0b1d06aef23501282dc0e8918e505784862a2a8

$ docker exec --tty 7b0afaa24585c10a5ddcab18c0b1d06aef23501282dc0e8918e505784862a2a8 /bin/systemctl status
* 7b0afaa24585
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Fri 2022-01-21 21:50:32 UTC; 6s ago
   CGroup: /
           |-init.scope 
           | |- 1 /lib/systemd/systemd log-level=info unit=sysinit.target
           | |-35 /bin/systemctl status
           | `-42 (pager)
           `-system.slice 
             `-systemd-journald.service 
               `-26 /lib/systemd/systemd-journald

Distro and Packages:

Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:    11
Codename:   bullseye

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                      Version                 Architecture Description
+++-=========================-=======================-============-=====================================================
ii  docker                    1.5-2                   all          transitional package
ii  docker.io                 20.10.5+dfsg1-1+deb11u1 amd64        Linux container runtime
ii  python3-docker            4.1.0-1.2               all          Python 3 wrapper to access docker.io's control socket

check-config

$ /usr/share/docker.io/contrib/check-config.sh
warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-5.10.0-10-amd64 ...

Generally Necessary:
- cgroup hierarchy: cgroupv2
- apparmor: enabled, but apparmor_parser missing
    (use "apt-get install apparmor" to fix this)
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_NETFILTER_XT_MARK: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_LEGACY_VSYSCALL_NONE: enabled
    (containers using eglibc <= 2.13 will not work. Switch to
     "CONFIG_VSYSCALL_[NATIVE|EMULATE]" or use "vsyscall=[native|emulate]"
     on kernel command line. Note that this will disable ASLR for the,
     VDSO which may assist in exploiting security vulnerabilities.)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled (as module)
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled (as module)
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled (as module)
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

geerlingguy commented 2 years ago

Can you confirm your molecule config looks something like the following? https://github.com/geerlingguy/ansible-role-apache/blob/master/molecule/default/molecule.yml#L7-L12

zerwes commented 2 years ago

Yes. Here the relevant part from a failing example:

  - name: keepalived-bionic
    pre_build_image: yes
    image: geerlingguy/docker-ubuntu1804-ansible:latest
    privileged: true
    command: /lib/systemd/systemd
    volumes:
    - /sys/fs/cgroup:/sys/fs/cgroup:ro

and a task that enables a service via systemd fails with:

"stderr_lines": ["Failed to connect to bus: No such file or directory"]

geerlingguy commented 2 years ago

@zerwes - Can you try changing the command to match what I have set up in mine?

zerwes commented 2 years ago

Hello @geerlingguy, unfortunately that makes no difference:

@@ -71,6 +71,6 @@ platforms:
     pre_build_image: yes
     image: geerlingguy/docker-ubuntu2004-ansible:latest
     privileged: true
-    command: /lib/systemd/systemd
+    command: ${MOLECULE_DOCKER_COMMAND:-""}
     volumes:
     - /sys/fs/cgroup:/sys/fs/cgroup:ro

but on the invocation of systemctl: "rc": 1, "stderr": "Failed to connect to bus: No such file or directory"

evrardjp commented 2 years ago

/me watches this :)

stefanDeveloper commented 2 years ago

I run into the same problem as @zerwes

zerwes commented 2 years ago

@stefanDeveloper is something like the docker file mentioned in the description of the issue or like https://github.com/Rosa-Luxemburgstiftung-Berlin/ansible-role-unbound/blob/main/molecule/default/Dockerfile-debian-bullseye.j2 working for you?

stefanDeveloper commented 2 years ago

@zerwes you saved my week, thanks that works like a charm!

zerwes commented 2 years ago

@stefanDeveloper Glad to hear it helped. And maybe it helps @geerlingguy to narrow down the problem ...

widhalmt commented 2 years ago

I have a problem similar to @zerwes's (Hi, by the way :-) ) in https://github.com/NETWAYS/ansible-role-elasticsearch/pull/53 .

As another change that might have an influence I had to remove the following lines because it made starting Elasticsearch in the containers impossible on CentOS:

     volumes:
     - /sys/fs/cgroup:/sys/fs/cgroup:ro

Since I removed that, CentOS tests succeed but Debian ones fail. I put some debugging code into my roles to put out what's wrong. What I'm seeing is:

  fatal: [elasticsearch-cluster2]: FAILED! => {"changed": false, "cmd": "/bin/systemctl", "msg": "Failed to connect to bus: No such file or directory", "rc": 1, "stderr": "Failed to connect to bus: No such file or directory\n", "stderr_lines": ["Failed to connect to bus: No such file or directory"], "stdout": "", "stdout_lines": []}

I suspect both containers are built differently, and what fixes the problem for one breaks it for the other?

zerwes commented 2 years ago

Hello @widhalmt, is something like the docker file mentioned in the description of the issue or like https://github.com/Rosa-Luxemburgstiftung-Berlin/ansible-role-unbound/blob/main/molecule/default/Dockerfile-debian-bullseye.j2 working for you?

widhalmt commented 2 years ago

@zerwes So you mean, disabling mounting cgroups? As far as I understood the information from https://discuss.elastic.co/t/error-when-running-7-12-1-on-centos-7-in-docker/271508 the problem was cgroups being mounted in two parts of the test. Looks like you disabled it in the Docker file, I disabled it in molecule.yml. My approach did work with CentOS but not with Debian.

I use several containers by @geerlingguy so I can't easily exchange the container I'm using. I could give it a try to exchange it temporarily, though.

widhalmt commented 2 years ago

I'm seeing the same effect with Rocky Linux 8 now, too. After removing the mount for cgroups in molecule.yml CentOS 7 works again but Debian 10, Debian 11 and Rocky Linux 8 fail.

zerwes commented 2 years ago

I use several containers by @geerlingguy so I can't easily exchange the container I'm using. I could give it a try to exchange it temporarily, though.

My intention is surely not to replace the widely used Docker images (for that my Docker-fu is much too weak, as I consider myself just an average user in this topic); I just wanted to give @geerlingguy a hint and some help about what works and what does not ...

geerlingguy commented 2 years ago

What's weird is I'm using the same containers on a ton of my projects and not (seemingly) running into the same issues that are mentioned here.

(Edit: Though I'm running them either from mac OS, or from ubuntu...)

widhalmt commented 2 years ago

I use several containers by @geerlingguy so I can't easily exchange the container I'm using. I could give it a try to exchange it temporarily, though.

My intention is surely not to replace the widely used Docker images (for that my Docker-fu is much too weak, as I consider myself just an average user in this topic); I just wanted to give @geerlingguy a hint and some help about what works and what does not ...

Sorry, that was just me being unclear in my reply. I understood that you only suggested that for tests and not to replace them completely. What I forgot to mention is that I'm using them in a matrix check with different OSes and I can't easily replace a single one, because it wouldn't even start. I need time to change the whole CI configuration to use the container in a test.

widhalmt commented 2 years ago

@geerlingguy I really don't get it either. I see the problems mostly when running them and starting Elasticsearch in GitHub Actions. For now it works flawlessly with CentOS 7 (when I remove mounting the cgroups in molecule.yml). But it breaks in Rocky Linux 8, Debian 10 and Debian 11.

tbumke commented 2 years ago

I get a very similar error with failure 1 during daemon-reload: Failed to get D-Bus connection: No such file or directory, but only when running molecule tests locally on macOS. In GitHub Actions the same configuration works with CentOS 7 and Rocky Linux 8. I first thought this had something to do with the Docker implementation on macOS (Docker Desktop vs. native Docker runtime). But I'm not that sure anymore.

geerlingguy commented 2 years ago

@tbumke - On macOS, that has to do with the implementation of cgroups v2 in Docker for Mac. I believe there's a way to work around it...

Paul-Weisser commented 2 years ago

@widhalmt @zerwes apologies if I have overlooked this but which host system are you using? I ran into the same issues and decided to give up on this matter, just watching this issue.

I am trying to run this in WSL2 on either Windows 10 or 11, resulting in Debian-based containers not starting with systemd or not starting at all. All that I have found online about this is that for some reason WSL2 seems unable to handle this kind of virtualization.

If it's a Windows virtualization issue, it would explain why it works fine on (most) Macs and Jeff's Ubuntu.

zerwes commented 2 years ago

@Paul-Weisser my first touch with this was running a debian 11 container on debian 11 ...

tbumke commented 2 years ago

@tbumke - On macOS, that has to do with the implementation of cgroups v2 in Docker for Mac. I believe there's a way to work around it...

Thanks @geerlingguy , this pointed me in the right direction. Searching for cgroups v2 and Docker for Mac, I found this issue https://github.com/docker/for-mac/issues/6073 which also describes a workaround.

Configuring "deprecatedCgroupv1": true (note the missing "s") in ~/Library/Group\ Containers/group.com.docker/settings.json tells Docker for Mac to use legacy cgroups v1. This of course is only a temporary fix until Ansible Molecule supports the cgroupns Docker parameter.

Running the container as follows and with cgroups v2 now also works in my setup:

docker run -it --privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
  --name instance -d geerlingguy/docker-debian11-ansible

Note also, that the sysfs volume permissions need to be changed to rw as well. Then I can successfully run systemd services and commands from the container.
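
The difference between the two invocations reported in this thread can be written down as a small helper. This is only a sketch: the function name systemd_run_flags is hypothetical, and the flags are simply the ones quoted in the comments here (read-only mount on cgroups v1, --cgroupns=host plus a read-write mount on cgroups v2), not an official recommendation.

```shell
#!/bin/sh
# Print the docker run flags reported in this thread for a given host
# cgroup version ("v1" or "v2"). Any other argument is an error.
systemd_run_flags() {
    case "$1" in
        v2) echo "--privileged --cgroupns=host --volume=/sys/fs/cgroup:/sys/fs/cgroup:rw" ;;
        v1) echo "--privileged --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro" ;;
        *)  echo "unknown cgroup version: $1" >&2; return 1 ;;
    esac
}

# Example (not executed here). The substitution is left unquoted on purpose
# so that the shell splits it into separate flags:
#   docker run --detach $(systemd_run_flags v2) geerlingguy/docker-debian11-ansible:latest
```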

widhalmt commented 2 years ago

Thanks @tbumke !!

Changing

    volumes:
    - /sys/fs/cgroup:/sys/fs/cgroup:ro

to

    volumes:
    - /sys/fs/cgroup:/sys/fs/cgroup:rw

did the trick!

Now I only have to find a way to get around a bug in Elasticsearch ( https://github.com/elastic/elasticsearch/issues/74158 ) that keeps multiple instances from starting because the Java option parser prints to stdout instead of a file. But that only hits when I fire up several containers in a single test and won't keep me from proceeding with the other roles. Thank you everyone, that kept me in a constant state of rage for weeks now. :-)

widhalmt commented 2 years ago

Ok, guess now I'm completely lost. Now it works sometimes and sometimes it doesn't. I'll have to take a deeper look, sorry.

staticdev commented 2 years ago

+1 have the same running from debian11. I believe since this image mounts cgroups into the image as a volume, it will have different results if you have different versions of cgroups in your host system. Should it work only on cgroupsv1?
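
To see which cgroup version a host is using, one common check is the filesystem type mounted at /sys/fs/cgroup. A minimal sketch, assuming GNU coreutils stat; the function name cgroup_version is only for illustration:

```shell
#!/bin/sh
# Map the filesystem type of the cgroup mount point to a cgroup version.
# cgroup2fs means the unified (v2) hierarchy. tmpfs is what a v1 host
# typically shows, because the v1 controllers are mounted below a tmpfs.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

# Report the version of the current host, if the mount point exists.
# "stat -fc %T" prints the filesystem type in human-readable form (GNU stat).
if [ -d /sys/fs/cgroup ]; then
    cgroup_version "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
fi
```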

barrelful commented 2 years ago

I also have this. Is there any information I can provide to help get this fixed, @geerlingguy?

geerlingguy commented 2 years ago

As I've said before, I haven't had any issues running this with systemd (for example, see my Docker role: https://github.com/geerlingguy/ansible-role-docker/blob/master/.github/workflows/ci.yml#L48 / https://github.com/geerlingguy/ansible-role-docker/runs/5959693637?check_suite_focus=true)

If someone can get a reproducible fault that works with the base image and the same kind of setup I'm using, that would be helpful.

(Another note: it seems cgroups v2 might be the main culprit for some people...)

echohes commented 2 years ago

When using the CI/CD environment (gitlab ci or any), you can use the settings for the docker daemon:

daemon.json
{
  "debug": false,
  "default-cgroupns-mode": "host",
  "storage-driver": "vfs"
}
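
To confirm that a daemon.json actually carries this setting, a simple grep is enough. The sketch below is an illustration only: the function name has_host_cgroupns and the DAEMON_JSON variable are hypothetical, and /etc/docker/daemon.json is assumed as the usual location on Linux. The Docker daemon normally has to be restarted before a changed daemon.json takes effect.

```shell
#!/bin/sh
# Return 0 if the given daemon.json sets "default-cgroupns-mode" to "host".
# A plain grep avoids a dependency on jq; it assumes key and value sit on
# the same line, as in the example above.
has_host_cgroupns() {
    grep -Eq '"default-cgroupns-mode"[[:space:]]*:[[:space:]]*"host"' "$1"
}

# DAEMON_JSON is a hypothetical override for the file to inspect.
conf="${DAEMON_JSON:-/etc/docker/daemon.json}"
if [ -r "$conf" ] && has_host_cgroupns "$conf"; then
    echo "default-cgroupns-mode is host in $conf"
else
    echo "default-cgroupns-mode is not set to host in $conf"
fi
```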

widhalmt commented 2 years ago

Thanks, @echohes . For Elasticsearch it didn't work. There's a bug that interferes with Cgroups, maybe I just have to wait for a fix. Thanks anyway. Hopefully it works for others.

staticdev commented 2 years ago

@geerlingguy for using Docker on Mac it is in fact Docker Desktop I presume, right?

Looks like from the version 4.3.0 / 2021-12-02 release notes we have cgroups v2:

Docker Desktop now uses cgroupv2. If you need to run systemd in a container then:

Ensure your version of systemd supports cgroupv2. It must be at least systemd 247. Consider upgrading any centos:7 images to centos:8.

Containers running systemd need the following options: [--privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw] (https://serverfault.com/questions/1053187/systemd-fails-to-run-in-a-docker-container-when-using-cgroupv2-cgroupns-priva).

And from version 4.4.2 / 2022-01-13 release notes:

Added a deprecated option to settings.json: "deprecatedCgroupv1": true, which switches the Linux environment back to cgroups v1. If your software requires cgroups v1, you should update it to be compatible with cgroups v2. Although cgroups v1 should continue to work, it is likely that some future features will depend on cgroups v2. It is also possible that some Linux kernel bugs will only be fixed with cgroups v2.

mhdan commented 2 years ago

I faced a problem similar to this issue, but with a small difference. I have systemd running in the container and have no problem with it, but loginctl fails with this output:

root@test-debian:/# loginctl
Failed to create bus connection: No such file or directory

My host OS is Ubuntu 20.04.

dataoscar commented 2 years ago

I faced a similar issue in macOS with an M1 Mac. My workaround was to add the following setting in the Docker Engine configuration:

"default-cgroupns-mode": "host"

I then had to change the bind mount to be rw instead of ro:

    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw

Edit: Thanks @tbumke for the hints that led to getting this working.

staticdev commented 2 years ago

@dataoscar but we can't configure the Docker Engine of GitHub Actions to run molecule, or can we?

tbumke commented 2 years ago

I've never had any issues with whatever docker engine GitHub Actions uses and Molecule for Ansible using @geerlingguy's Molecule config as a template. I.e., with the platforms block looking like this,

platforms:
  - name: instance
    image: "geerlingguy/docker-${MOLECULE_DISTRO:-centos7}-ansible:latest"
    command: ${MOLECULE_DOCKER_COMMAND:-""}
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true

In my setup, cgroups only caused issues on Docker Desktop for Mac.

aussielunix commented 2 years ago

I have a Fedora36 host and an Ubuntu 22.04 host and I get the same issue testing with https://github.com/geerlingguy/molecule-playbook-testing

The container is not actually running systemd completely.

MOLECULE_DISTRO=debian11 molecule converge
...
...
MOLECULE_DISTRO=debian11 molecule login
root@instance:/# ps faxwww
    PID TTY      STAT   TIME COMMAND
   1421 pts/0    Ss     0:00 bash
   1429 pts/0    R+     0:00  \_ ps faxwww
      1 ?        Ss     0:00 /lib/systemd/systemd
root@instance:/# systemctl status 
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down

This change/diff, taken from what @zerwes posted at the top of this issue, is what fixes it for me.

# Dockerfile
...
...
-COPY initctl_faker .
-RUN chmod +x initctl_faker && rm -fr /sbin/initctl && ln -s /initctl_faker /sbin/initctl

 # Install Ansible inventory file.
 RUN mkdir -p /etc/ansible
 RUN echo "[local]\nlocalhost ansible_connection=local" > /etc/ansible/hosts

+RUN systemctl set-default multi-user.target

-VOLUME ["/sys/fs/cgroup"]
 CMD ["/lib/systemd/systemd"]

and

# molecule/default/molecule.yml
...
...
   ansible-lint
 platforms:
   - name: instance
     image: geerlingguy/docker-${MOLECULE_DISTRO:-centos8}-ansible:latest
     command: ""
-    volumes:
-      - /sys/fs/cgroup:/sys/fs/cgroup:ro
     privileged: true
     pre_build_image: true
 provisioner:
   name: ansible

Rerunning with these changes results in

MOLECULE_DISTRO=debian11 molecule converge
...
...
MOLECULE_DISTRO=debian11 molecule login
root@instance:/# ps -faxwww 
    PID TTY      STAT   TIME COMMAND
   1796 pts/0    Ss     0:00 bash
   1808 pts/0    R+     0:00  \_ ps -faxwww
      1 ?        Ss     0:00 /lib/systemd/systemd
     25 ?        Ss     0:00 /lib/systemd/systemd-journald
   1719 ?        Ss     0:00 /usr/sbin/apache2 -k start
   1720 ?        Sl     0:00  \_ /usr/sbin/apache2 -k start
   1721 ?        Sl     0:00  \_ /usr/sbin/apache2 -k start
root@instance:/# systemctl status
● instance
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Sun 2022-08-21 06:10:29 UTC; 9min ago
   CGroup: /
           ├─init.scope 
           │ ├─   1 /lib/systemd/systemd
           │ ├─1796 bash
           │ ├─1809 systemctl status
           │ └─1810 (pager)
           └─system.slice 
             ├─apache2.service 
             │ ├─1719 /usr/sbin/apache2 -k start
             │ ├─1720 /usr/sbin/apache2 -k start
             │ └─1721 /usr/sbin/apache2 -k start
             └─systemd-journald.service 
               └─25 /lib/systemd/systemd-journald

lapin-b commented 2 years ago

@aussielunix's solution works on my side. I've cloned the repository, applied the edits that have been made and rebuilt a Docker image that I was able to use in molecule.

The service I intend to test is started correctly and molecule converge exits with status zero.

Installation information:

Fedora 35 with (probably) cgroup v2 (no kernel parameter related to that)
Vanilla Docker installation 20.10.17
Ansible 2.9.27 (from dnf)
Molecule 3.5.2 (from dnf)

aussielunix commented 2 years ago

I have learnt some more since I posted above.

This, from Lennart Poettering, says bind mounting /sys/fs/cgroup hierarchy is never going to work if cgroup namespaces are used.

Notes:

- no overriding the container command needed
  - override_command: false (found by reading the source code)
- no volumes/mounts
- still needs privileged mode
- Dockerfiles for my containers are at https://gitlab.com/aussielunix/ansible/molecule-containers

This is my new molecule.yml.

---

dependency:
  name: galaxy
driver:
  name: docker
lint: |
  set -e
  ansible-lint
platforms:
  - name: instance
    image: "registry.gitlab.com/aussielunix/ansible/molecule-containers/${MOLECULE_DISTRO:-debian:bullseye}"
    privileged: true
    pre_build_image: true
    override_command: false
    tmpfs:
      - /run
      - /tmp
provisioner:
  name: ansible
  log: ${MOLECULE_ANSIBLE_LOG:-true}
  env:
    ANSIBLE_VERBOSITY: ${MOLECULE_ANSIBLE_VERBOSITY:-0}
verifier:
  name: ansible

Examples of using these:

# test with default debian:bullseye
molecule test

# test with default debian:bullseye but silence Ansible logs
MOLECULE_ANSIBLE_LOG=false molecule test

# add -vv to ansible and test with default debian:bullseye
MOLECULE_ANSIBLE_VERBOSITY=2 molecule test

# test with ubuntu:jammy
MOLECULE_DISTRO="ubuntu:jammy" molecule test

# add -vvv to ansible and test with rockylinux:9
MOLECULE_ANSIBLE_VERBOSITY=3 MOLECULE_DISTRO="rockylinux:9" molecule test

jkirk commented 2 years ago

@aussielunix Thx for the investigation! The Debian/buster container works fine, but it seems like Debian/bullseye is missing in the gitlab registry: https://gitlab.com/aussielunix/ansible/molecule-containers/container_registry/3343441

aussielunix commented 2 years ago

@jkirk ahh the auto-pruning was set too aggressive.
I have relaxed it and triggered new containers to be built.

staticdev commented 2 years ago

I was finally able to verify that this issue goes away. Ref: https://github.com/ansible-community/molecule/issues/3632

Upgrade to molecule 4.0.3
Add cgroupns_mode: host
Change /sys/fs/cgroup:/sys/fs/cgroup:ro to /sys/fs/cgroup:/sys/fs/cgroup:rw

Example: https://github.com/ansible-community/molecule/pull/3665#issuecomment-1254979734

geerlingguy commented 2 years ago

Indeed, I just noticed the update, tested it, and wrote this blog post: Docker and systemd, getting rid of dreaded 'Failed to connect to bus' error.

alecunsolo commented 1 year ago

Hi. cgroupns_mode: host fixes the issue with systemctl, but other commands (like localectl and timedatectl) have the same problem. I get the same behavior with the rocky linux 9 container. Any suggestions are welcome

artis3n commented 1 year ago

I believe y'all are getting it working, but molecule v4.0.3 doesn't seem to be enough for me - I am getting an error that cgroupns_mode is not a supported option on community.docker.docker_container (bolding emphasis mine). What am I missing?

TASK [Wait for instance(s) creation to complete] *** failed: [localhost] (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': '933904515236.21247', 'results_file': '/home/artis3n/.ansible_async/933904515236.21247', 'changed': True, 'item': {'cgroupns_mode': 'host', 'command': '', 'image': 'geerlingguy/docker-debian11-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:rw']}, 'ansible_loop_var': 'item'}) => {"ansible_job_id": "933904515236.21247", "ansible_loop_var": "item", "attempts": 2, "changed": false, "finished": 1, "item": {"ansible_job_id": "933904515236.21247", "ansible_loop_var": "item", "changed": true, "failed": 0, "finished": 0, "item": {"cgroupns_mode": "host", "command": "", "image": "geerlingguy/docker-debian11-ansible:latest", "name": "instance", "pre_build_image": true, "privileged": true, "volumes": ["/sys/fs/cgroup:/sys/fs/cgroup:rw"]}, "results_file": "/home/artis3n/.ansible_async/933904515236.21247", "started": 1}, "msg": "Unsupported parameters for (community.docker.docker_container) module: cgroupns_mode. 
Supported parameters include: networks, privileged, read_only, security_opts, image, paused, env, publish_all_ports, cpuset_cpus, hostname, recreate, env_file, container_default_behavior, force_kill (forcekill), oom_killer, init, published_ports (ports), comparisons, cpu_quota, memory_swappiness, timeout, pull, entrypoint, ca_cert (cacert_path, tls_ca_cert), log_driver, kernel_memory, volume_driver, healthcheck, domainname, state, tls, use_ssh_client, labels, volumes, memory, stop_signal, ignore_image, auto_remove, uts, cpu_shares, debug, command, devices, restart_retries, cleanup, interactive, restart_policy, kill_signal, networks_cli_compatible, tty, restart, device_write_bps, output_logs, etc_hosts, docker_host (docker_url), memory_reservation, sysctls, memory_swap, dns_servers, detach, cpus, shm_size, keep_volumes, network_mode, volumes_from, client_cert (cert_path, tls_client_cert), cpu_period, client_key (key_path, tls_client_key), pids_limit, cgroup_parent, cap_drop, storage_opts, device_requests, removal_wait_timeout, command_handling, purge_networks, working_dir, runtime, ssl_version, api_version (docker_api_version), exposed_ports (expose, exposed), mac_address, groups, tls_hostname, validate_certs (tls_verify), links, oom_score_adj, dns_opts, default_host_ip, stop_timeout, device_write_iops, name, device_read_iops, ulimits, ipc_mode, pid_mode, mounts, userns_mode, log_options (log_opt), tmpfs, device_read_bps, capabilities, dns_search_domains, blkio_weight, cpuset_mems, user.", "results_file": "/home/artis3n/.ansible_async/933904515236.21247", "started": 1, "stderr": "/tmp/ansible_community.docker.docker_container_payload_36kjt4c1/ansible_community.docker.docker_container_payload.zip/ansible_collections/community/docker/plugins/modules/docker_container.py:1237: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n", "stderr_lines": ["/tmp/ansible_community.docker.docker_container_payload_36kjt4c1/ansible_community.docker.docker_container_payload.zip/ansible_collections/community/docker/plugins/modules/docker_container.py:1237: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead."], "stdout": "", "stdout_lines": []}

I have the following setup:

molecule.yml

dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    image: ${MOLECULE_DISTRO:-geerlingguy/docker-debian11-ansible:latest}
    command: ${MOLECULE_DOCKER_COMMAND:-"/lib/systemd/systemd"}
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    privileged: true
    pre_build_image: true

molecule --version

molecule 4.0.3 using python 3.10 
    ansible:2.14.0
    delegated:4.0.3 from molecule
    docker:2.1.0 from molecule_docker requiring collections: community.docker>=3.0.2 ansible.posix>=1.4.0

Poetry with pyproject.toml file:

[tool.poetry.dependencies]
python = "^3.10"
ansible = "^7.0.0"

[tool.poetry.group.dev.dependencies]
pre-commit = "^2.20.0"
ansible-lint = "^6.8.0"
molecule = {extras = ["docker"], version = "^4.0.3"}

I see cgroupns_mode was added in community.docker 3.0.0 and Molecule is using v3.0.2... https://docs.ansible.com/ansible/latest/collections/community/docker/docker_container_module.html#parameter-cgroupns_mode

Galaxy102 commented 1 year ago

@artis3n in my case it was also necessary to install the current community.docker collection from Ansible Galaxy
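
Since cgroupns_mode was added in community.docker 3.0.0 (as noted earlier in the thread), it can help to compare the installed collection version against that number in the same environment Molecule runs in. A minimal sketch, assuming GNU sort with version sorting (-V); the function name version_ge and the INSTALLED_VERSION variable are hypothetical:

```shell
#!/bin/sh
# Return 0 if version $1 is greater than or equal to version $2.
# "sort -V" orders version strings; if the smaller of the two is $2,
# then $1 is at least as new.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="3.0.0"                         # release that introduced cgroupns_mode
installed="${INSTALLED_VERSION:-3.0.2}"  # placeholder; take the real value from the listing below

if version_ge "$installed" "$required"; then
    echo "community.docker $installed should accept cgroupns_mode"
else
    echo "community.docker $installed is older than $required"
fi

# To find and update the real version (not executed here; run inside the
# virtualenv that Molecule uses, --upgrade needs a recent ansible-core):
#   ansible-galaxy collection list community.docker
#   ansible-galaxy collection install community.docker --upgrade
```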

artis3n commented 1 year ago

Ahhhh yup. ansible-galaxy collection list shows that I'm up to date globally on my system -

# /home/artis3n/.ansible/collections/ansible_collections
Collection        Version
----------------- -------
...
community.general 6.1.0

but inside my Poetry env, I'm using the older version. Gotta see how to appropriately update..

# /home/artis3n/.cache/pypoetry/virtualenvs/artis3n-tailscale-eXk1DDvX-py3.10/lib/python3.10/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
...
community.general             6.0.1  
...

artis3n commented 1 year ago

The dependency step wasn't in my scenario :upside_down_face: Everything's working

filviu commented 1 year ago

Ok, I'm officially confused and ready to throw in the towel in favor of testing directly inside GitHub runners.

The fix everybody likes only works for me for docker-debian11-ansible:latest containers:

  - name: instance
    image: "geerlingguy/docker-${MOLECULE_DISTRO:-centos7}-ansible:latest"
    command: ${MOLECULE_DOCKER_COMMAND:-""}
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    privileged: true
    pre_build_image: true

It fails inside docker-centos7-ansible:latest with

fatal: [instance]: FAILED! => {"changed": false, "msg": "Service is in unknown state", "status": {}}

On the other hand, @aussielunix's fix, which looks like this:

platforms:
  - name: instance
    image: "geerlingguy/docker-${MOLECULE_DISTRO:-centos7}-ansible:latest"
    command: ${MOLECULE_DOCKER_COMMAND:-""}
    override_command: false
    tmpfs:
      - /run
      - /tmp
    cgroupns_mode: host
    privileged: true
    pre_build_image: true

works for docker-centos7-ansible:latest but fails with docker-debian11-ansible:latest with:

Failed to connect to bus: No such file or directory

Running locally on Ubuntu 22.04 with the latest versions of ansible, molecule, community.general and community.docker.

darsh12 commented 1 year ago

Thanks to @artis3n. I am running molecule 5.0.1 with ansible 2.14.5. Adding command: ${MOLECULE_DOCKER_COMMAND:-"/lib/systemd/systemd"} worked for me when using ubuntu2204, together with volumes, privileged mode and cgroupns_mode.

artis3n commented 1 year ago

Yeah, if it is helpful to others here are all the distros I was testing and what command I had to use for everything to work smoothly:

https://github.com/artis3n/ansible-role-tailscale/blob/7e9907a606df08ce79fa675bbf45c59be97a1e9b/.github/workflows/pull_request_target.yml#L24-L44

and https://github.com/artis3n/ansible-role-tailscale/blob/7e9907a606df08ce79fa675bbf45c59be97a1e9b/.github/workflows/pull_request_target.yml#L59-L65

Default if not provided is /usr/sbin/init
