Open zerwes opened 2 years ago
Can you confirm your molecule config looks something like the following? https://github.com/geerlingguy/ansible-role-apache/blob/master/molecule/default/molecule.yml#L7-L12
Yes. Here the relevant part from a failing example:
  - name: keepalived-bionic
    pre_build_image: yes
    image: geerlingguy/docker-ubuntu1804-ansible:latest
    privileged: true
    command: /lib/systemd/systemd
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
and a task that enables a service via systemd fails with:
"stderr_lines": ["Failed to connect to bus: No such file or directory"]
@zerwes - Can you try changing the command to match what I have set up in mine?
Hello @geerlingguy, unfortunately it makes no difference:
@@ -71,6 +71,6 @@ platforms:
     pre_build_image: yes
     image: geerlingguy/docker-ubuntu2004-ansible:latest
     privileged: true
-    command: /lib/systemd/systemd
+    command: ${MOLECULE_DOCKER_COMMAND:-""}
     volumes:
       - /sys/fs/cgroup:/sys/fs/cgroup:ro
but on the invocation of systemctl: "rc": 1, "stderr": "Failed to connect to bus: No such file or directory"
/me watches this :)
I run into the same problem as @zerwes
@stefanDeveloper is something like the docker file mentioned in the description of the issue or like https://github.com/Rosa-Luxemburgstiftung-Berlin/ansible-role-unbound/blob/main/molecule/default/Dockerfile-debian-bullseye.j2 working for you?
@zerwes you saved my week, thanks that works like a charm!
@stefanDeveloper glad to hear it helped. And maybe it helps @geerlingguy to drill down into the problem ...
I have a problem similar to @zerwes's (Hi, by the way :-) ) in https://github.com/NETWAYS/ansible-role-elasticsearch/pull/53 .
As another change that might have an influence I had to remove the following lines because it made starting Elasticsearch in the containers impossible on CentOS:
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
Since I removed that, CentOS tests succeed but Debian ones fail. I put some debugging code into my roles to put out what's wrong. What I'm seeing is:
fatal: [elasticsearch-cluster2]: FAILED! => {"changed": false, "cmd": "/bin/systemctl", "msg": "Failed to connect to bus: No such file or directory", "rc": 1, "stderr": "Failed to connect to bus: No such file or directory\n", "stderr_lines": ["Failed to connect to bus: No such file or directory"], "stdout": "", "stdout_lines": []}
I suspect both containers are built differently, so what fixes the problem for one breaks the other?
Hello @widhalmt, is something like the docker file mentioned in the description of the issue or like https://github.com/Rosa-Luxemburgstiftung-Berlin/ansible-role-unbound/blob/main/molecule/default/Dockerfile-debian-bullseye.j2 working for you?
@zerwes So you mean disabling mounting cgroups? As far as I understood the information from https://discuss.elastic.co/t/error-when-running-7-12-1-on-centos-7-in-docker/271508 the problem was cgroups being mounted in two parts of the test. Looks like you disabled it in the Docker file, I disabled it in molecule.yml. My approach did work with CentOS but not with Debian.
I use several containers by @geerlingguy so I can't easily exchange the container I'm using. I could give it a try to exchange it temporarily, though.
I'm seeing the same effect with Rocky Linux 8 now, too. After removing the mount for cgroups in molecule.yml, CentOS 7 works again but Debian 10, Debian 11 and Rocky Linux 8 fail.
My intention is surely not to replace the widely used docker images (my docker foo is much too weak for that, as I consider myself just an average user in this topic); I just wanted to give @geerlingguy a hint and some help on what works and what does not ...
What's weird is I'm using the same containers on a ton of my projects and not (seemingly) running into the same issues that are mentioned here.
(Edit: Though I'm running them either from mac OS, or from ubuntu...)
Sorry, that was just me being unclear in my reply. I understood that you only proposed that for tests and not to replace them completely. What I forgot to mention is that I'm using them in a matrix check with different OSes and I can't easily replace a single one, because it wouldn't even start. I need time to change the whole CI configuration to use the container in a test.
@geerlingguy I really don't get it either. I see the problems mostly when running them and starting Elasticsearch in GitHub Actions. For now it works flawlessly with CentOS 7 (when I remove mounting the cgroups in molecule.yml). But it breaks in Rocky Linux 8, Debian 10 and Debian 11.
I get a very similar error, "failure 1 during daemon-reload: Failed to get D-Bus connection: No such file or directory", but only when running molecule tests locally on macOS. In GitHub Actions the same configuration works with CentOS 7 and Rocky Linux 8. I first thought this had something to do with the docker implementation on macOS (Docker Desktop vs. native docker runtime). But I'm not that sure anymore.
@tbumke - On macOS, that has to do with the implementation of cgroups v2 in Docker for Mac. I believe there's a way to work around it...
@widhalmt @zerwes apologies if I have overlooked this but which host system are you using? I ran into the same issues and decided to give up on this matter, just watching this issue.
I am trying to run this in WSL2 on either Windows 10 or 11, resulting in Debian-based containers not starting with systemd or not starting at all. All I have found online about this is that, for some reason, WSL2 seems unable to handle this virtualization.
If it's a Windows virtualization issue, it would explain why it works fine on (most) Macs and Jeff's Ubuntu.
@Paul-Weisser my first touch with this was running a debian 11 container on debian 11 ...
Thanks @geerlingguy , this pointed me in the right direction. Searching for cgroups v2 and Docker for Mac, I found this issue https://github.com/docker/for-mac/issues/6073 which also describes a workaround.
Configuring "deprecatedCgroupv1": true (note the missing "s") in ~/Library/Group\ Containers/group.com.docker/settings.json tells Docker for Mac to use legacy cgroups v1. This of course is only a temporary fix until Ansible Molecule supports the cgroupns Docker parameter.
Running the container as follows and with cgroups v2 now also works in my setup:
docker run -it --privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
--name instance -d geerlingguy/docker-debian11-ansible
Note also that the sysfs volume permissions need to be changed to rw as well. Then I can successfully run systemd services and commands from the container.
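Since the behaviour reported in this thread seems to depend on which cgroup version the Docker host exposes, it can help to check that first. Below is a minimal Python sketch, assuming the common convention that a cgroup v2 (unified) mount has a `cgroup.controllers` file at its root. The function name `cgroup_version` is made up for illustration, and a hybrid v1/v2 host is reported as v1 by this check.

```python
from pathlib import Path


def cgroup_version(mount: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `mount` looks like a cgroup v2 unified hierarchy, else 1.

    Assumption: the cgroup v2 root exposes a `cgroup.controllers` file,
    while a v1 (or hybrid) mount holds one directory per controller.
    """
    return 2 if (Path(mount) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"host cgroup version: v{cgroup_version()}")
```

Run on the Docker host (or inside the Docker Desktop VM), this gives a quick hint whether the `--cgroupns=host` and `rw` changes are likely to matter.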
Thanks @tbumke !!
Changing
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
to
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
did the trick!
Now I only have to find a way to get around a bug in Elasticsearch ( https://github.com/elastic/elasticsearch/issues/74158 ) that keeps multiple instances from starting because the Java option parser prints to stdout instead of a file. But that only hits when I fire up several containers in a single test and won't keep me from proceeding with the other roles. Thank you everyone, that kept me in a constant state of rage for weeks now. :-)
Ok, guess now I'm completely lost. Now it works sometimes and sometimes it doesn't. I'll have to take a deeper look, sorry.
+1 have the same running from debian11. I believe since this image mounts cgroups into the image as a volume, it will have different results if you have different versions of cgroups in your host system. Should it work only on cgroupsv1?
also have this, anything I can provide of information to get this fixed @geerlingguy?
As I've said before, I haven't had any issues running this with systemd (for example, see my Docker role: https://github.com/geerlingguy/ansible-role-docker/blob/master/.github/workflows/ci.yml#L48 / https://github.com/geerlingguy/ansible-role-docker/runs/5959693637?check_suite_focus=true)
If someone can get a reproducible fault that works with the base image and the same kind of setup I'm using, that would be helpful.
(Another note: it seems cgroups v2 might be the main culprit for some people...)
When using the CI/CD environment (gitlab ci or any), you can use the settings for the docker daemon:
daemon.json
{
  "debug": false,
  "default-cgroupns-mode": "host",
  "storage-driver": "vfs"
}
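If a daemon.json already exists on the CI runner, the key can be merged in rather than overwriting the file. A minimal Python sketch of that merge, assuming the file content is available as a string; the helper name `with_host_cgroupns` is hypothetical, and the Docker daemon normally has to be restarted before the change takes effect.

```python
import json


def with_host_cgroupns(daemon_json_text: str) -> str:
    """Return daemon.json content with default-cgroupns-mode set to "host".

    Existing keys are preserved; empty input is treated as an empty config.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["default-cgroupns-mode"] = "host"
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example input only; a real run would read the runner's own daemon.json.
    print(with_host_cgroupns('{"debug": false, "storage-driver": "vfs"}'))
```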
Thanks, @echohes . For Elasticsearch it didn't work. There's a bug that interferes with Cgroups, maybe I just have to wait for a fix. Thanks anyway. Hopefully it works for others.
@geerlingguy for using Docker on Mac it is in fact Docker Desktop I presume, right?
Looks like from the version 4.3.0 / 2021-12-02 release notes we have cgroups v2:
Docker Desktop now uses cgroupv2. If you need to run systemd in a container then:
- Ensure your version of systemd supports cgroupv2. It must be at least systemd 247. Consider upgrading any centos:7 images to centos:8.
- Containers running systemd need the following options: [--privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw] (https://serverfault.com/questions/1053187/systemd-fails-to-run-in-a-docker-container-when-using-cgroupv2-cgroupns-priva).
And from version 4.4.2 / 2022-01-13 release notes:
Added a deprecated option to settings.json: "deprecatedCgroupv1": true, which switches the Linux environment back to cgroups v1. If your software requires cgroups v1, you should update it to be compatible with cgroups v2. Although cgroups v1 should continue to work, it is likely that some future features will depend on cgroups v2. It is also possible that some Linux kernel bugs will only be fixed with cgroups v2.
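The release notes quoted above name systemd 247 as the minimum for cgroup v2. A small Python sketch for checking an image against that threshold, assuming the first line of `systemctl --version` starts with `systemd <number>`; the function names are illustrative only.

```python
import re


def systemd_major(version_output: str) -> int:
    """Extract the major version from the first line of `systemctl --version`."""
    match = re.match(r"systemd (\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised systemctl --version output")
    return int(match.group(1))


def supports_cgroup_v2(version_output: str, minimum: int = 247) -> bool:
    # 247 is the threshold quoted from the Docker Desktop release notes above.
    return systemd_major(version_output) >= minimum
```

The input would typically come from running `systemctl --version` inside the container under test.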
I faced a problem similar to this issue but with a small difference. I have systemd running in the container and have no problem with it, but loginctl fails with this output:
root@test-debian:/# loginctl
Failed to create bus connection: No such file or directory
My host OS is Ubuntu 20.04.
I faced a similar issue on macOS with an M1 Mac. My workaround was to add the following setting in the Docker Engine configuration:
"default-cgroupns-mode": "host"
I then had to change the bind mount to be rw instead of ro:
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
Edit: Thanks @tbumke for the hints that led to getting this working.
@dataoscar but we can't configure Docker Engine of GithubActions to run molecule, or can we?
I've never had any issues with whatever docker engine GitHub Actions uses and Molecule for Ansible using @geerlingguy's Molecule config as a template, i.e., with the platforms block looking like this:
platforms:
  - name: instance
    image: "geerlingguy/docker-${MOLECULE_DISTRO:-centos7}-ansible:latest"
    command: ${MOLECULE_DOCKER_COMMAND:-""}
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
In my setup, cgroups only caused issues on Docker Desktop for Mac.
I have a Fedora36 host and an Ubuntu 22.04 host and I get the same issue testing with https://github.com/geerlingguy/molecule-playbook-testing
The container is not actually running systemd completely.
MOLECULE_DISTRO=debian11 molecule converge
...
...
MOLECULE_DISTRO=debian11 molecule login
root@instance:/# ps faxwww
PID TTY STAT TIME COMMAND
1421 pts/0 Ss 0:00 bash
1429 pts/0 R+ 0:00 \_ ps faxwww
1 ? Ss 0:00 /lib/systemd/systemd
root@instance:/# systemctl status
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
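The "System has not been booted with systemd" message can appear even though PID 1 is /lib/systemd/systemd, as in the output above. systemd's `sd_booted()` is documented as checking whether the directory /run/systemd/system exists, and the sketch below assumes systemctl relies on that same check. The function name and the `root` parameter are illustrative additions.

```python
from pathlib import Path


def booted_with_systemd(root: str = "/") -> bool:
    """Mirror sd_booted(): true if <root>/run/systemd/system is a directory."""
    return (Path(root) / "run" / "systemd" / "system").is_dir()


if __name__ == "__main__":
    print("systemd fully booted:", booted_with_systemd())
```

Inside a container where systemd started but could not finish its early setup, this would print False while `ps` still shows systemd as PID 1.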
This change/diff, taken from what @zerwes posted at the top of this issue, is what fixes it for me.
# Dockerfile
...
...
-COPY initctl_faker .
-RUN chmod +x initctl_faker && rm -fr /sbin/initctl && ln -s /initctl_faker /sbin/initctl
# Install Ansible inventory file.
RUN mkdir -p /etc/ansible
RUN echo "[local]\nlocalhost ansible_connection=local" > /etc/ansible/hosts
+RUN systemctl set-default multi-user.target
-VOLUME ["/sys/fs/cgroup"]
CMD ["/lib/systemd/systemd"]
and
# molecule/default/molecule.yml
...
...
   ansible-lint
 platforms:
   - name: instance
     image: geerlingguy/docker-${MOLECULE_DISTRO:-centos8}-ansible:latest
     command: ""
-    volumes:
-      - /sys/fs/cgroup:/sys/fs/cgroup:ro
     privileged: true
     pre_build_image: true
 provisioner:
   name: ansible
Rerunning with these changes results in
MOLECULE_DISTRO=debian11 molecule converge
...
...
MOLECULE_DISTRO=debian11 molecule login
root@instance:/# ps -faxwww
PID TTY STAT TIME COMMAND
1796 pts/0 Ss 0:00 bash
1808 pts/0 R+ 0:00 \_ ps -faxwww
1 ? Ss 0:00 /lib/systemd/systemd
25 ? Ss 0:00 /lib/systemd/systemd-journald
1719 ? Ss 0:00 /usr/sbin/apache2 -k start
1720 ? Sl 0:00 \_ /usr/sbin/apache2 -k start
1721 ? Sl 0:00 \_ /usr/sbin/apache2 -k start
root@instance:/# systemctl status
● instance
State: running
Jobs: 0 queued
Failed: 0 units
Since: Sun 2022-08-21 06:10:29 UTC; 9min ago
CGroup: /
├─init.scope
│ ├─ 1 /lib/systemd/systemd
│ ├─1796 bash
│ ├─1809 systemctl status
│ └─1810 (pager)
└─system.slice
├─apache2.service
│ ├─1719 /usr/sbin/apache2 -k start
│ ├─1720 /usr/sbin/apache2 -k start
│ └─1721 /usr/sbin/apache2 -k start
└─systemd-journald.service
└─25 /lib/systemd/systemd-journald
@aussielunix's solution works on my side. I've cloned the repository, applied the edits that have been made and rebuilt a docker image I was able to use in molecule.
The service I intend to test is started correctly and molecule converge exits with status zero.
Installation information:
I have learnt some more since I posted above.
This, from Lennart Poettering, says bind mounting the /sys/fs/cgroup hierarchy is never going to work if cgroup namespaces are used.
Notes:
- override_command: false - Found by reading source code
This is my new molecule.yml.
---
dependency:
  name: galaxy
driver:
  name: docker
lint: |
  set -e
  ansible-lint
platforms:
  - name: instance
    image: "registry.gitlab.com/aussielunix/ansible/molecule-containers/${MOLECULE_DISTRO:-debian:bullseye}"
    privileged: true
    pre_build_image: true
    override_command: false
    tmpfs:
      - /run
      - /tmp
provisioner:
  name: ansible
  log: ${MOLECULE_ANSIBLE_LOG:-true}
  env:
    ANSIBLE_VERBOSITY: ${MOLECULE_ANSIBLE_VERBOSITY:-0}
verifier:
  name: ansible
Examples of using these:
# test with default debian:bullseye
molecule test
# test with default debian:bullseye but silence Ansible logs
MOLECULE_ANSIBLE_LOG=false molecule test
# add -vv to ansible and test with default debian:bullseye
MOLECULE_ANSIBLE_VERBOSITY=2 molecule test
# test with ubuntu:jammy
MOLECULE_DISTRO="ubuntu:jammy" molecule test
# add -vvv to ansible and test with rockylinux:9
MOLECULE_ANSIBLE_VERBOSITY=3 MOLECULE_DISTRO="rockylinux:9" molecule test
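The `${NAME:-default}` placeholders used in these examples fall back to the default when the variable is unset or empty. The Python sketch below reproduces the POSIX shell meaning of `:-`. Molecule performs its own interpolation of molecule.yml, and this sketch only assumes it follows the same rule; it is not Molecule's code, and `expand_defaults` is a hypothetical name.

```python
import re

# Matches ${NAME:-default}; the default may contain anything except "}".
_PATTERN = re.compile(r"\$\{(\w+):-([^}]*)\}")


def expand_defaults(text: str, env: dict) -> str:
    """Expand ${NAME:-default}: use env[NAME] unless it is unset or empty."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        return env.get(name) or default
    return _PATTERN.sub(replace, text)


if __name__ == "__main__":
    import os
    print(expand_defaults("${MOLECULE_DISTRO:-debian:bullseye}", dict(os.environ)))
```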
@aussielunix Thx for the investigation! The Debian/buster container works fine, but it seems like Debian/bullseye is missing in the gitlab registry: https://gitlab.com/aussielunix/ansible/molecule-containers/container_registry/3343441
@jkirk ahh the auto-pruning was set too aggressive.
I have relaxed it and triggered new containers to be built.
I was finally able to verify that this issue goes away. Ref: https://github.com/ansible-community/molecule/issues/3632
- Add cgroupns_mode: host
- Change /sys/fs/cgroup:/sys/fs/cgroup:ro to /sys/fs/cgroup:/sys/fs/cgroup:rw
Example: https://github.com/ansible-community/molecule/pull/3665#issuecomment-1254979734
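To check an existing platform entry against the settings discussed in this thread, a small Python sketch such as the following could be used. It assumes the platform has already been loaded into a dict (for example with a YAML parser), and `missing_systemd_settings` is a made-up helper name, not part of Molecule.

```python
def missing_systemd_settings(platform: dict) -> list:
    """List the settings from this thread that a Molecule platform entry lacks."""
    problems = []
    if platform.get("privileged") is not True:
        problems.append("privileged: true")
    if platform.get("cgroupns_mode") != "host":
        problems.append("cgroupns_mode: host")
    if "/sys/fs/cgroup:/sys/fs/cgroup:rw" not in platform.get("volumes", []):
        problems.append("volumes: /sys/fs/cgroup:/sys/fs/cgroup:rw")
    return problems
```

An empty list means the entry carries all three settings; it says nothing about whether the image itself boots systemd correctly.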
Indeed, I just noticed the update, tested it, and wrote this blog post: Docker and systemd, getting rid of dreaded 'Failed to connect to bus' error.
Hi. cgroupns_mode: host fixes the issue with systemctl, but other commands (like localectl and timedatectl) have the same problem. I get the same behavior with the rocky linux 9 container. Any suggestions are welcome.
I believe y'all are getting it working, but molecule v4.0.3 doesn't seem to be enough for me - I am getting an error that cgroupns_mode is not a supported option on community.docker.docker_container (bolding emphasis mine). What am I missing?
TASK [Wait for instance(s) creation to complete] *** failed: [localhost] (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': '933904515236.21247', 'results_file': '/home/artis3n/.ansible_async/933904515236.21247', 'changed': True, 'item': {'cgroupns_mode': 'host', 'command': '', 'image': 'geerlingguy/docker-debian11-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:rw']}, 'ansible_loop_var': 'item'}) => {"ansible_job_id": "933904515236.21247", "ansible_loop_var": "item", "attempts": 2, "changed": false, "finished": 1, "item": {"ansible_job_id": "933904515236.21247", "ansible_loop_var": "item", "changed": true, "failed": 0, "finished": 0, "item": {"cgroupns_mode": "host", "command": "", "image": "geerlingguy/docker-debian11-ansible:latest", "name": "instance", "pre_build_image": true, "privileged": true, "volumes": ["/sys/fs/cgroup:/sys/fs/cgroup:rw"]}, "results_file": "/home/artis3n/.ansible_async/933904515236.21247", "started": 1}, "msg": "Unsupported parameters for (community.docker.docker_container) module: cgroupns_mode. 
Supported parameters include: networks, privileged, read_only, security_opts, image, paused, env, publish_all_ports, cpuset_cpus, hostname, recreate, env_file, container_default_behavior, force_kill (forcekill), oom_killer, init, published_ports (ports), comparisons, cpu_quota, memory_swappiness, timeout, pull, entrypoint, ca_cert (cacert_path, tls_ca_cert), log_driver, kernel_memory, volume_driver, healthcheck, domainname, state, tls, use_ssh_client, labels, volumes, memory, stop_signal, ignore_image, auto_remove, uts, cpu_shares, debug, command, devices, restart_retries, cleanup, interactive, restart_policy, kill_signal, networks_cli_compatible, tty, restart, device_write_bps, output_logs, etc_hosts, docker_host (docker_url), memory_reservation, sysctls, memory_swap, dns_servers, detach, cpus, shm_size, keep_volumes, network_mode, volumes_from, client_cert (cert_path, tls_client_cert), cpu_period, client_key (key_path, tls_client_key), pids_limit, cgroup_parent, cap_drop, storage_opts, device_requests, removal_wait_timeout, command_handling, purge_networks, working_dir, runtime, ssl_version, api_version (docker_api_version), exposed_ports (expose, exposed), mac_address, groups, tls_hostname, validate_certs (tls_verify), links, oom_score_adj, dns_opts, default_host_ip, stop_timeout, device_write_iops, name, device_read_iops, ulimits, ipc_mode, pid_mode, mounts, userns_mode, log_options (log_opt), tmpfs, device_read_bps, capabilities, dns_search_domains, blkio_weight, cpuset_mems, user.", "results_file": "/home/artis3n/.ansible_async/933904515236.21247", "started": 1, "stderr": "/tmp/ansible_community.docker.docker_container_payload_36kjt4c1/ansible_community.docker.docker_container_payload.zip/ansible_collections/community/docker/plugins/modules/docker_container.py:1237: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n", "stderr_lines": ["/tmp/ansible_community.docker.docker_container_payload_36kjt4c1/ansible_community.docker.docker_container_payload.zip/ansible_collections/community/docker/plugins/modules/docker_container.py:1237: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead."], "stdout": "", "stdout_lines": []}
I have the following setup:
molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    image: ${MOLECULE_DISTRO:-geerlingguy/docker-debian11-ansible:latest}
    command: ${MOLECULE_DOCKER_COMMAND:-"/lib/systemd/systemd"}
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    privileged: true
    pre_build_image: true
molecule --version
molecule 4.0.3 using python 3.10
ansible:2.14.0
delegated:4.0.3 from molecule
docker:2.1.0 from molecule_docker requiring collections: community.docker>=3.0.2 ansible.posix>=1.4.0
Poetry with pyproject.toml file:
[tool.poetry.dependencies]
python = "^3.10"
ansible = "^7.0.0"
[tool.poetry.group.dev.dependencies]
pre-commit = "^2.20.0"
ansible-lint = "^6.8.0"
molecule = {extras = ["docker"], version = "^4.0.3"}
I see cgroupns_mode was added in community.docker 3.0.0 and Molecule is using v3.0.2 ... https://docs.ansible.com/ansible/latest/collections/community/docker/docker_container_module.html#parameter-cgroupns_mode
@artis3n in my case it was also necessary to install the current community.docker collection from Ansible Galaxy.
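To see which community.docker version a given environment actually resolves, the output of `ansible-galaxy collection list` can be scanned. The Python sketch below assumes the two-column "Collection Version" layout shown elsewhere in this thread and uses the 3.0.0 threshold mentioned above. Both helper names are hypothetical, and the version comparison is deliberately rough (numeric x.y.z only).

```python
def collection_versions(listing: str, name: str = "community.docker") -> list:
    """Return every version of `name` found in `ansible-galaxy collection list` output."""
    versions = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            versions.append(parts[1])
    return versions


def at_least(version: str, minimum: tuple = (3, 0, 0)) -> bool:
    """Compare the leading numeric x.y.z part of `version` against `minimum`."""
    numbers = tuple(int(p) for p in version.split(".")[:3] if p.isdigit())
    return numbers >= minimum
```

Because the same collection can be installed in several paths (global and virtualenv), the first function returns a list; every entry that fails `at_least` is a candidate for the "unsupported parameter" error.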
Ahhhh yup. ansible-galaxy collection list shows that I'm up to date globally on my system -
# /home/artis3n/.ansible/collections/ansible_collections
Collection Version
----------------- -------
...
community.general 6.1.0
but inside my Poetry env, I'm using the older version. Gotta see how to appropriately update..
/home/artis3n/.cache/pypoetry/virtualenvs/artis3n-tailscale-eXk1DDvX-py3.10/lib/python3.10/site-packages/ansible_collections
Collection Version
----------------------------- -------
...
community.general 6.0.1
...
The dependency step wasn't in my scenario :upside_down_face: Everything's working
Ok, I'm officially confused and ready to throw in the towel in favor of testing directly inside GitHub runners.
The fix everybody likes only works for me for docker-debian11-ansible:latest containers:
  - name: instance
    image: "geerlingguy/docker-${MOLECULE_DISTRO:-centos7}-ansible:latest"
    command: ${MOLECULE_DOCKER_COMMAND:-""}
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    privileged: true
    pre_build_image: true
It fails inside docker-centos7-ansible:latest with
fatal: [instance]: FAILED! => {"changed": false, "msg": "Service is in unknown state", "status": {}}
On the other hand, @aussielunix's fix, looking like this:
platforms:
  - name: instance
    image: "geerlingguy/docker-${MOLECULE_DISTRO:-centos7}-ansible:latest"
    command: ${MOLECULE_DOCKER_COMMAND:-""}
    override_command: false
    tmpfs:
      - /run
      - /tmp
    cgroupns_mode: host
    privileged: true
    pre_build_image: true
works for docker-centos7-ansible:latest but fails with docker-debian11-ansible:latest with:
Failed to connect to bus: No such file or directory
Running locally on Ubuntu 22.04 with the latest versions of ansible, molecule, community.general and community.docker.
Thanks to @artis3n
I am running molecule 5.0.1 with ansible 2.14.5. Adding the command: ${MOLECULE_DOCKER_COMMAND:-"/lib/systemd/systemd"} worked for me when using ubuntu2204, together with volumes, privileged mode and cgroupns_mode.
Yeah, if it is helpful to others here are all the distros I was testing and what command I had to use for everything to work smoothly:
Default if not provided is /usr/sbin/init
System
Debian 11 aka. bullseye with the debian docker.io packages. (more details later)
Description
While trying to use the image directly in docker or via molecule, the image starts, but it seems it is not systemd enabled, resulting in failed test runs.
A self-brewed docker image based on the official debian:bullseye works as expected instead. But to be honest, docker is really not my area of expertise...
The issue seems to occur not only on the debian11 image; others like geerlingguy/docker-centos8-ansible, geerlingguy/docker-ubuntu2004-ansible etc. seem affected too.
Steps to reproduce
Test with own dilettantic build
Distro and Packages:
check-config