ansible-collections / community.mongodb

MongoDB Ansible Collection
http://galaxy.ansible.com/community/mongodb
GNU General Public License v3.0
108 stars 72 forks source link

Reverted to is_docker.txt method #250

Closed rhysmeister closed 3 years ago

rhysmeister commented 3 years ago
SUMMARY

Reverted to the old method of identifying when we're in a docker container by using the is_docker method. The reason being: https://github.com/ansible/ansible/issues/66304

I encountered this when moving some Travis jobs to GitHub Actions: https://github.com/rhysmeister/AutomatingMongoDBWithAnsible/

Probably need to do a tidy up. New variables mentioned in issue not yet available.

ISSUE TYPE
COMPONENT NAME

roles

rhysmeister commented 3 years ago

@cognifloyd FYI.

cognifloyd commented 3 years ago

I saw that.

What about doing the equivalent of grep '/system.slice/containerd.service' /proc/1/cgroup instead of relying on that is_docker.txt signal file? That's essentially what newer ansible versions will be doing: https://github.com/ansible/ansible/pull/72210/files#diff-48cedf55779922152790325bc5bccee4ca72603a3866e81345d47352cfcecc32

rhysmeister commented 3 years ago

Just tried this in alpine, ubuntu and centos containers. First one didn't work but perhaps something based around the second can?

/ # grep '/system.slice/containerd.service' /proc/1/cgroup
/ # echo $?
1
/ # egrep "docker|containerd" /proc/1/cgroup
14:name=systemd:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
12:pids:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
11:hugetlb:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
10:net_prio:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
9:perf_event:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
8:net_cls:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
7:freezer:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
6:devices:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
5:memory:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
4:blkio:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
3:cpuacct:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
2:cpu:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
1:cpuset:/docker/12be3caa9e2bb009f44d26f90aea6f24306b94f889d2cc45b7a69926d32e9b3c
/ # echo $?
0

It's probably a good idea to tighten up on on the egrep expression a bit to avoid false positives.

rhysmeister commented 3 years ago

Another suggestion here using .dockerenv or /run/.containerenv but does come with a warning. Perhaps worth some thought

https://stackoverflow.com/questions/23513045/how-to-check-if-a-process-is-running-inside-docker-container

rhysmeister commented 3 years ago

Perhaps not important anymore. Closing.

cognifloyd commented 3 years ago

Sorry I was busy with a new job and lost track of this.

Stepping back, why do we need to detect running in docker? Does #139 have all the reasons?

The mongod service breaks after being restarted in RedHat 8 because the pid dir is cleaned up but then cannot be recreated.

If so, maybe we can address that directly instead of trying to detect running in a container.

cognifloyd commented 3 years ago

The docker+rhel8 check is used in mongodb_mongod and mongodb_config roles, but not in mongodb_mongos. For these roles, this: 1) drops processManagement.pidFilePath from templates/*.conf.j2; 2) drops PIDFile line from systemd service file; 3) reloads systemd before starting the mongo service.

There is also an is_docker check in the mongodb_linux role. But this is more generic and needs to work more generally (ie not just on rhel8). The is_docker check disables setting: 1) sysctl: vm.zone_reclaim_mode; 2) start disable-transparent-huge-pages service.

Requiring parent playbooks to create an is_docker.txt file is not documented and would be quite irksome for playbook authors, so I want to move this into the roles somehow.

So, I think the best fix for the first issue would be to resolve the RHEL8 + Pid file issue without detecting running in a container. And, the best fix for the second issue might be to add a few tasks to determine if we're running in a container. We might have to make those tasks conditional on the version of ansible running (ansible-base 2.11+ provides more robust containerization variables; but on ansible 2.9-2.10 we'll need to do some additional manual fact sniffing)

cognifloyd commented 3 years ago

Looks like there was an old similar issue in RHEL7 that was fixed by changing the init script for mongodb 2.6.5+, 2.7.5+:

Also, it looks like the mongod.service file provided by rpm for mongo 3.4.1+ all include lines to ensure the PIDFile directory gets created:

ExecStartPre=/usr/bin/mkdir -p /var/run/mongodb
ExecStartPre=/usr/bin/chown mongod:mongod /var/run/mongodb
ExecStartPre=/usr/bin/chmod 0755 /var/run/mongodb

But the oldest version available for RHEL 8 is 3.4.24, which is the only 3.4 version available here: https://repo.mongodb.org/yum/redhat/8/mongodb-org/3.4/x86_64/RPMS/ That version includes the fix (validated by inspecting the rpm).

@rhysmeister, what was the failure in containerized RHEL 8? Does the following mean that the ExecStartPre lines were failing?

the pid dir is cleaned up but then cannot be recreated

cognifloyd commented 3 years ago

@rhysmeister Any hints on how to cause the restart failure? I'm trying to reproduce this in #348 (I canceled all actions except the roles tests), but it seems to be working.

I removed the rhel8 on docker check and reenabled restarting the service at the end (I had to fix the handlers - they were broken), but nothing seems to be failing on CentOS 8 (or anything else).

Any hints on how to reproduce the failures you were working around?

rhysmeister commented 3 years ago

Hello @cognifloyd, it's such a long time ago now I'm not too sure of the specifics. But it was basically some kind of bug that only happened in docker RH8 containers. What ever caused it could very well have been fixed ages ago and the workaround is no longer needed.

Requiring parent playbooks to create an is_docker.txt file is not documented and would be quite irksome for playbook authors, so I want to move this into the roles somehow.

This was only ever intended for the CI tests we run here. I wouldn't really expect anybody to use these roles for docker.

cognifloyd commented 3 years ago

Hopefully those tasks aren't needed anymore. If they are, then I would like to rename them to say "check for is_docker signal file (CI only)" or something similar that accurately describes what the task is doing, as these tasks are visible outside of CI.

Anyway, I'm running a test without any of the is_docker stuff to see if I can safely replace that in a way that works with ansible 2.9

cognifloyd commented 3 years ago

Well, looks like even the latest versions of ansible would have issues detecting docker in GHA. https://github.com/ansible/ansible/pull/74349

/.dockerenv looks like a good thing to check for, I just changed #348 to use that in the mongodb_linux role.

cognifloyd commented 3 years ago

348 merged. Closing. We should file a new issue if any new issues pop up with the docker detection.