Open dchsueh opened 6 years ago
I have taken a look at your issue, using the POC repo from another issue as a base:
https://github.com/Nexlo/ansible-test
extending it to use two roles:
services:
  web:
    from: ubuntu:14.04
    roles:
      - role: role-2
        gather_facts: no
      - role: my-new-role
        gather_facts: no
On a clean virtual env (py2, base OS Ubuntu 16.04 LTS), without the mentioned Dockerfile, it works like a charm for me. This makes me think the issue might not be in ansible-container but in your environment (i.e. the combination of dockerized ansible-container + conductor + container).
Perhaps you can create a POC repository for the issue, using the above https://github.com/Nexlo/ansible-test as a basis?
As an option, try to build with --no-container-cache, i.e.
ansible-container build --no-container-cache --services base --roles-path <wherever you put the roles>
If it gets better, please comment here.
Voronenko, I do appreciate you looking at this. (At this time it seems the support that ansible-container users might get post-https://github.com/ansible/ansible-container/commit/2fa778a7c8d1699672314ac0b89c53554f435cb7 is ourselves!)
My original writeup is reeeeely long and the working/not-working scenarios are buried in too much other text:
I agree that the main factor is the dockerized ansible-container setup. The thing that strikes me as very strange is that the dockerized configuration was working fine for the month or two that I was using it successfully before approx June 5. And on the server that was working previously, the roles that now cannot be found are the roles I added after June 5; all the previously found and previously working roles still work.
Would you mind trying running an ansible-container build in a container? Here's a minimal ubuntu:xenial Dockerfile that should run ansible-container successfully (mount in /var/run/docker.sock and your ansible code):
FROM ubuntu:xenial
WORKDIR /var/tmp
RUN apt-get -y update \
&& apt-get -y install curl python less
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
&& python get-pip.py \
&& pip --disable-pip-version-check install pip==9.0.3 \
&& pip --disable-pip-version-check install setuptools==39.2.0 \
&& pip --disable-pip-version-check install docker==2.7.0 \
&& pip --disable-pip-version-check install ansible-container[docker]==0.9.2 \
&& sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/secrets.py \
&& sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/engine.py \
&& true
# sed filters addresses ansible-container bug described in https://github.com/moby/moby/issues/34121
# sed return is workaround for https://github.com/ansible/ansible-container/issues/762
RUN curl https://get.docker.com/builds/Linux/x86_64/docker-17.04.0-ce.tgz | tar -zxC /usr/local/bin/ --strip-components=1 docker/docker
pip freeze output in both in-virtualenv working and global-env nonworking ubuntu situations is:
$ pip freeze
ansible-container==0.9.2
backports.ssl-match-hostname==3.5.0.1
certifi==2018.4.16
chardet==3.0.4
colorama==0.3.9
docker==2.7.0
docker-pycreds==0.3.0
idna==2.7
ipaddress==1.0.22
Jinja2==2.10
MarkupSafe==1.0
PyYAML==3.12
requests==2.19.0
ruamel.ordereddict==0.4.13
ruamel.yaml==0.15.38
six==1.11.0
structlog==18.1.0
urllib3==1.23
websocket-client==0.48.0
-- edit: changed dockerfile from centos:7 to ubuntu:xenial
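One way to double-check that the working virtualenv and the non-working container really have identical package pins is to diff the two `pip freeze` outputs. This is a minimal sketch only; the helper names `parse_freeze` and `diff_freeze` are hypothetical and not part of pip or ansible-container:

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a plain `name==version` pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freeze(left_text, right_text):
    """Return {name: (left_version, right_version)} for every package that differs."""
    left, right = parse_freeze(left_text), parse_freeze(right_text)
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(set(left) | set(right))
        if left.get(name) != right.get(name)
    }
```

An empty result means the pinned packages match, so any remaining difference would have to come from outside the Python environment (docker binary, mounts, paths).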
On the one hand I can confirm the issue (i.e. in some circumstances a role is not found if the project is mapped to a different path than on the original host); on the other hand, the whole approach is flawed:
1) You bind the docker sock from an (unknown) docker version, i.e. only you know which one.
2) At the same time, you install a very specific version of docker inside the container (potentially incompatible with that sock): RUN curl https://get.docker.com/builds/Linux/x86_64/docker-17.04.0-ce.tgz
In summary, at that point I would not do it that way, and would instead go with local python with ansible-container in a virtual env.
3) The build for sure happens on the target host, i.e. if you map your working folder into exactly the same location, i.e. something like
-v /home/slavko/tmp/ansible-test:/home/slavko/tmp/ansible-test \
and not
-v /home/slavko/tmp/ansible-test:/app \
the docker process starts finding the mentioned roles and even tries to build.
I would not build docker from docker with a mapped sock. Using a TCP port? Who knows, it seems more reliable; at least it will send the build context there.
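The same-path mapping from point 3 can be sketched as a small helper that assembles the `docker run` arguments. This is only an illustration under assumptions: the function name `docker_run_args` and the image name in the usage note are hypothetical, and it mounts the project at the identical absolute path so that paths sent to the host daemon through the socket resolve on the host as well.

```python
import os


def docker_run_args(project_dir, image, command):
    """Build a `docker run` argv that mounts project_dir at the same absolute path."""
    project_dir = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm",
        # Host docker socket, as in the setup described in this thread.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # Same path on both sides of the colon (host:container).
        "-v", "{0}:{0}".format(project_dir),
        # Start in the project directory inside the container.
        "-w", project_dir,
        image,
    ] + list(command)
```

For example, `docker_run_args("/home/slavko/tmp/ansible-test", "my-ansible-container-image", ["ansible-container", "build"])` returns a list that can be passed to `subprocess.call`.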
Hope that helps
your suggestions and analysis give me some good ideas on investigating a workaround or alternate approaches
I'll report back if anything ends up successful
(the idea of curl-ing the docker binary directly into the image comes from how the conductor images are created - "docker history --no-trunc ansible/container-conductor-centos-7:0.9.2")
thank you
Your comment about the conductor is right. So this is rather an API compatibility problem.
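As a rough way to reason about API compatibility: `docker version` reports an API version for the client and for the server, and a fixed-version client can only talk to a daemon whose supported API range includes the client's version. The sketch below assumes that simple range rule; the function names are hypothetical and the version strings in the usage note are made-up examples, not values measured on any host in this thread.

```python
def parse_api_version(text):
    """Turn an API version string such as '1.28' into a comparable tuple (1, 28)."""
    return tuple(int(part) for part in text.split("."))


def client_supported(client_api, server_min_api, server_max_api):
    """True if the client's API version lies within the server's supported range."""
    client = parse_api_version(client_api)
    return (
        parse_api_version(server_min_api)
        <= client
        <= parse_api_version(server_max_api)
    )
```

With the illustrative values `client_supported("1.28", "1.12", "1.24")` the check fails, which is the kind of mismatch a pinned docker binary inside the container could hit against an older host daemon. Versions are compared as integer tuples because a plain string comparison would order "1.9" after "1.12".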
ISSUE TYPE
container.yml
This is a reasonably small example I created to demonstrate the problem. (Yes it fails.)
Individual roles have a tasks/main.yml of the form
substitute BASE for ONE, TWO, THREE, FOUR to match role
OS / ENVIRONMENT
The environment for a virtualenv ansible-container install direct on ubuntu xenial:
Believed-identical environment configured as a Dockerfile-built docker container "FROM ubuntu:xenial":
(I have tried a "FROM centos:7" version as well - no difference.)
My environments are set up pinned to 0.9.2 with various workarounds applied as I encountered the need for them (ubuntu paths below):
pip docker==2.7.0 is workaround that I can't find a reference for now (?!?!)
sed filters workaround addresses ansible-container bug described in https://github.com/moby/moby/issues/34121
sed return is workaround for https://github.com/ansible/ansible-container/issues/762
SUMMARY
Heads up: The observed behavior is strikingly similar to https://github.com/ansible/ansible-container/issues/673 but does not involve any cloud-enabled roles; all roles requested confirmed to exist on the filesystem in the single path specified in --roles-path option.
I have many services, each with many different roles listed. Previous to 2018-06-05 everything was working fine on a particular docker host. On 2018-06-05 I added an extra role to my services, at the end of the list (e.g. "BuildBox/Configuration4"), which resulted in different failures depending on the environment.
In a direct-on-iron ansible-container virtualenv environment created after the problem date, an "ansible-container build" call completes fine.
Depending on the docker host I run an ansible-container docker image on, I get an error like:
The <AC_ROLES_PATH> is the path provided in the ansible-container --roles-path option.
The missing <NOTFOUNDROLE> role is, at times:
In all cases I can confirm all roles are present on the local / in-container filesystem before the ansible-container call.
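The presence check described above can be automated before each build. This is a minimal sketch, assuming the `services` mapping has already been loaded from container.yml (for example with PyYAML) and that each role entry is either a bare name or a mapping with a `role` key; the helper names `role_names` and `missing_roles` are hypothetical:

```python
import os


def role_names(services):
    """Yield every role name referenced by a parsed container.yml `services` mapping."""
    for service in services.values():
        for entry in service.get("roles") or []:
            # A role entry is either a bare string or a mapping with a `role` key.
            yield entry["role"] if isinstance(entry, dict) else entry


def missing_roles(services, roles_path):
    """Return the sorted role names that have no directory under roles_path."""
    return sorted(
        name
        for name in set(role_names(services))
        if not os.path.isdir(os.path.join(roles_path, name))
    )
```

Running this both on the host and inside the container, with the same value that is passed to --roles-path, shows whether the roles are visible from where the check runs. It does not show what the conductor container sees, which is where the path mapping matters.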
The fact that on the working-before-2018-06-05 docker host, I can delete the recently-added last role and build successfully suggests that some caching is happening and maybe some intermediary tool changed (c.f. https://github.com/ansible/ansible-container/issues/673) but I am unable to determine what and where.
Failures are not affected by the presence/absence of --debug and/or --use-local-python.
STEPS TO REPRODUCE
Create an on-iron virtualenv and set up environment as shown above
Create a Dockerfile with ansible-container environment as shown above
Set up the container.yml and various roles as described above
Run:
EXPECTED RESULTS
working build, direct on-iron
ACTUAL RESULTS
debug output above, for ansible-container run in docker container on host, varies depending on host