ansible / ansible-container

DEPRECATED -- Ansible Container was a tool to build Docker images and orchestrate containers using only Ansible playbooks.
GNU Lesser General Public License v3.0

since approx 2018-06-05, in-docker-container ansible-container build fails with "ansible.errors.AnsibleError: the role '<rolename>' was not found in <rolespath>" on different roles depending on environment #942

Open dchsueh opened 6 years ago

dchsueh commented 6 years ago
ISSUE TYPE
container.yml

This is a reasonably small example I created to demonstrate the problem. (Yes it fails.)

version: '2'

settings:
  project_name: buildbox
  conductor:
    base: 'centos:7'

services:
  base:
    from: centos:7
    roles:
      - BuildBox/Base
      - BuildBox/Configuration1
      - BuildBox/Configuration2
      - BuildBox/Configuration3
      - BuildBox/Configuration4
    working_dir: /tmp
    ports:
      - '22'
    command:
      - /usr/sbin/sshd
      - -D

Individual roles have a tasks/main.yml of the form

---
- command: echo BASE

Replace BASE with ONE, TWO, THREE, or FOUR to match the role.

OS / ENVIRONMENT

The environment for a virtualenv ansible-container install direct on ubuntu xenial:

Ansible Container, version 0.9.2
Linux, dhsueh-ubuntu, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] <virtualenv directory path>/bin/python2

Believed-identical environment configured as a Dockerfile-built docker container "FROM ubuntu:xenial":

Ansible Container, version 0.9.2
Linux, b92df59f4255, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] /usr/bin/python

(I have tried a "FROM centos:7" version as well - no difference.)

My environments are set up pinned to 0.9.2 with various workarounds applied as I encountered the need for them (ubuntu paths below):

pip --disable-pip-version-check install pip==9.0.3
pip --disable-pip-version-check install setuptools==39.2.0
pip --disable-pip-version-check install docker==2.7.0
pip --disable-pip-version-check install ansible-container[docker]==0.9.2
sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/secrets.py
sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/engine.py

pip docker==2.7.0 is a workaround that I can't find a reference for now (?!?!)
sed filters workaround addresses the ansible-container bug described in https://github.com/moby/moby/issues/34121
sed return is a workaround for https://github.com/ansible/ansible-container/issues/762

SUMMARY

Heads up: The observed behavior is strikingly similar to https://github.com/ansible/ansible-container/issues/673 but does not involve any cloud-enabled roles; all roles requested confirmed to exist on the filesystem in the single path specified in --roles-path option.

I have many services, each with many different roles listed. Before 2018-06-05 everything was working fine on a particular docker host. On 2018-06-05 I added an extra role at the end of the list for my services (e.g. "BuildBox/Configuration4"), which resulted in different failures depending on the environment.

In a direct-on-iron ansible-container virtualenv environment created after the problem date, an "ansible-container build" call completes fine.

Depending on the docker host I run an ansible-container docker image on, I get an error like:

2018-06-07T18:00:35.723801 Processing defaults section... [container.config] caller_file=/_ansible/container/config.py caller_func=_process_defaults caller_line=325
2018-06-07T18:00:35.726157 Processing section...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=volumes
2018-06-07T18:00:35.728781 Processing section...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=registries
2018-06-07T18:00:35.731282 Processing section...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=secrets
2018-06-07T18:00:35.733772 Processing service...          [container.config] caller_file=/_ansible/container/config.py caller_func=_process_services caller_line=340 service=u'base' service_data={u'command': [u'/usr/sbin/sshd', u'-D'], u'working_dir': u'/tmp', u'from': u'centos:7', u'ports': [u'22'], u'roles': [u'BuildBox/Base', u'BuildBox/Configuration1', u'BuildBox/Configuration2', u'BuildBox/Configuration3', u'BuildBox/Configuration4']}
Traceback (most recent call last):
  File "/usr/bin/conductor", line 11, in <module>
    load_entry_point('ansible-container', 'console_scripts', 'conductor')()
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/cli.py", line 389, in conductor_commandline
    conductor_config = AnsibleContainerConductorConfig(list_to_ordereddict(containers_config))
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/config.py", line 297, in __init__
    self._process_services()
  File "/_ansible/container/config.py", line 357, in _process_services
    role_metadata = get_metadata_from_role(role_name)
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/__init__.py", line 275, in get_metadata_from_role
    return get_content_from_role(role_name, os.path.join('meta', 'container.yml'))
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/__init__.py", line 264, in get_content_from_role
    role_path = resolve_role_to_path(role_name)
  File "/_ansible/container/__init__.py", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/__init__.py", line 210, in resolve_role_to_path
    loader=loader)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/include.py", line 59, in load
    return ri.load_data(data, variable_manager=variable_manager, loader=loader)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/base.py", line 244, in load_data
    ds = self.preprocess_data(ds)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/definition.py", line 94, in preprocess_data
    (role_name, role_path) = self._load_role_path(role_name)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/definition.py", line 187, in _load_role_path
    raise AnsibleError("the role '%s' was not found in %s" % (role_name, ":".join(role_search_paths)), obj=self._ds)
ansible.errors.AnsibleError: the role '<NOTFOUNDROLE>' was not found in ./roles:<AC_ROLES_PATH>:/src/roles:/etc/ansible/roles:.

The <AC_ROLES_PATH> is the path provided in the ansible-container --roles-path option.

The missing <NOTFOUNDROLE> role is, at times:

In all cases I can confirm all roles are present on the local / in-container filesystem before the ansible-container call.
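To make that check repeatable both on the host and inside the container, a small script along these lines could be used. It is only a rough sketch: it looks for a directory named after the role under each search path, which approximates the lookup reported in the error message but is not Ansible's actual resolution code. The function names are made up for illustration.

```python
import os


def find_role(role_name, search_paths):
    """Return the first existing directory for role_name on search_paths, or None.

    Simplified approximation of a roles-path lookup; not Ansible's own logic.
    """
    for base in search_paths:
        candidate = os.path.join(os.path.abspath(base), role_name)
        if os.path.isdir(candidate):
            return candidate
    return None


def report_missing_roles(role_names, search_paths):
    """Return the role names that do not resolve on any of the search paths."""
    return [name for name in role_names if find_role(name, search_paths) is None]
```

Running report_missing_roles with the role list from container.yml and the search paths printed in the error, once on the host and once inside the container, should show whether the two environments really see the same directories.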

The fact that on the working-before-2018-06-05 docker host, I can delete the recently-added last role and build successfully suggests that some caching is happening and maybe some intermediary tool changed (c.f. https://github.com/ansible/ansible-container/issues/673) but I am unable to determine what and where.

Failures are not affected by the presence/absence of --debug and/or --use-local-python.

STEPS TO REPRODUCE

1) Create an on-iron virtualenv and set up environment as shown above
2) Create a Dockerfile with ansible-container environment as shown above
3) Set up the container.yml and various roles as described above
4) Run:

ansible-container build --services base --roles-path <wherever you put the roles>

EXPECTED RESULTS

working build, direct on-iron

ACTUAL RESULTS

debug output above, for ansible-container run in docker container on host, varies depending on host

Voronenko commented 6 years ago

I have taken a look at your issue, using the POC repo from another issue as a base:

https://github.com/Nexlo/ansible-test

extending it to use two roles:

services:
  web:
    from: ubuntu:14.04
    roles:
      - role: role-2
        gather_facts: no
      - role: my-new-role
        gather_facts: no

On a clean virtualenv (py2, base OS Ubuntu 16.04 LTS), without the mentioned Dockerfile, it works like a charm for me. This makes me think that the issue might not be in ansible-container but in your environment (i.e. the combination of dockerized ansible-container + conductor + container).

Perhaps you can create a POC repository for the issue, using the above https://github.com/Nexlo/ansible-test as a basis?

As an option, try to build with --no-container-cache, i.e.

ansible-container build --no-container-cache --services base --roles-path <wherever you put the roles>

If it gets better, please comment here.

dchsueh commented 6 years ago

Voronenko, I do appreciate you looking at this. (At this time it seems the support that ansible-container users might get post-https://github.com/ansible/ansible-container/commit/2fa778a7c8d1699672314ac0b89c53554f435cb7 is ourselves!)

My original writeup is reeeeely long and the working/not-working scenarios are buried in too much other text:

I agree that the main factor is the dockerized ansible-container setup. The thing that strikes me as very strange is that the dockerized configuration was working fine for the month or two that I was using it successfully before approx June 5. And on the server that was working previously, the roles that now cannot be found are the roles I added after June 5; all the previously found and previously working roles still work.

Would you mind trying to run an ansible-container build in a container? Here's a minimal ubuntu:xenial Dockerfile that should run ansible-container successfully (mount in /var/run/docker.sock and your ansible code):

FROM ubuntu:xenial

WORKDIR /var/tmp

RUN apt-get -y update \
  && apt-get -y install curl python less

RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
  && python get-pip.py \
  && pip --disable-pip-version-check install pip==9.0.3 \
  && pip --disable-pip-version-check install setuptools==39.2.0 \
  && pip --disable-pip-version-check install docker==2.7.0 \
  && pip --disable-pip-version-check install ansible-container[docker]==0.9.2 \
  && sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/secrets.py \
  && sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/engine.py \
  && true
# sed filters addresses ansible-container bug described in https://github.com/moby/moby/issues/34121
# sed return is workaround for https://github.com/ansible/ansible-container/issues/762

RUN curl https://get.docker.com/builds/Linux/x86_64/docker-17.04.0-ce.tgz | tar -zxC /usr/local/bin/ --strip-components=1 docker/docker

pip freeze output in both the working in-virtualenv and the nonworking global-env ubuntu situations is:

$ pip freeze
ansible-container==0.9.2
backports.ssl-match-hostname==3.5.0.1
certifi==2018.4.16
chardet==3.0.4
colorama==0.3.9
docker==2.7.0
docker-pycreds==0.3.0
idna==2.7
ipaddress==1.0.22
Jinja2==2.10
MarkupSafe==1.0
PyYAML==3.12
requests==2.19.0
ruamel.ordereddict==0.4.13
ruamel.yaml==0.15.38
six==1.11.0
structlog==18.1.0
urllib3==1.23
websocket-client==0.48.0

-- edit: changed dockerfile from centos:7 to ubuntu:xenial

Voronenko commented 6 years ago

On one hand, I can confirm the issue (i.e. in some circumstances a role is not found if the project is mapped to a different path than on the original host); on the other hand, the whole approach is erroneous:

1) You bind the docker sock from an (unknown) docker version, i.e. only you know it.
2) On the other hand, you install a very specific version of docker inside the container (potentially incompatible with that sock): RUN curl https://get.docker.com/builds/Linux/x86_64/docker-17.04.0-ce.tgz

In summary, at that point I would not do it that way, and would instead go with local python with ansible-container in a virtualenv.

3) The build for sure happens on the target host, i.e. if you map your working folder into exactly the same location, e.g. -v /home/slavko/tmp/ansible-test:/home/slavko/tmp/ansible-test and not -v /home/slavko/tmp/ansible-test:/app, the docker process starts to find the mentioned roles and even tries to build.
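The reason the same-path mapping matters is that, with the socket mounted in, bind-mount sources are resolved by the docker daemon on the host, not inside the container that issues the request. A minimal sketch of a launcher that always mounts the project at its own absolute path might look like this. The helper name and the image name in the usage note are hypothetical; only the /var/run/docker.sock mount and the same-path idea come from this thread.

```python
import os


def docker_run_args(project_dir, image, command):
    """Build a `docker run` argv that mounts project_dir at the same absolute path.

    Keeping host and container paths identical means any path that is later
    handed to the host's docker daemon still points at the real directory.
    """
    host_path = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm",
        # Give the containerized tool access to the host's docker daemon.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # Same path on both sides of the colon, instead of e.g. <path>:/app.
        "-v", "{0}:{0}".format(host_path),
        # Start in the project directory.
        "-w", host_path,
        image,
    ] + list(command)
```

For example, docker_run_args("/home/slavko/tmp/ansible-test", "my-ansible-container-image", ["ansible-container", "build"]) yields an argument list that could be passed to subprocess.call; "my-ansible-container-image" stands in for whatever image the Dockerfile above produces.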

I would not build docker from docker with a mapped sock. Using a TCP port? Who knows, it seems more reliable; at least it will send the context there.

Hope that helps

dchsueh commented 6 years ago

your suggestions and analysis give me some good ideas on investigating a workaround or alternate approaches

I'll report back if anything ends up successful

(the idea of curl-ing the docker binary directly into the image comes from how the conductor images are created - "docker history --no-trunc ansible/container-conductor-centos-7:0.9.2")

thank you

Voronenko commented 6 years ago

Your comment about the conductor is right. So this is rather an API compatibility issue.
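As a rough illustration of what an API compatibility check could look like, the sketch below compares a client API version against the range a daemon accepts. It assumes the three version strings have already been read from `docker version` output; the function names are made up, and this is not how ansible-container or the docker client actually negotiates versions.

```python
def parse_api_version(text):
    """Turn a dotted version string such as '1.28' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def client_is_compatible(client_api, server_min_api, server_max_api):
    """Return True if client_api lies within the server's supported API range."""
    client = parse_api_version(client_api)
    low = parse_api_version(server_min_api)
    high = parse_api_version(server_max_api)
    return low <= client <= high
```

Comparing tuples rather than raw strings avoids ordering mistakes such as treating "1.9" as newer than "1.10".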