since approx 2018-06-05, in-docker-container ansible-container build fails with "ansible.errors.AnsibleError: the role '<rolename>' was not found in <rolespath>" on different roles depending on environment #942

Open dchsueh opened 6 years ago

dchsueh commented 6 years ago

This is a reasonably small example I created to demonstrate the problem. (Yes it fails.)

version: '2'
settings:
  project_name: buildbox
  conductor:
    base: 'centos:7'
services:
  base:
    from: centos:7
    roles:
      - BuildBox/Base
      - BuildBox/Configuration1
      - BuildBox/Configuration2
      - BuildBox/Configuration3
      - BuildBox/Configuration4
    working_dir: /tmp
    ports:
      - '22'
    command:
      - /usr/sbin/sshd
      - -D

Individual roles have a tasks/main.yml of the form

- command: echo BASE

Replace BASE with ONE, TWO, THREE or FOUR to match each Configuration role.
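
For anyone reproducing this, the role layout described above can be generated with a short script. This is only a sketch: the role names and echo words come from this report, while the function name `create_roles` and the choice of target directory are made up for illustration.

```python
import os

# Role names from the container.yml above, paired with the word each role echoes.
ROLES = {
    "BuildBox/Base": "BASE",
    "BuildBox/Configuration1": "ONE",
    "BuildBox/Configuration2": "TWO",
    "BuildBox/Configuration3": "THREE",
    "BuildBox/Configuration4": "FOUR",
}


def create_roles(roles_path):
    """Write <roles_path>/<role>/tasks/main.yml for every role; return the files written."""
    written = []
    for role, word in sorted(ROLES.items()):
        tasks_dir = os.path.join(roles_path, role, "tasks")
        if not os.path.isdir(tasks_dir):
            os.makedirs(tasks_dir)
        main_yml = os.path.join(tasks_dir, "main.yml")
        with open(main_yml, "w") as handle:
            handle.write("- command: echo %s\n" % word)
        written.append(main_yml)
    return written
```

Calling `create_roles("<wherever you put the roles>")` gives a directory that can be passed to `--roles-path`.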


The environment for a virtualenv ansible-container install direct on ubuntu xenial:

Ansible Container, version 0.9.2
Linux, dhsueh-ubuntu, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] <virtualenv directory path>/bin/python2

Believed-identical environment configured as a Dockerfile-built docker container "FROM ubuntu:xenial":

Ansible Container, version 0.9.2
Linux, b92df59f4255, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] /usr/bin/python

(I have tried a "FROM centos:7" version as well - no difference.)

My environments are set up pinned to 0.9.2 with various workarounds applied as I encountered the need for them (ubuntu paths below):

pip --disable-pip-version-check install pip==9.0.3
pip --disable-pip-version-check install setuptools==39.2.0
pip --disable-pip-version-check install docker==2.7.0
pip --disable-pip-version-check install ansible-container[docker]==0.9.2
sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/
sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/

pip docker==2.7.0 is a workaround that I can't find a reference for now (?!?!)
sed filters workaround addresses the ansible-container bug described in
sed return is a workaround for


Heads up: The observed behavior is strikingly similar to but does not involve any cloud-enabled roles; all requested roles are confirmed to exist on the filesystem in the single path specified in the --roles-path option.

I have many services, each with many different roles listed. Before 2018-06-05 everything was working fine on a particular docker host. On 2018-06-05 I added an extra role at the end of the roles list of my services (e.g. "BuildBox/Configuration4"), which resulted in different failures depending on the environment.

In a direct-on-iron ansible-container virtualenv environment created after the problem date, an "ansible-container build" call completes fine.

Depending on the docker host I run an ansible-container docker image on, I get an error like:

2018-06-07T18:00:35.723801 Processing defaults section... [container.config] caller_file=/_ansible/container/ caller_func=_process_defaults caller_line=325
2018-06-07T18:00:35.726157 Processing section...          [container.config] caller_file=/_ansible/container/ caller_func=_process_top_level_sections caller_line=334 section=volumes
2018-06-07T18:00:35.728781 Processing section...          [container.config] caller_file=/_ansible/container/ caller_func=_process_top_level_sections caller_line=334 section=registries
2018-06-07T18:00:35.731282 Processing section...          [container.config] caller_file=/_ansible/container/ caller_func=_process_top_level_sections caller_line=334 section=secrets
2018-06-07T18:00:35.733772 Processing service...          [container.config] caller_file=/_ansible/container/ caller_func=_process_services caller_line=340 service=u'base' service_data={u'command': [u'/usr/sbin/sshd', u'-D'], u'working_dir': u'/tmp', u'from': u'centos:7', u'ports': [u'22'], u'roles': [u'BuildBox/Base', u'BuildBox/Configuration1', u'BuildBox/Configuration2', u'BuildBox/Configuration3', u'BuildBox/Configuration4']}
Traceback (most recent call last):
  File "/usr/bin/conductor", line 11, in <module>
    load_entry_point('ansible-container', 'console_scripts', 'conductor')()
  File "/_ansible/container/", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/", line 389, in conductor_commandline
    conductor_config = AnsibleContainerConductorConfig(list_to_ordereddict(containers_config))
  File "/_ansible/container/", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/", line 297, in __init__
  File "/_ansible/container/", line 357, in _process_services
    role_metadata = get_metadata_from_role(role_name)
  File "/_ansible/container/", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/", line 275, in get_metadata_from_role
    return get_content_from_role(role_name, os.path.join('meta', 'container.yml'))
  File "/_ansible/container/", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/", line 264, in get_content_from_role
    role_path = resolve_role_to_path(role_name)
  File "/_ansible/container/", line 19, in __wrapped__
    return fn(*args, **kwargs)
  File "/_ansible/container/utils/", line 210, in resolve_role_to_path
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/", line 59, in load
    return ri.load_data(data, variable_manager=variable_manager, loader=loader)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/", line 244, in load_data
    ds = self.preprocess_data(ds)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/", line 94, in preprocess_data
    (role_name, role_path) = self._load_role_path(role_name)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/role/", line 187, in _load_role_path
    raise AnsibleError("the role '%s' was not found in %s" % (role_name, ":".join(role_search_paths)), obj=self._ds)
ansible.errors.AnsibleError: the role '<NOTFOUNDROLE>' was not found in ./roles:<AC_ROLES_PATH>:/src/roles:/etc/ansible/roles:.

The <AC_ROLES_PATH> is the path provided in the ansible-container --roles-path option.

The missing <NOTFOUNDROLE> role is, at times:

In all cases I can confirm all roles are present on the local / in-container filesystem before the ansible-container call.
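
A check of this kind can be scripted. The sketch below is a simplified stand-in for the role lookup, not Ansible's actual implementation: it only tests whether `<search path>/<role name>` is a directory, trying the search paths in the order shown in the error message. The function names are invented for illustration.

```python
import os


def find_role(role_name, search_paths):
    """Return the first existing <path>/<role_name> directory, or None."""
    for base in search_paths:
        candidate = os.path.join(base, role_name)
        if os.path.isdir(candidate):
            return candidate
    return None


def missing_roles(role_names, search_paths):
    """Return the role names that are not found under any search path."""
    return [name for name in role_names if find_role(name, search_paths) is None]
```

Note that this only inspects the filesystem visible to the process that runs it. The traceback above is raised by /usr/bin/conductor, so the lookup that fails happens inside the conductor container, which has its own view of the filesystem.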

The fact that on the working-before-2018-06-05 docker host I can delete the recently-added last role and build successfully suggests that some caching is happening and that maybe some intermediary tool changed (c.f. ), but I am unable to determine what or where.

Failures are not affected by the presence/absence of --debug and/or --use-local-python.


Create an on-iron virtualenv and set up the environment as shown above.
Create a Dockerfile with the ansible-container environment as shown above.
Set up the container.yml and the various roles as described above.
Run:

ansible-container build --services base --roles-path <wherever you put the roles>

working build, direct on-iron


debug output above, for ansible-container run in docker container on host, varies depending on host

Voronenko commented 6 years ago

I have taken a look at your issue, using the POC repo from another issue as a base:

extending it to use two roles:

    from: ubuntu:14.04
    roles:
      - role: role-2
        gather_facts: no
      - role: my-new-role
        gather_facts: no

On a clean virtual env (py2, base OS Ubuntu 16.04 LTS), without the mentioned Dockerfile, it works like a charm for me. This makes me think that the issue might not be in ansible-container but in your environment (i.e. the combination of dockerized ansible-container + conductor + container).

Perhaps you can create a POC repository for the issue, using the above as a basis?

as an option - try to build with --no-container-cache i.e.

ansible-container build --no-container-cache --services base --roles-path <wherever you put the roles>

If it gets better, please comment here.

dchsueh commented 6 years ago

Voronenko, I do appreciate you looking at this. (At this time it seems the support that ansible-container users might get post- is ourselves!)

My original writeup is reeeeely long and the working/not-working scenarios are buried in too much other text:

I agree that the main factor is the dockerized ansible-container setup. The thing that strikes me as very strange is that the dockerized configuration was working fine for the month or two that I was using it successfully before approx June 5. And on the server that was working previously, the roles that now cannot be found are the roles I added after June 5; all the previously found and previously working roles still work.

Would you mind trying to run an ansible-container build in a container? Here's a minimal ubuntu:xenial Dockerfile that should run ansible-container successfully (mount in /var/run/docker.sock and your ansible code):

FROM ubuntu:xenial

WORKDIR /var/tmp

RUN apt-get -y update \
  && apt-get -y install curl python less

RUN curl -o \
  && python \
  && pip --disable-pip-version-check install pip==9.0.3 \
  && pip --disable-pip-version-check install setuptools==39.2.0 \
  && pip --disable-pip-version-check install docker==2.7.0 \
  && pip --disable-pip-version-check install ansible-container[docker]==0.9.2 \
  && sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/ \
  && sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/ \
  && true
# sed filters addresses ansible-container bug described in
# sed return is workaround for

RUN curl | tar -zxC /usr/local/bin/ --strip-components=1 docker/docker

pip freeze output in both in-virtualenv working and global-env nonworking ubuntu situations is:

$ pip freeze

-- edit: changed dockerfile from centos:7 to ubuntu:xenial

Voronenko commented 6 years ago

On the one hand, I confirm the issue (i.e. in some circumstances a role is not found if it is mapped to a different path than on the original host). On the other hand, the whole approach is erroneous:

1) You bind the docker sock from an (unknown) docker version, i.e. only you know it.
2) On the other hand, you install a very specific version of docker (potentially incompatible with that sock) inside the container: RUN curl

In summary, at this point: I would not do it that way, and would instead go with local python with ansible-container in a virtual env.

3) The build does happen on the target host if you map your working folder into exactly the same location, i.e. something like -v /home/slavko/tmp/ansible-test:/home/slavko/tmp/ansible-test \ and not -v /home/slavko/tmp/ansible-test:/app \

With the identical mapping, the docker process starts to find the mentioned roles and even tries to build.
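
One way to picture why the identical mapping matters: with docker.sock bind-mounted, volume source paths are resolved by the daemon on the host, not inside the container that issued the request. A path that only exists inside the calling container (such as /app) therefore does not point at the project on the host. The sketch below is illustrative only; the helper names are invented and the paths are the ones from the -v examples above.

```python
import posixpath


def to_host_path(container_path, mounts):
    """Translate a path seen inside the container to the host path backing it.

    mounts is a list of (host_dir, container_dir) pairs, as in `-v host_dir:container_dir`.
    Returns None when the path is not under any of the mounts.
    """
    container_path = posixpath.normpath(container_path)
    for host_dir, container_dir in mounts:
        host_dir = posixpath.normpath(host_dir)
        container_dir = posixpath.normpath(container_dir)
        if container_path == container_dir:
            return host_dir
        if container_path.startswith(container_dir + "/"):
            return posixpath.join(host_dir, container_path[len(container_dir) + 1:])
    return None


def same_on_host(container_path, mounts):
    """True when the in-container path is also the correct path on the host."""
    return to_host_path(container_path, mounts) == posixpath.normpath(container_path)
```

With the mount `("/home/slavko/tmp/ansible-test", "/home/slavko/tmp/ansible-test")` a roles path is valid on both sides; with `("/home/slavko/tmp/ansible-test", "/app")` the in-container path `/app/roles` means something else to the host daemon.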

I would not build docker from docker with a mapped sock. Using a TCP port? Who knows; it seems more reliable, as at least it will send the context there.

Hope that helps

dchsueh commented 6 years ago

your suggestions and analysis give me some good ideas on investigating a workaround or alternate approaches

I'll report back if anything ends up successful

(the idea of curl-ing the docker binary directly into the image comes from how the conductor images are created - "docker history --no-trunc ansible/container-conductor-centos-7:0.9.2")

thank you

Voronenko commented 6 years ago

Your comment about the conductor is right. So this is rather an API compatibility issue.
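
As a rough illustration of what API compatibility means here: a docker daemon accepts a range of API versions (reported as the `MinAPIVersion` and `ApiVersion` fields by `docker version`), and a client speaking a version outside that range is rejected. The sketch below only compares version strings. The function names are hypothetical, and the versions in the usage note are example values, not ones measured in this thread.

```python
def parse_version(text):
    """Turn an API version string such as '1.37' into a comparable tuple (1, 37)."""
    return tuple(int(part) for part in text.split("."))


def is_api_compatible(client_api, daemon_min_api, daemon_max_api):
    """True when the client's API version lies within the daemon's supported range."""
    return (
        parse_version(daemon_min_api)
        <= parse_version(client_api)
        <= parse_version(daemon_max_api)
    )
```

For example, `is_api_compatible("1.30", "1.12", "1.37")` is accepted, while a client at "1.40" against the same daemon is not.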