ansible / ansible-container

DEPRECATED -- Ansible Container was a tool to build Docker images and orchestrate containers using only Ansible playbooks.
GNU Lesser General Public License v3.0

Services cannot connect to each other during build #598

Open hackermd opened 7 years ago

hackermd commented 7 years ago
ISSUE TYPE

Bug Report

container.yml

https://github.com/TissueMAPS/TmDeploy/blob/fb0526c348708099bb63b6fb432ada47bfc4e86b/tmdeploy/share/container/projects/tissuemaps/container.yml

OS / ENVIRONMENT
Ansible Container, version 0.9.1
Darwin, MacBook-Pro-4.local, 14.5.0, Darwin Kernel Version 14.5.0: Tue Apr 11 16:12:42 PDT 2017; root:xnu-2782.50.9.2.3~1/RELEASE_X86_64, x86_64
2.7.11 (default, Jan 22 2016, 08:28:37)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] /Users/mdh/Envs/deploy/bin/python2.7
{
  "ContainersPaused": 0,
  "Labels": null,
  "CgroupDriver": "cgroupfs",
  "ContainersRunning": 1,
  "ContainerdCommit": {
    "Expected": "4ab9917febca54791c5f071a9d1f404867857fcc",
    "ID": "4ab9917febca54791c5f071a9d1f404867857fcc"
  },
  "InitBinary": "docker-init",
  "NGoroutines": 43,
  "Swarm": {
    "Managers": 0,
    "ControlAvailable": false,
    "NodeID": "",
    "Cluster": {
      "Spec": {
        "TaskDefaults": {},
        "Orchestration": {},
        "EncryptionConfig": {
          "AutoLockManagers": false
        },
        "Raft": {
          "HeartbeatTick": 0,
          "ElectionTick": 0
        },
        "CAConfig": {},
        "Dispatcher": {}
      },
      "Version": {},
      "ID": "",
      "CreatedAt": "0001-01-01T00:00:00Z",
      "UpdatedAt": "0001-01-01T00:00:00Z"
    },
    "Nodes": 0,
    "Error": "",
    "RemoteManagers": null,
    "LocalNodeState": "inactive",
    "NodeAddr": ""
  },
  "LoggingDriver": "json-file",
  "OSType": "linux",
  "HttpProxy": "",
  "Runtimes": {
    "runc": {
      "path": "docker-runc"
    }
  },
  "DriverStatus": [
    [
      "Backing Filesystem",
      "extfs"
    ],
    [
      "Supports d_type",
      "true"
    ],
    [
      "Native Overlay Diff",
      "true"
    ]
  ],
  "OperatingSystem": "Alpine Linux v3.5",
  "Containers": 1,
  "HttpsProxy": "",
  "BridgeNfIp6tables": true,
  "MemTotal": 6246776832,
  "SecurityOptions": [
    "name=seccomp,profile=default"
  ],
  "Driver": "overlay2",
  "IndexServerAddress": "https://index.docker.io/v1/",
  "ClusterStore": "",
  "InitCommit": {
    "Expected": "949e6fa",
    "ID": "949e6fa"
  },
  "Isolation": "",
  "SystemStatus": null,
  "OomKillDisable": true,
  "ClusterAdvertise": "",
  "SystemTime": "2017-06-10T14:14:07.328417937Z",
  "Name": "moby",
  "CPUSet": true,
  "RegistryConfig": {
    "InsecureRegistryCIDRs": [
      "127.0.0.0/8"
    ],
    "IndexConfigs": {
      "docker.io": {
        "Official": true,
        "Name": "docker.io",
        "Secure": true,
        "Mirrors": null
      }
    },
    "Mirrors": []
  },
  "DefaultRuntime": "runc",
  "ContainersStopped": 0,
  "NCPU": 4,
  "NFd": 26,
  "Architecture": "x86_64",
  "KernelMemory": true,
  "CpuCfsQuota": true,
  "Debug": true,
  "ID": "KC7E:MCGC:TNHP:LBMN:ZPX7:F6AB:OIVT:7P4L:O3QX:5UOJ:C6MQ:YT45",
  "IPv4Forwarding": true,
  "KernelVersion": "4.9.27-moby",
  "BridgeNfIptables": true,
  "NoProxy": "*.local, 169.254/16",
  "LiveRestoreEnabled": false,
  "ServerVersion": "17.03.1-ce",
  "CpuCfsPeriod": true,
  "ExperimentalBuild": false,
  "MemoryLimit": true,
  "SwapLimit": true,
  "Plugins": {
    "Volume": [
      "local"
    ],
    "Network": [
      "bridge",
      "host",
      "macvlan",
      "null",
      "overlay"
    ],
    "Authorization": null
  },
  "Images": 6,
  "DockerRootDir": "/var/lib/docker",
  "NEventsListener": 1,
  "CPUShares": true,
  "RuncCommit": {
    "Expected": "54296cf40ad8143b62dbcaa1d90e520a2136ddfe",
    "ID": "54296cf40ad8143b62dbcaa1d90e520a2136ddfe"
  }
}
{
  "KernelVersion": "4.9.27-moby",
  "Arch": "amd64",
  "BuildTime": "2017-03-24T00:00:50.070226199+00:00",
  "ApiVersion": "1.27",
  "Version": "17.03.1-ce",
  "MinAPIVersion": "1.12",
  "GitCommit": "c6d412e",
  "Os": "linux",
  "GoVersion": "go1.7.5"
}
SUMMARY

In contrast to earlier versions of ansible-container, each container image is now built separately instead of all images being built collectively at the end of the build process. As a consequence, a container cannot connect to another container during build (even if a dependency is declared in container.yml). In addition, the host name of another container cannot be looked up dynamically, because that container is no longer running and its address is therefore not available on the network.
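For illustration, a minimal container.yml sketch of the pattern that no longer works (service, role, and base-image names here are invented, not taken from the linked file): declaring a dependency between services only takes effect at run time, so the depended-on container is not reachable while the dependent image is being built.

version: "2"
services:
  db_master:
    from: "ubuntu:16.04"        # base image, illustrative
    roles:
      - database-server
  app:
    from: "ubuntu:16.04"
    roles:
      - database-client         # tasks in this role try to reach db_master
    depends_on:
      - db_master               # honored at run time, but not during build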

STEPS TO REPRODUCE
git clone https://github.com/tissuemaps/tmdeploy ~/tmdeploy
cd ~/tmdeploy
git checkout refactor/ansible-container-0.9
mkvirtualenv tissuemaps   # optional: requires virtualenvwrapper
pip install .
tm_deploy -vvvv container deploy --from-scratch
EXPECTED RESULTS

Expected the master_add_node() function to execute successfully in this task, given the db_master host defined in container.yml.
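For reference, a sketch of what the task at add_worker.yml:18 presumably looks like, reconstructed from the command shown in the output below; the loop variable and exact task structure are assumptions.

# Reconstructed from the failing command in the log below; the actual task
# lives in roles/database-client/tasks/add_worker.yml.
- name: Add remote worker nodes
  command: >
    psql -h db_master -p 5432 -U postgres tissuemaps
    -c "SELECT master_add_node('{{ item }}', 9700);"
  with_items:
    - db_worker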

ACTUAL RESULTS

The app container cannot connect to the db_master container, because the db_master container is no longer running.

TASK [database-client : Add remote worker nodes] *******************************
task path: /Users/mdh/tmdeploy/tmdeploy/share/playbooks/tissuemaps/roles/database-client/tasks/add_worker.yml:18
Using module file /usr/local/lib/python2.7/dist-packages/ansible/modules/commands/command.py
<1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410> ESTABLISH DOCKER CONNECTION FOR USER: root
<1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410> EXEC ['/usr/local/bin/docker', 'exec', '-i', u'1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410', u'/bin/sh', '-c', u"/bin/sh -c 'echo ~ && sleep 0'"]
<1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410> EXEC ['/usr/local/bin/docker', 'exec', '-i', u'1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410', u'/bin/sh', '-c', u'/bin/sh -c \'( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1497111472.58-70113040286471 `" && echo ansible-tmp-1497111472.58-70113040286471="` echo /root/.ansible/tmp/ansible-tmp-1497111472.58-70113040286471 `" ) && sleep 0\'']
<1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410> PUT /tmp/tmphmMQhk TO /root/.ansible/tmp/ansible-tmp-1497111472.58-70113040286471/command.py
<1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410> EXEC ['/usr/local/bin/docker', 'exec', '-i', u'1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410', u'/bin/sh', '-c', u"/bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1497111472.58-70113040286471/ /root/.ansible/tmp/ansible-tmp-1497111472.58-70113040286471/command.py && sleep 0'"]
<1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410> EXEC ['/usr/local/bin/docker', 'exec', '-i', u'1519d08736dd3366fa1a1964706f0f3c8c3d8f553dc64605dea720224a511410', u'/bin/sh', '-c', u'/bin/sh -c \'/usr/bin/python /root/.ansible/tmp/ansible-tmp-1497111472.58-70113040286471/command.py; rm -rf "/root/.ansible/tmp/ansible-tmp-1497111472.58-70113040286471/" > /dev/null 2>&1 && sleep 0\'']
failed: [app] (item=db_worker) => {
    "changed": true,
    "cmd": [
        "psql",
        "-h",
        "db_master",
        "-p",
        "5432",
        "-U",
        "postgres",
        "tissuemaps",
        "-c",
        "SELECT master_add_node('db_worker', 9700);"
    ],
    "delta": "0:00:00.108804",
    "end": "2017-06-10 16:17:52.978140",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "psql -h db_master -p 5432 -U postgres tissuemaps -c \"SELECT master_add_node('db_worker',9700);\"",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "warn": true
        }
    },
    "item": "db_worker",
    "rc": 2,
    "start": "2017-06-10 16:17:52.869336",
    "stderr": "psql: could not translate host name \"db_master\" to address: Name or service not known",
    "stderr_lines": [
        "psql: could not translate host name \"db_master\" to address: Name or service not known"
    ],
    "stdout": "",
    "stdout_lines": []
}
        to retry, use: --limit @/tmp/tmp6dtUkl/playbook.retry
chouseknecht commented 7 years ago

This is an intentional change introduced in 0.9. With this new approach we can now maintain a reasonable build cache similar to Docker's. The problem with the old approach was that every run of build started all the services and rebuilt them essentially from scratch, which performed poorly, to say the least. With the new approach, we can skip running a role when no change is detected in it, which gives much better performance on repeat builds.

Connections between containers need to be handled at application runtime, which is the same philosophy followed by K8s and OpenShift. It takes some thought and deliberate programming to get it right, but it's entirely possible to have one service wait for another to become available before taking action, and that's the path applications should follow.
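A minimal sketch of that wait-at-runtime pattern, using Ansible's stock wait_for module in a play that runs when the container starts rather than during build; the host, port, and timeout values are placeholders.

# Run at application startup (e.g. from an entrypoint play), not during build.
- name: Wait for db_master to accept connections before adding worker nodes
  wait_for:
    host: db_master
    port: 5432
    delay: 5          # seconds to wait before the first check
    timeout: 300      # give up after five minutes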