ansible / ansible-container

DEPRECATED -- Ansible Container was a tool to build Docker images and orchestrate containers using only Ansible playbooks.
GNU Lesser General Public License v3.0

Segmentation fault installing apt packages #975

Closed elihunter173 closed 5 years ago

elihunter173 commented 5 years ago
ISSUE TYPE
container.yml
version: "2"

settings:
  conductor:
    base: ubuntu:xenial
    environment:
      - COMPOSE_HTTP_TIMEOUT=300
  project_name: arc-dev

services:
  adlc:
    from: ubuntu:xenial
    user: arc
    vars_files:
      - container_vars.yml
    roles:
      - container_bootstrap
      - common
      - imagery_system
      - adlc

registries:
  gitlab:
    url: registry.gitlab.com/ncsuarc/core-provisioning
    namespace: arc
OS / ENVIRONMENT
Ansible Container, version 0.9.2
Linux, augustus, 4.20.12-arch1-1-ARCH, #1 SMP PREEMPT Sat Feb 23 15:11:34 UTC 2019, x86_64
3.7.2 (default, Jan 10 2019, 23:51:51) 
[GCC 8.2.1 20181127] /home/eli/src/arc/provisioning/virtualenv/bin/python
{
  "ID": "SXVQ:Q4RE:IE6G:VFG4:IEYS:HH7X:YZC3:Z34V:GPLR:UKSW:Q42P:E7PQ",
  "Containers": 5,
  "ContainersRunning": 1,
  "ContainersPaused": 0,
  "ContainersStopped": 4,
  "Images": 9,
  "Driver": "overlay2",
  "DriverStatus": [
    [
      "Backing Filesystem",
      "extfs"
    ],
    [
      "Supports d_type",
      "true"
    ],
    [
      "Native Overlay Diff",
      "false"
    ]
  ],
  "SystemStatus": null,
  "Plugins": {
    "Volume": [
      "local"
    ],
    "Network": [
      "bridge",
      "host",
      "macvlan",
      "null",
      "overlay"
    ],
    "Authorization": null,
    "Log": [
      "awslogs",
      "fluentd",
      "gcplogs",
      "gelf",
      "journald",
      "json-file",
      "local",
      "logentries",
      "splunk",
      "syslog"
    ]
  },
  "MemoryLimit": true,
  "SwapLimit": true,
  "KernelMemory": true,
  "CpuCfsPeriod": true,
  "CpuCfsQuota": true,
  "CPUShares": true,
  "CPUSet": true,
  "IPv4Forwarding": true,
  "BridgeNfIptables": true,
  "BridgeNfIp6tables": true,
  "Debug": false,
  "NFd": 29,
  "OomKillDisable": true,
  "NGoroutines": 48,
  "SystemTime": "2019-02-28T16:29:53.850975417-05:00",
  "LoggingDriver": "json-file",
  "CgroupDriver": "cgroupfs",
  "NEventsListener": 0,
  "KernelVersion": "4.20.12-arch1-1-ARCH",
  "OperatingSystem": "Arch Linux",
  "OSType": "linux",
  "Architecture": "x86_64",
  "IndexServerAddress": "https://index.docker.io/v1/",
  "RegistryConfig": {
    "AllowNondistributableArtifactsCIDRs": [],
    "AllowNondistributableArtifactsHostnames": [],
    "InsecureRegistryCIDRs": [
      "127.0.0.0/8"
    ],
    "IndexConfigs": {
      "docker.io": {
        "Name": "docker.io",
        "Mirrors": [],
        "Secure": true,
        "Official": true
      }
    },
    "Mirrors": []
  },
  "NCPU": 8,
  "MemTotal": 16417976320,
  "GenericResources": null,
  "DockerRootDir": "/var/lib/docker",
  "HttpProxy": "",
  "HttpsProxy": "",
  "NoProxy": "",
  "Name": "augustus",
  "Labels": [],
  "ExperimentalBuild": false,
  "ServerVersion": "18.09.2-ce",
  "ClusterStore": "",
  "ClusterAdvertise": "",
  "Runtimes": {
    "runc": {
      "path": "runc"
    }
  },
  "DefaultRuntime": "runc",
  "Swarm": {
    "NodeID": "",
    "NodeAddr": "",
    "LocalNodeState": "inactive",
    "ControlAvailable": false,
    "Error": "",
    "RemoteManagers": null
  },
  "LiveRestoreEnabled": false,
  "Isolation": "",
  "InitBinary": "docker-init",
  "ContainerdCommit": {
    "ID": "9f2e07b1fc1342d1c48fe4d7bbb94cb6d1bf278b.m",
    "Expected": "9f2e07b1fc1342d1c48fe4d7bbb94cb6d1bf278b.m"
  },
  "RuncCommit": {
    "ID": "ccb5efd37fb7c86364786e9137e22948751de7ed-dirty",
    "Expected": "ccb5efd37fb7c86364786e9137e22948751de7ed-dirty"
  },
  "InitCommit": {
    "ID": "fec3683",
    "Expected": "fec3683"
  },
  "SecurityOptions": [
    "name=seccomp,profile=default"
  ],
  "Warnings": null
}
{
  "Platform": {
    "Name": ""
  },
  "Components": [
    {
      "Name": "Engine",
      "Version": "18.09.2-ce",
      "Details": {
        "ApiVersion": "1.39",
        "Arch": "amd64",
        "BuildTime": "2019-02-11T23:55:58.000000000+00:00",
        "Experimental": "false",
        "GitCommit": "62479626f2",
        "GoVersion": "go1.11.5",
        "KernelVersion": "4.20.12-arch1-1-ARCH",
        "MinAPIVersion": "1.12",
        "Os": "linux"
      }
    }
  ],
  "Version": "18.09.2-ce",
  "ApiVersion": "1.39",
  "MinAPIVersion": "1.12",
  "GitCommit": "62479626f2",
  "GoVersion": "go1.11.5",
  "Os": "linux",
  "Arch": "amd64",
  "KernelVersion": "4.20.12-arch1-1-ARCH",
  "BuildTime": "2019-02-11T23:55:58.000000000+00:00"
}
SUMMARY
STEPS TO REPRODUCE

Run the given container.yml with the command below (--debug is optional) and with the following task in the common role.

Command:
ansible-container --debug --vars-files container_vars.yml build
Task:
- name: Install standard build tools
  become: yes
  apt:
    name: "{{ packages }}"
    state: latest
    update_cache: yes
    cache_valid_time: 3600
  vars:
    packages:
    - build-essential
    - automake
    - libtool
    - m4
    - cmake
    - pkg-config
EXPECTED RESULTS

The given apt packages install and update appropriately. This worked until yesterday, even though the task has not changed, and the same task still works with ansible-playbook.

ACTUAL RESULTS

Upon reaching the task [common : Install standard build tools], the program quickly crashes with a segmentation fault.

TASK [common : Install standard build tools] ***********************************
task path: /src/roles/common/tasks/main.yml:66
Using module file /usr/local/lib/python2.7/dist-packages/ansible/modules/packaging/os/apt.py
<35439c3ce68754204da009bf856e8fd791d4a1e3079d80c25cc4f021ffa4a874> ESTABLISH DOCKER CONNECTION FOR USER: root
<35439c3ce68754204da009bf856e8fd791d4a1e3079d80c25cc4f021ffa4a874> EXEC ['/usr/local/bin/docker', 'exec', '-i', u'35439c3ce68754204da009bf856e8fd791d4a1e3079d80c25cc4f021ffa4a874', u'/bin/sh', '-c', u"/bin/sh -c 'echo ~ && sleep 0'"]
<35439c3ce68754204da009bf856e8fd791d4a1e3079d80c25cc4f021ffa4a874> EXEC ['/usr/local/bin/docker', 'exec', '-i', u'35439c3ce68754204da009bf856e8fd791d4a1e3079d80c25cc4f021ffa4a874', u'/bin/sh', '-c', u'/bin/sh -c \'( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1551389914.05-37269423617846 `" && echo ansible-tmp-1551389914.05-37269423617846="` echo /root/.ansible/tmp/ansible-tmp-1551389914.05-37269423617846 `" ) && sleep 0\'']
<35439c3ce68754204da009bf856e8fd791d4a1e3079d80c25cc4f021ffa4a874> PUT /tmp/tmpe4dWqX TO /root/.ansible/tmp/ansible-tmp-1551389914.05-37269423617846/apt.py
fatal: [adlc]: FAILED! => {
    "failed": true,
    "msg": "failed to transfer file /tmp/tmpe4dWqX to /root/.ansible/tmp/ansible-tmp-1551389914.05-37269423617846/apt.py:\n\nSegmentation fault (core dumped)\n"
}
Voronenko commented 5 years ago

1) "Until yesterday" sounds like base image drift.

2) Why would you need become: yes in a container?

elihunter173 commented 5 years ago

That was my guess, but I haven't been able to figure out a fix to it.

I don't need it in the container, but that role manages the installation of common software dependencies on both containers and actual machines. Since it's necessary on the real machines and doesn't cause any issues (that I know of) on the container, it's included to allow reuse.
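
As an illustration of the reuse described above, one possible sketch is to make privilege escalation switchable per target instead of hard-coding it. The variable name `common_become` is hypothetical and not part of the original role; the task body is the one from the report.

```yaml
# Sketch only: `common_become` is a made-up variable, not from the original role.
# It defaults to true, so real machines keep the existing behavior.
- name: Install standard build tools
  become: "{{ common_become | default(true) }}"
  apt:
    name: "{{ packages }}"
    state: latest
    update_cache: yes
    cache_valid_time: 3600
  vars:
    packages:
    - build-essential
    - automake
    - libtool
    - m4
    - cmake
    - pkg-config
```

Setting `common_become: false` in a vars file used only for container builds (for example container_vars.yml) would then skip escalation there. This is unrelated to the segmentation fault itself.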

l4r1k4 commented 5 years ago

It's a problem with the xenial Docker image. Depending on the packages you install, it asks to upgrade libc and then crashes. The same container and role work fine with bionic.

l4r1k4 commented 5 years ago

until yesterday seems like base image drift.

  1. Why would you need become:yes in container ?

I need "become: yes" when using images that start as a sudoer user rather than as root.

janwittmer commented 5 years ago

fatal: [adlc]: FAILED! => { "failed": true, "msg": "failed to transfer file /tmp/tmpe4dWqX to /root/.ansible/tmp/ansible-tmp-1551389914.05-37269423617846/apt.py:\n\nSegmentation fault (core dumped)\n" }

We had the same problem, and our workaround is to use a pinned version of the base image. The latest version (ubuntu:xenial-20190222) produces the segmentation fault, but the version before it (ubuntu:xenial-20190122) works as expected.

So in your case use:

services:
  adlc:
    from: ubuntu:xenial-20190122
elihunter173 commented 5 years ago

@matteotanca Switching to bionic complains about permission failure. I'm not sure why this is the case or how to easily fix that.

TASK [Gathering Facts] *********************************************************
fatal: [adlc]: UNREACHABLE! => {"changed": false, "msg": "Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote temp path in ansible.cfg to a path rooted in \"/tmp\". Failed command was: ( umask 77 && mkdir -p \"` echo ~/.ansible/tmp/ansible-tmp-1552748664.15-17311126964067 `\" && echo ansible-tmp-1552748664.15-17311126964067=\"` echo ~/.ansible/tmp/ansible-tmp-1552748664.15-17311126964067 `\" ), exited with result 1", "unreachable": true}

@janwittmer Sadly switching to ubuntu:xenial-20190122 doesn't seem to work in my case. It had the same error as did xenial-20181218 and xenial-20181113.

This makes me think it might not be base image drift, but I don't know what else it would be.

janwittmer commented 5 years ago

@elihunter173 I just noticed you are using Ansible Container version 0.9.2. We use version 0.9.3rc0, installed by running from source. Maybe you could give that a try.

Voronenko commented 5 years ago

A debug build of the current state of the develop branch is available as sa-ansible-container ( https://pypi.org/project/sa-ansible-container/ )

elihunter173 commented 5 years ago

After trying ubuntu:xenial-20190122 and ubuntu:xenial-20181218 on Ansible Container 0.9.3rc0 by installing sa-ansible-container from PyPI and by installing from source, I run into the same error.

Voronenko commented 5 years ago

I pulled a newer image for ubuntu:16.04 and can confirm: yes, the issue has appeared :(

l4r1k4 commented 5 years ago

I tried to use a Dockerfile to upgrade the ubuntu:16.04 image and save it as ubuntu:matteo. But when I use it in my container.yml, I get this error:

failed to transfer file /tmp/tmpv7jboR to /root/.ansible/tmp/ansible-tmp-1552840023.29-67207851537243/setup.py:\n\nSegmentation fault (core dumped)

However I'm able to use that image with docker or docker-compose without problems.

Maybe we need to build a custom conductor too based on an upgraded xenial image like "ubuntu:matteo"?

Can you help me find a Dockerfile to build the ansible/container-conductor-ubuntu-xenial:0.9.3rc3 image?

Voronenko commented 5 years ago

Should be fixed by https://github.com/ansible/ansible-container/pull/977

At least my play passes now: https://github.com/softasap/sa-container-bootstrap/blob/develop/box-example/ubuntu-xenial/container.yml#L4

I've pushed updated sa-ansible-container 0.9.3rc4 to pypi.

Once the build pipeline finishes for the ansible develop branch, the conductor images should be available from the ansible Docker Hub too.

Unfortunately, we've lost Amazon Linux from the build matrix. I will look at that separately.

Voronenko commented 5 years ago

@matteotanca To rebuild the conductor images, there is a bakery.py script:

python bakery.py

You can rebuild a specific conductor image:

python bakery.py --distros=ubuntu

Alternative syntax:

BASE_DISTRO=ubuntu python bakery.py

If you are building custom conductor images for your organization, you might use:

CONDUCTOR_PROVIDER= BASE_DISTRO=ubuntu python bakery.py

I have added several articles with ideas to the wiki.

Voronenko commented 5 years ago

Conductor images for ansible now should be updated. Please check and report if it resolved your issue.

l4r1k4 commented 5 years ago

fatal: [base]: UNREACHABLE! => { "changed": false, "msg": "Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote temp path in ansible.cfg to a path rooted in \"/tmp\". Failed command was: ( umask 77 && mkdir -p \"echo ~/.ansible/tmp/ansible-tmp-1552860831.55-217415933356218\" && echo ansible-tmp-1552860831.55-217415933356218=\"echo ~/.ansible/tmp/ansible-tmp-1552860831.55-217415933356218\" ), exited with result 1, stderr output: Error response from daemon: Container 29ee88b370e2a88a6df7d502faf92b1005cde7088bb0828c31894a94cc74885a is not running\n", "unreachable": true }

This happens after updating to sa-ansible-container==0.9.3rc4, only with xenial.

l4r1k4 commented 5 years ago

I'm so sorry, it works fine with the softasap xenial conductor. Thanks, I can upgrade my xenial projects now.

Voronenko commented 5 years ago

@matteotanca I just tried, and now that the build for ansible develop has finished and the conductor images have been pushed, it works fine with the official ansible conductor images.

The official ansible conductor images are recommended over custom ones.

Can you try that? If it succeeds, I will close the issue.

l4r1k4 commented 5 years ago

I confirm that it works with the ansible official conductors too. Thank you!!