elastic / ansible-elastic-cloud-enterprise

Ansible playbooks for Elastic Cloud Enterprise (ECE)
https://www.elastic.co/products/ece

Various Issues With Current Ansible Playbook for ECE #100

Open MorrieAtElastic opened 4 years ago

MorrieAtElastic commented 4 years ago

I am pasting in a series of issues that a user encountered while "testing the latest version of the ansible playbook/install script". The user asked me to post them because they do not have access to GitHub:

Install process:

~/ansible-elastic-cloud-enterprise/templates$ ls
docker1.13.conf  docker18.09.conf  docker19.03.conf  elastic.cfg.j2  format-drives.j2
~/ansible-elastic-cloud-enterprise$ vi tasks/base/general/configure_docker.yml

- name: Ensures /etc/systemd/system/docker.service.d dir exists
  file:
    path: /etc/systemd/system/docker.service.d
    state: directory
  when: docker_version == '18.09'

- name: Create service.d docker.conf
  template:
    src: docker{{ docker_version }}.conf
    dest: /etc/systemd/system/docker.service.d/docker.conf
  when: docker_version == '18.09'

- name: set docker storage options
  lineinfile:
    path: /etc/sysconfig/docker
    regexp: "^OPTIONS='(.*)'"
    line: "OPTIONS='-g {{ data_dir }}/docker \\1'"
    backrefs: yes
    create: yes
  when: docker_version == '1.13'

- name: set docker network options
  lineinfile:
    path: /etc/sysconfig/docker-network
    regexp: '^DOCKER_NETWORK_OPTIONS='
    line: 'DOCKER_NETWORK_OPTIONS="--bip={{ docker_bridge_ip }}"'
    create: yes
  when: docker_version == '1.13'

- name: set docker storage driver
  lineinfile:
    path: /etc/sysconfig/docker-storage-setup
    regexp: '^STORAGE_DRIVER='
    line: 'STORAGE_DRIVER={{ docker_storage_driver }}'
    create: yes
  when: docker_version == '1.13'

Upgrade process:

1) When run through Ansible, the install script elastic-cloud-enterprise.sh fails when it tries to retrieve HOST_STORAGE_PATH. Removing the -it parameters from the docker exec command seems to fix the issue. The docker exec command also works properly without -it when run in an SSH session.

Problematic lines of code (I removed the redirect to /dev/null to be able to see the root cause of the error):

  SOURCE_CONTAINER_NAME="frc-runners-runner"
  HOST_STORAGE_PATH=$(docker -H "unix://${HOST_DOCKER_HOST}" exec -it $SOURCE_CONTAINER_NAME bash -c 'echo -n $HOST_STORAGE_PATH' | cut -d: -f 2)
  if [[ -z "${HOST_STORAGE_PATH}" ]]; then
      echo -e "${RED}Container $SOURCE_CONTAINER_NAME was not found -- is the environment running?${NC}"
      exit $GENERAL_ERROR_EXIT_CODE
  fi
  SOURCE_CONTAINER_NAME="frc-directors-director"
  ZK_ROOT_PASSWORD=$(docker -H "unix://${HOST_DOCKER_HOST}" exec -it $SOURCE_CONTAINER_NAME bash -c 'echo -n $FOUND_ZK_READWRITE' | cut -d: -f 2)
  if [[ -z "${ZK_ROOT_PASSWORD}" ]]; then
      echo -e "${RED}Container $SOURCE_CONTAINER_NAME was not found -- does the current host have a role 'director'?${NC}"
      exit $GENERAL_ERROR_EXIT_CODE
  fi

Error

"the input device is not a TTY"

TASK [elastic-cloud-enterprise : include_tasks] 

included: /home/user/ansible/roles/elastic-cloud-enterprise/tasks/ece-bootstrap/upgrade.yml for <REDACTED>
TASK [elastic-cloud-enterprise : Execute upgrade] 

fatal: [<REDACTED>]: FAILED! => {"changed": true, "cmd": "/home/elastic/elastic-cloud-enterprise.sh upgrade --cloud-enterprise-version 2.6.2 --docker-registry docker.elastic.co --ece-docker-repository cloud-enterprise", "delta": "0:00:00.434120", "end": "2020-10-12 10:29:11.156153", "msg": "non-zero return code", "rc": 1, "start": "2020-10-12 10:29:10.722033", "stderr": "+ SOURCE_CONTAINER_NAME=frc-runners-runner\n++ docker -H unix:///var/run/docker.sock exec -it frc-runners-runner bash -c 'echo -n $HOST_STORAGE_PATH'\n++ cut -d: -f 2\nthe input device is not a TTY\n+ HOST_STORAGE_PATH=\n+ [[ -z '' ]]\n+ echo -e '\\033[0;31mContainer frc-runners-runner was not found -- is the environment running?\\033[0m'\n+ exit 1", "stderr_lines": ["+ SOURCE_CONTAINER_NAME=frc-runners-runner", "++ docker -H unix:///var/run/docker.sock exec -it frc-runners-runner bash -c 'echo -n $HOST_STORAGE_PATH'", "++ cut -d: -f 2", "the input device is not a TTY", "+ HOST_STORAGE_PATH=", "+ [[ -z '' ]]", "+ echo -e '\\033[0;31mContainer frc-runners-runner was not found -- is the environment running?\\033[0m'", "+ exit 1"], "stdout": "\u001b[0;31mContainer frc-runners-runner was not found -- is the environment running?\u001b[0m", "stdout_lines": ["\u001b[0;31mContainer frc-runners-runner was not found -- is the environment running?\u001b[0m"]}
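The "the input device is not a TTY" message appears because docker exec -t asks for a terminal, and Ansible runs the script without one. A minimal sketch of reading the variable without -it follows. It is not the installer's actual code: read_container_env and DOCKER_BIN are hypothetical names introduced here for illustration, and the socket default is taken from the log above.

```shell
#!/usr/bin/env bash
# Sketch: read an environment variable from a running container without
# requesting a TTY, so the call also works when no terminal is attached.
# DOCKER_BIN and read_container_env are hypothetical names, not part of
# elastic-cloud-enterprise.sh.

DOCKER_BIN="${DOCKER_BIN:-docker}"
HOST_DOCKER_HOST="${HOST_DOCKER_HOST:-/var/run/docker.sock}"

read_container_env() {
  local container="$1" var_name="$2"
  # No -i and no -t: nothing is read from stdin and no terminal is needed.
  "$DOCKER_BIN" -H "unix://${HOST_DOCKER_HOST}" exec "$container" \
    bash -c "echo -n \"\$${var_name}\"" | cut -d: -f 2
}

# Usage (requires a running ECE runner container):
#   HOST_STORAGE_PATH=$(read_container_env frc-runners-runner HOST_STORAGE_PATH)
```

The cut step is kept from the original lines so the result is post-processed the same way; only the -it flags are dropped.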

The task below in ~/elastic-cloud-enterprise/tasks/ece-bootstrap/main.yml fails with a permission error when I run the playbook as my own admin user. The command has to run as root or as the elastic user, so a sudo (become) instruction has to be added to the task, as in the last three lines below:

- name: Check if an installation or upgrade should be performed
  shell: docker ps -a -f name=frc-runners-runner --format {%raw%}"{{.Image}}"{%endraw%}
  register: existing_runner
  tags: [dbg]
  # Added so the task runs as a user that can reach the Docker socket:
  become: yes
  become_method: sudo
  become_user: elastic

Error:

TASK [elastic-cloud-enterprise : Check if an installation or upgrade should be performed] 

fatal: [<REDACTED>]: FAILED! => {"changed": true, "cmd": "docker ps -a -f name=frc-runners-runner --format \"{{.Image}}\"", "delta": "0:00:00.419693", "end": "2020-10-12 10:52:42.118514", "msg": "non-zero return code", "rc": 1, "start": "2020-10-12 10:52:41.698821", "stderr": "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1&filters=%7B%22name%22%3A%7B%22frc-runners-runner%22%3Atrue%7D%7D: dial unix /var/run/docker.sock: connect: permission denied", "stderr_lines": ["Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1&filters=%7B%22name%22%3A%7B%22frc-runners-runner%22%3Atrue%7D%7D: dial unix /var/run/docker.sock: connect: permission denied"], "stdout": "", "stdout_lines": []}
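To confirm that a failure like this comes from socket permissions rather than from Docker itself, one could check access to the socket as the connecting user. The sketch below assumes the default socket path shown in the error; can_use_docker_socket is a hypothetical helper name, not something the playbook provides.

```shell
#!/usr/bin/env bash
# Sketch: check whether the current user can read and write the Docker socket.
# can_use_docker_socket is a hypothetical helper name.

can_use_docker_socket() {
  local sock="${1:-/var/run/docker.sock}"
  # The Docker CLI needs both read and write access to the socket.
  [ -r "$sock" ] && [ -w "$sock" ]
}

# Usage:
#   if ! can_use_docker_socket; then
#     echo "user $(id -un) cannot access the Docker socket" >&2
#   fi
```

If the check fails for the admin user but passes for root or elastic, that matches the behaviour described above.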

Documentation update (README.md)

https://github.com/elastic/ansible-elastic-cloud-enterprise#performing-an-upgrade

The upgrade section of the documentation says to use the following command:

ansible-playbook -i inventory.yml site.yml --skip-tags base

When only the base tag is skipped, the playbook still performs some destructive (volume filesystem creation) or unwanted (system reboot) tasks from the direct install.

To upgrade ECE only, I had to use the following command:

ansible-playbook -i inventory.yml site.yml --tags bootstrap
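Before running either command against a live host, the difference between the two tag selections can be previewed with ansible-playbook's --list-tasks option, which prints the tasks that would run without executing them. A small sketch, where preview_tasks and ANSIBLE_PLAYBOOK_BIN are hypothetical names and the inventory and playbook file names are the ones used above:

```shell
#!/usr/bin/env bash
# Sketch: list the tasks a tag selection would run, without executing them.
# preview_tasks and ANSIBLE_PLAYBOOK_BIN are hypothetical names.

ANSIBLE_PLAYBOOK_BIN="${ANSIBLE_PLAYBOOK_BIN:-ansible-playbook}"

preview_tasks() {
  # Extra arguments are passed through, e.g. --tags bootstrap or --skip-tags base.
  "$ANSIBLE_PLAYBOOK_BIN" -i inventory.yml site.yml --list-tasks "$@"
}

# Usage (requires ansible and the playbook checkout):
#   preview_tasks --skip-tags base   # what the README command would run
#   preview_tasks --tags bootstrap   # what the upgrade-only command would run
```

Comparing the two listings should show whether volume creation or reboot tasks are still selected.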

vaubarth commented 4 years ago

Ubuntu 18.04/docker 19.03: docker19.03.conf is present in the "templates" folder but there is no task to copy it to the remote system. I only see tasks related to docker 18.09 or docker 1.13:

@obierlaire I think we missed this with #98 https://github.com/elastic/ansible-elastic-cloud-enterprise/blob/db4ca75b84935955f740c36940ff37010a3016cb/tasks/base/general/configure_docker.yml#L13-L17 We probably want to remove the when here?

Ubuntu 18.04/docker 19.03: docker_version 18.09 => assertion failure when running the playbook on Ubuntu 18.04 that requires docker 19.03.

@MorrieAtElastic Not sure about this. Did they try to set 18.09 and get the error? That's expected, I think, as per the support matrix.

Removing the -it parameters from the docker exec command seems to fix the issue.

Interesting, I'll try to reproduce, but this might need a change in the installer script.

When only the base tag is skipped, the playbook still performs some destructive (volume filesystem creation) or unwanted (system reboot) tasks from the direct install.

This was missed in the docs when we changed the setup process; it will be updated.