dwmkerr / terraform-aws-openshift

Create infrastructure with Terraform and AWS, install OpenShift. Party!
http://www.dwmkerr.com/get-up-and-running-with-openshift-on-aws
MIT License

okd release-3.11 #70

Closed mighani closed 5 years ago

mighani commented 6 years ago

How can 3.11 be supported? Just changing the version in install-from-bastion.sh is not enough. Does anything else need to change?

VineetReynolds commented 5 years ago

Got an OKD 3.11 installation working to a fair degree on CentOS 7.6 using the following changes:

The repo is needed only for CentOS, not for RHEL. Details about the Origin 3.11 repo were taken from: https://lists.openshift.redhat.com/openshift-archives/users/2018-November/msg00007.html

After installation, the master node(s) were noticeably less stable than with older versions of OpenShift Origin, especially the etcd and API server pods, and those failures cascaded to every other OKD component. For end users, oc commands and the web console failed, with log messages such as Failed to list *v1.Service, Failed to list *v1.Pod, and dial tcp 10.0.1.83:8443: connect: connection refused; the API server pods were restarting frequently. I managed to recover by restarting the docker and origin-node services on the master node, but I'm not confident this is either recommended or sufficient, so these may not be the only changes needed.
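
The manual recovery step described above could be scripted roughly as follows. This is only a sketch of that workaround, not a recommended fix: it assumes the inventory defines a group named masters (as typical openshift-ansible inventories do) and that both services are managed by systemd.

```yaml
---
# Sketch: restart the services that were restarted by hand on the master node(s).
# Assumption: the inventory has a "masters" group; adjust the pattern if yours differs.
- hosts: masters
  become: yes
  serial: 1          # handle one master at a time so they are not all restarted at once
  tasks:
  - name: Restart Docker
    systemd:
      name: docker
      state: restarted
  - name: Restart the origin-node service
    systemd:
      name: origin-node
      state: restarted
```

Restarting Docker first mirrors the order given in the comment; whether the order matters was not verified.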

VineetReynolds commented 5 years ago

Component failures in OKD 3.11 installations on CentOS seem to be related to a newer version of Docker. See: https://github.com/kubernetes/kubeadm/issues/1299#issue-387454536

mariusfilipowski commented 5 years ago

On CentOS 7.5 I get the following error message on each node, in "Verify Node Network Manager": Currently, NetworkManager must be installed and enabled prior to installation.

Do you have any experience fixing this error?

VineetReynolds commented 5 years ago

@mariusfilipowski Yes, the openshift-ansible scripts need some modification for CentOS 7: https://github.com/VineetReynolds/openshift-ansible/commit/2c54d74ac02c84491695a268907c0fbb88be01cd

zoobab commented 5 years ago

@mariusfilipowski I had to make a custom playbook to install docker and NetworkManager; the latter was problematic since it required a reboot (!) to work properly. I can share the playbooks that set up a basic CentOS 7 install as a pre-step if you are interested.

dwmkerr commented 5 years ago

It'd be great to see how you did it, @zoobab. I'm sure it'd help others coming across these issues!

mariusfilipowski commented 5 years ago

@zoobab That would be very helpful. I also tried it with Red Hat 7.5, but that failed too.

zoobab commented 5 years ago

Here is my ad hoc playbook, openshift-ansible/playbooks/adhoc/bootstrap-centos.yaml, for CentOS 7; feel free to adapt it as you wish:

# Notes: tested against CentOS 7; some parts are specific (pip and j2cli), and the public-hostname part works on AWS and OpenStack

---
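- hosts: OSEv3  # "OSEv3:children" is an inventory section header; as a host pattern ":" means union, so "OSEv3" alone targets the same hosts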
  gather_facts: False
  become: yes
  tasks:
  - name: Wait until the machines are reachable
    wait_for_connection:
      timeout: 300
  - name: Add Epel repo
    copy:
      dest: "/etc/yum.repos.d/epel.repo"
      content: |
        [epel]
        name=epel
        baseurl=http://dl.fedoraproject.org/pub/epel/7/x86_64/
        gpgcheck=0
  - name: Run yum update
    yum: name=* state=latest update_cache=yes
  - name: Install old version of pip
    yum:
      name: python-pip
  - name: Install the latest version of pip
    pip:
      name: pip
      extra_args: --upgrade
  - name: Install the j2cli via pip
    pip:
      name: j2cli
  - name: Install required packages (docker, curl, httpd-tools, etc...)
    yum:
      name: "{{ packages }}"
    vars:
      packages:
      - wget
      - git
      - net-tools
      - bind-utils
      - iptables-services
      - bridge-utils
      - bash-completion
      - kexec-tools
      - sos
      - psacct
      - jq
      - docker-1.13.1
      - skopeo
      - python-docker-py
      - openvswitch
      - awscli
      - NetworkManager
      - unzip
      - vim
      - python-virtualenv
      - gcc
      - httpd-tools
  - name: Systemd enable NetworkManager
    systemd:
      name: NetworkManager
      enabled: yes
      masked: no
  - name: Systemd enable Docker
    systemd:
      name: docker
      enabled: yes
  - name: Create directory
    file:
      path: /etc/systemd/system/docker.service.d
      state: directory
      owner: root
      group: root
  - name: Restart Docker
    systemd:
      state: restarted
      daemon_reload: yes
      name: docker
  - name: Get this instance public hostname
    get_url:
      url: http://169.254.169.254/latest/meta-data/public-hostname
      dest: /tmp/public-hostname
  - name: Get this instance public hostname bis
    command: cat /tmp/public-hostname
    register: myhostname
  - debug:
      msg: "Public Hostname {{ myhostname.stdout }}"
  - name: Set the hostname
    shell: hostnamectl set-hostname {{ myhostname.stdout }}
  - name: Rebooting
    command: /sbin/shutdown -r +1 "Ansible-triggered Reboot"
    async: 0
    poll: 0
  - name: Wait for server to come back
    wait_for_connection:
      delay: 120
  - name: Check NetworkManager
    systemd:
      state: started
      name: NetworkManager
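
As a possible simplification of the "Rebooting" and "Wait for server to come back" tasks above: on Ansible 2.7 or later, the built-in reboot module performs the reboot and waits for the connection to return in a single task. The fragment below is a sketch under that version assumption, was not tested against this playbook, and uses a timeout value chosen only for illustration.

```yaml
  # Sketch: would replace the two tasks "Rebooting" and "Wait for server to come back".
  # Assumption: Ansible >= 2.7, where the reboot module is available.
  - name: Reboot and wait for the host to come back
    reboot:
      msg: "Ansible-triggered Reboot"
      reboot_timeout: 600   # seconds to wait for the host to return (illustrative value)
```

The final "Check NetworkManager" task would stay as it is, since it runs only after the host is reachable again.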

dwmkerr commented 5 years ago

Hi all, thanks for the comments! I've closed this issue as OKD 3.11 is working fine now, but created a new, more specific issue to track the CentOS 7.5 challenges (#79).