mighani closed this issue 5 years ago
Got an OKD 3.11 installation working to a fair degree on CentOS 7.6 using the following changes:
- release-3.11 in install-from-bastion.sh.
- openshift_release=v3.11 in inventory.template.cfg#L23
- openshift_additional_repos=[{'id': 'centos-okd-311', 'name': 'centos-okd-311', 'baseurl': 'http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin311/', 'gpgcheck': '0', 'enabled': '1'}]
The repo is needed only for CentOS, not for RHEL. Details about the Origin 3.11 repo were taken from: https://lists.openshift.redhat.com/openshift-archives/users/2018-November/msg00007.html
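For readability, the two inventory settings above can also be written as structured YAML. This is only a sketch: the file name `group_vars/OSEv3.yml` is an assumption (the repository discussed here keeps these values in inventory.template.cfg), and the values are copied unchanged from the list above.

```yaml
# group_vars/OSEv3.yml (hypothetical location; the values come from this issue)
openshift_release: v3.11

# Extra yum repo for the Origin 3.11 packages. Only needed on CentOS, not on RHEL.
openshift_additional_repos:
  - id: centos-okd-311
    name: centos-okd-311
    baseurl: http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin311/
    gpgcheck: '0'
    enabled: '1'
```

Ansible loads group_vars files that sit next to the inventory or the playbook, so this form should be equivalent to the one-line INI entries, but it has not been tested against this repository.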
After installation, the master node(s) were noticeably less stable than with older versions of OpenShift Origin, especially around the etcd and API server pods, and the failures cascaded to every other OKD component. For end users, this showed up as failures when running oc commands or accessing the web console, with log messages like Failed to list *v1.Service or Failed to list *v1.Pod and dial tcp 10.0.1.83:8443: connect: connection refused; the API server pods were restarting frequently. I managed to recover by restarting the docker and origin-node services on the master node, but I'm not confident this is either recommended or sufficient, so these may not be the only changes needed.
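The manual recovery described above (restarting docker and origin-node on the masters) could be scripted roughly as follows. This is a sketch under assumptions, not a recommended procedure: the `masters` group name follows the usual openshift-ansible inventory layout, `serial: 1` and the final port check are additions that were not part of the original recovery, and 8443 is the API port seen in the error message.

```yaml
---
# Sketch: restart Docker and the node service on each master, one at a time.
- hosts: masters          # assumption: standard openshift-ansible group name
  become: yes
  gather_facts: False
  serial: 1               # assumption: avoid taking all masters down at once
  tasks:
    - name: Restart Docker
      systemd:
        name: docker
        state: restarted

    - name: Restart the OKD node service
      systemd:
        name: origin-node
        state: restarted

    - name: Wait for the API server port to accept connections again
      wait_for:
        port: 8443
        timeout: 300
```

As noted above, it is unclear whether restarting these services is sufficient, so treat this as a way to repeat the workaround rather than a fix.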
Component failures in OKD 3.11 installations on CentOS seem to be related to a newer version of Docker. See: https://github.com/kubernetes/kubeadm/issues/1299#issue-387454536
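To see which Docker package each node actually ended up with, something like the following could be run before or after the install. It is a minimal sketch that assumes Ansible 2.5 or later (for the `package_facts` module) and that Docker was installed from an RPM named `docker`; a package installed under another name, such as docker-ce, would not be reported.

```yaml
---
# Sketch: report the installed Docker RPM version on every host in the OSEv3 group.
- hosts: OSEv3
  become: yes
  gather_facts: False
  tasks:
    - name: Collect installed package facts
      package_facts:
        manager: auto

    - name: Show the installed Docker package version(s)
      debug:
        msg: "docker: {{ ansible_facts.packages['docker'] | map(attribute='version') | list }}"
      when: "'docker' in ansible_facts.packages"
```

For comparison, the bootstrap playbook posted later in this thread installs docker-1.13.1.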
On CentOS 7.5 I get this error message on each node, in "Verify Node Network Manager": Currently, NetworkManager must be installed and enabled prior to installation.
Do you have any experience fixing this error?
@mariusfilipowski Yes, the openshift-ansible scripts need some modification for CentOS 7: https://github.com/VineetReynolds/openshift-ansible/commit/2c54d74ac02c84491695a268907c0fbb88be01cd
@mariusfilipowski I had to write a custom playbook to install Docker and NetworkManager; NetworkManager was the problematic one, since it required a reboot (!) to work properly. I can share the playbooks that set up a basic CentOS 7 install as a pre-step if you are interested.
It'd be great to see how you did it, @zoobab. I'm sure it'd help others coming across these issues!
@zoobab That would be very helpful. I also tried it with Red Hat 7.5, but that failed too.
Here is my ad hoc YAML, openshift-ansible/playbooks/adhoc/bootstrap-centos.yaml, for CentOS 7; feel free to adapt it as you wish:
# Notes: tested against Centos 7, some parts are specific (pip and j2cli, and the public-hostname part are working on AWS and Openstack)
---
- hosts: OSEv3:children
  gather_facts: False
  become: yes
  tasks:
    - name: Wait that the machines are reachable
      wait_for_connection:
        timeout: 300
    - name: Add Epel repo
      copy:
        dest: "/etc/yum.repos.d/epel.repo"
        content: |
          [epel]
          name=epel
          baseurl=http://dl.fedoraproject.org/pub/epel/7/x86_64/
          gpgcheck=0
    - name: Run yum update
      yum: name=* state=latest update_cache=yes
    - name: Install old version of pip
      yum:
        name: python-pip
    - name: Install the latest version of pip
      pip:
        name: pip
        extra_args: --upgrade
    - name: Install the j2cli via pip
      pip:
        name: j2cli
    - name: Install required packages (docker, curl, httpd-tools, etc...)
      yum:
        name: "{{ packages }}"
      vars:
        packages:
          - wget
          - git
          - net-tools
          - bind-utils
          - iptables-services
          - bridge-utils
          - bash-completion
          - kexec-tools
          - sos
          - psacct
          - jq
          - docker-1.13.1
          - skopeo
          - python-docker-py
          - openvswitch
          - awscli
          - NetworkManager
          - unzip
          - vim
          - python-virtualenv
          - gcc
          - httpd-tools
    - name: Systemd enable NetworkManager
      systemd:
        name: NetworkManager
        enabled: yes
        masked: no
    - name: Systemd enable Docker
      systemd:
        name: docker
        enabled: yes
    - name: Create directory
      file:
        path: /etc/systemd/system/docker.service.d
        state: directory
        owner: root
        group: root
    - name: Restart Docker
      systemd:
        state: restarted
        daemon_reload: yes
        name: docker
    - name: Get this instance public hostname
      get_url:
        url: http://169.254.169.254/latest/meta-data/public-hostname
        dest: /tmp/public-hostname
    - name: Get this instance public hostname bis
      command: cat /tmp/public-hostname
      register: myhostname
    - debug:
        msg: "Public Hostname {{ myhostname.stdout }}"
    - name: Set the hostname
      shell: hostnamectl set-hostname {{ myhostname.stdout }}
    - name: Rebooting
      command: /sbin/shutdown -r +1 "Ansible-triggered Reboot"
      async: 0
      poll: 0
    - name: Wait for server to come back
      wait_for_connection:
        delay: 120
    - name: Check NetworkManager
      systemd:
        state: started
        name: NetworkManager
Hi all, thanks for the comments! I've closed this issue as OKD 3.11 is working fine now, but created a new, more specific issue to track the CentOS 7.5 challenges (#79).
How to support 3.11? Just changing the version in install-from-bastion.sh is not enough. Is there anything else to change?