eliu / openshift-vagrant

Bring up a real OKD cluster on your local machine using Vagrant and VirtualBox
Apache License 2.0

Wait for control plane pods to appear #10

Closed Voronenko closed 5 years ago

Voronenko commented 5 years ago

As of Jun 16, the control plane on the master fails to start with the message

Wait for control plane pods to appear ....

TASK [openshift_control_plane : Report control plane errors] *********************************************************************************************************************************
fatal: [master.example.com]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}                                                                                        

NO MORE HOSTS LEFT ***************************************************************************************************************************************************************************
        to retry, use: --limit @/home/vagrant/openshift-ansible/playbooks/deploy_cluster.retry                                                                                                

PLAY RECAP ***********************************************************************************************************************************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0                                                                                                                   
master.example.com         : ok=324  changed=149  unreachable=0    failed=1                                                                                                                   
node01.example.com         : ok=113  changed=60   unreachable=0    failed=0                                                                                                                   
node02.example.com         : ok=113  changed=60   unreachable=0    failed=0  

It appears that the root cause is the line

127.0.0.1      master.example.com      master

present in /etc/hosts, while etcd listens only on 192.168.150.101:

#127.0.0.1      master.example.com      master
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

## vagrant-hostmanager-start
192.168.150.101 master.example.com
192.168.150.101 etcd.example.com
192.168.150.101 nfs.example.com
192.168.150.103 node02.example.com

192.168.150.102 node01.example.com
192.168.150.102 lb.example.com
## vagrant-hostmanager-end

Manually correcting the hosts file on the master node solves the issue.
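For anyone hitting the same thing, that manual edit can also be scripted before re-running the playbook. The play below is only a sketch and is not part of this repo; the playbook name, host pattern, and regexp are assumptions based on the /etc/hosts content shown above.

# hosts_fix.yml (hypothetical): drop the loopback alias that shadows the
# master's private IP, so clients resolve master.example.com to 192.168.150.101.
- hosts: master.example.com
  become: true
  tasks:
    - name: Remove the 127.0.0.1 alias for the master hostname
      lineinfile:
        path: /etc/hosts
        regexp: '^127\.0\.0\.1\s+master\.example\.com'
        state: absent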

eliu commented 5 years ago

A similar issue was discussed on the vagrant-hostmanager repo: https://github.com/devopsgroup-io/vagrant-hostmanager/issues/203

Voronenko commented 5 years ago

Thanks a lot, I will take a look.

I also faced some additional issues, but finally got it working with containerized=no and the etcd_ip variable, as shown below.

containerized=yes does not work at the moment.

#
# Copyright 2017 Liu Hongyu
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=vagrant
os_firewall_use_firewalld=True

# If ansible_ssh_user is not root, ansible_become must be set to true
ansible_become=true

openshift_deployment_type=origin
openshift_release='{{OPENSHIFT_RELEASE}}'

# Specify an exact rpm version to install or configure.
# WARNING: This value will be used for all hosts in RPM based environments, even those that have another version installed.
# This could potentially trigger an upgrade and downtime, so be careful with modifying this value after the cluster is set up.
#openshift_pkg_version=-{{OPENSHIFT_PKG_VERSION}}

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'{{HTPASSWORD_FILENAME}}}]
# Default login account: admin / handhand
openshift_master_htpasswd_users={'admin': '$apr1$gfaL16Jf$c.5LAvg3xNDVQTkk6HpGB1'}

openshift_disable_check=disk_availability,memory_availability,docker_storage,docker_image_availability
openshift_docker_options=" --selinux-enabled --log-driver=journald --storage-driver=overlay"

#
# Author's Note
#
# Disable service catalog and TSB install
# These 2 components will lead to a failed install during the "Running Verification" tasks (120 tries).
# So far this seems to happen only in China; the workaround is to enable a VPN during the verification.
#
openshift_enable_service_catalog=false
template_service_broker_install=false

# openshift_hosted_manage_registry=false

# OpenShift Router Options
# Router selector (optional)
# Router will only be created if nodes matching this label are present.
# Default value: 'region=infra'
# openshift_router_selector='node-role.kubernetes.io/infra=true'
# openshift_registry_selector='node-role.kubernetes.io/infra=true'

# default subdomain to use for exposed routes
openshift_master_default_subdomain=openshift.openshift.local

# host group for masters
[masters]
master.openshift.local etcd_ip={{NETWORK_BASE}}.101 openshift_host={{NETWORK_BASE}}.101 ansible_ssh_private_key_file="/home/vagrant/.ssh/master.key"

# host group for etcd
[etcd]
master.openshift.local etcd_ip={{NETWORK_BASE}}.101 openshift_host={{NETWORK_BASE}}.101 ansible_ssh_private_key_file="/home/vagrant/.ssh/master.key"
#
# host group for nodes, includes region info
# For openshift_node_labels strategies, the following reference links might be helpful
# to understand why we choose this current solution:
# - https://github.com/openshift/openshift-ansible#setup
# - https://github.com/openshift/openshift-ansible#node-group-definition-and-mapping
# - https://docs.okd.io/3.7/install_config/install/advanced_install.html#configuring-node-host-labels
# - https://docs.okd.io/3.9/install_config/install/advanced_install.html#configuring-node-host-labels
# - https://docs.okd.io/3.10/install/configuring_inventory_file.html#configuring-node-host-labels
#
# The default node selector for
# release-3.9 ( or prev versions ): 'region=infra'
# release-3.10: 'node-role.kubernetes.io/infra=true'
#
# But release-3.9 starts to enable the node roles feature. For backward compatibility, we
# override the default values of openshift_router_selector and openshift_registry_selector
# from 'region=infra' to 'node-role.kubernetes.io/infra=true'
#
[nodes]
master.openshift.local containerized=false etcd_ip={{NETWORK_BASE}}.101 openshift_host={{NETWORK_BASE}}.101 ansible_ssh_private_key_file="/home/vagrant/.ssh/master.key" openshift_schedulable=true {{NODE_GROUP_MASTER_INFRA}}
node01.openshift.local etcd_ip={{NETWORK_BASE}}.102 openshift_host={{NETWORK_BASE}}.102 ansible_ssh_private_key_file="/home/vagrant/.ssh/node01.key" openshift_schedulable=true {{NODE_GROUP_COMPUTE}}
node02.openshift.local etcd_ip={{NETWORK_BASE}}.103 openshift_host={{NETWORK_BASE}}.103 ansible_ssh_private_key_file="/home/vagrant/.ssh/node02.key" openshift_schedulable=true {{NODE_GROUP_COMPUTE}}
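Since the fix above hinges on etcd binding to the private address rather than loopback, a quick sanity check against the etcd_ip values in this inventory can save a failed re-run. This is just a sketch under the assumption that etcd serves clients on the standard port 2379; it is not part of the repo's playbooks.

# etcd_check.yml (hypothetical): verify etcd answers on the private address
# configured via etcd_ip, not only on 127.0.0.1.
- hosts: etcd
  gather_facts: false
  tasks:
    - name: Wait for the etcd client port on the private address
      wait_for:
        host: "{{ etcd_ip }}"
        port: 2379
        timeout: 30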
eliu commented 5 years ago

@Voronenko a temporary fix has just been pushed. Please test it on your machine and let me know if it works as expected. Thanks.

Voronenko commented 5 years ago

Sorry for the delay; at the moment I am using a forked repo with the snippet from a few comments above. When I rebuild the cluster, I will let you know.