Closed by KamranAzeem 8 years ago
It turned out that SELinux was enabled on all of the nodes (including the master), which most probably prevented the local registry from running properly on the master.
The Docker container for the registry was not running on the master node. This was problem #1.
The second problem was that kube-controller-manager was complaining that it could not reach the apiserver: its TCP dial to the master node's IP address timed out.
-bash-4.3# service kube-controller-manager status -l
Redirecting to /bin/systemctl status -l kube-controller-manager.service
● kube-controller-manager.service - Kubernetes Controller Manager
Loaded: loaded (/usr/lib/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2016-06-13 10:14:42 UTC; 36s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 4796 (kube-controller)
Memory: 4.9M
CPU: 64ms
CGroup: /system.slice/kube-controller-manager.service
└─4796 /usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://171.31.39.228:8080
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal systemd[1]: kube-controller-manager.service: Failed with result 'exit-code'.
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal systemd[1]: Started Kubernetes Controller Manager.
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal systemd[1]: Starting Kubernetes Controller Manager...
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal kube-controller-manager[4796]: I0613 10:14:42.258087 4796 plugins.go:71] No cloud provider specified.
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal kube-controller-manager[4796]: I0613 10:14:42.258380 4796 nodecontroller.go:143] Sending events to api server.
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal kube-controller-manager[4796]: E0613 10:14:42.258680 4796 controllermanager.go:216] Failed to start service controller: ServiceController should not be run without a cloudprovider.
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal kube-controller-manager[4796]: I0613 10:14:42.258838 4796 controllermanager.go:229] allocate-node-cidrs set to false, node controller not creating routes
Jun 13 10:14:42 ip-172-31-39-228.ap-southeast-2.compute.internal kube-controller-manager[4796]: I0613 10:14:42.259526 4796 replication_controller.go:208] Starting RC Manager
Jun 13 10:15:12 ip-172-31-39-228.ap-southeast-2.compute.internal kube-controller-manager[4796]: E0613 10:15:12.259398 4796 controllermanager.go:259] Failed to get api versions from server: Get http://171.31.39.228:8080/api: dial tcp 171.31.39.228:8080: i/o timeout
Jun 13 10:15:12 ip-172-31-39-228.ap-southeast-2.compute.internal kube-controller-manager[4796]: E0613 10:15:12.260945 4796 nodecontroller.go:229] Error monitoring node status: Get http://171.31.39.228:8080/api/v1/nodes: dial tcp 171.31.39.228:8080: i/o timeout
-bash-4.3#
When I tried to curl that IP address on master, it did not work. When I replaced the IP address with the word localhost, it worked:
-bash-4.3# curl http://171.31.39.228:8080/api/v1/nodes
^C
-bash-4.3# curl http://localhost:8080/api/v1/nodes
{
"kind": "NodeList",
"apiVersion": "v1",
"metadata": {
"selfLink": "/api/v1/nodes",
"resourceVersion": "53008"
},
"items": [
{
"metadata": {
"name": "172.31.39.229",
. . .
[output snipped ]
. . .
-bash-4.3#
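The curl against the IP address above had to be interrupted by hand. A minimal sketch of a probe that gives up after a few seconds is shown here; check_tcp is a hypothetical helper name, and the sketch assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available on the node.

```shell
#!/usr/bin/env bash
# check_tcp is a hypothetical helper, not part of Kubernetes or curl.
# It tries to open a TCP connection to host:port and gives up after a
# time limit, so an unreachable address fails quickly instead of hanging.
#   $1 = host, $2 = port, $3 = time limit in seconds (default 3)
check_tcp() {
    local host="$1" port="$2" secs="${3:-3}"
    # The inner bash receives host as $0 and port as $1 and opens the
    # connection through bash's /dev/tcp pseudo-device.
    if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "reachable: $host:$port"
    else
        echo "unreachable: $host:$port"
        return 1
    fi
}

# Example (on the master):
#   check_tcp 171.31.39.228 8080
#   check_tcp localhost 8080
```

A probe like this only tells you whether the TCP connection opens; it says nothing about why an address is unreachable.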
Yet kube-apiserver was configured to listen on all interfaces:
-bash-4.3# cat kubernetes/apiserver
. . .
KUBE_API_ADDRESS="--insecure-bind-address=0.0.0.0"
. . .
This is also evident through netstat on master:
-bash-4.3# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:2380 0.0.0.0:* LISTEN 951/etcd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1019/sshd
tcp 0 0 127.0.0.1:7001 0.0.0.0:* LISTEN 951/etcd
tcp6 0 0 :::5000 :::* LISTEN 3887/docker-proxy
tcp6 0 0 :::6443 :::* LISTEN 4768/kube-apiserver
tcp6 0 0 :::2379 :::* LISTEN 951/etcd
tcp6 0 0 :::10251 :::* LISTEN 844/kube-scheduler
tcp6 0 0 :::10252 :::* LISTEN 4943/kube-controlle
tcp6 0 0 :::8080 :::* LISTEN 4768/kube-apiserver
tcp6 0 0 :::22 :::* LISTEN 1019/sshd
-bash-4.3#
Then I had another look at the IP address and realized that there was a typo in /etc/kubernetes/config on the master node. The IP should have been 172.31.39.228 and not 171.31.39.228, as the address on the master's eth0 shows:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc fq_codel state UP group default qlen 1000
link/ether 0a:5c:30:60:10:d3 brd ff:ff:ff:ff:ff:ff
inet 172.31.39.228/20 brd 172.31.47.255 scope global dynamic eth0
valid_lft 3254sec preferred_lft 3254sec
inet6 fe80::85c:30ff:fe60:10d3/64 scope link
valid_lft forever preferred_lft forever
-bash-4.3#
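A mismatch like this can also be caught mechanically by comparing the configured master address with the addresses the host actually has. The following is only a sketch: master_ip_from_config and is_local_ip are hypothetical helper names, and it assumes the config file carries a --master=http://IP:PORT setting like the one visible on the kube-controller-manager command line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the address in the --master setting with
# the addresses actually assigned to this host.

# Print the first IPv4 address found in a --master=http://IP:PORT setting.
#   $1 = config file (assumed to contain such a setting)
master_ip_from_config() {
    grep -oE -- '--master=https?://[0-9]+(\.[0-9]+){3}' "$1" \
        | grep -oE '[0-9]+(\.[0-9]+){3}' \
        | head -n 1
}

# Succeed if $1 equals one of the remaining arguments.
is_local_ip() {
    local want="$1" addr
    shift
    for addr in "$@"; do
        if [ "$addr" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

# Example (on the master):
#   cfg_ip="$(master_ip_from_config /etc/kubernetes/config)"
#   is_local_ip "$cfg_ip" $(ip -4 -o addr show | awk '{print $4}' | cut -d/ -f1) \
#       || echo "configured master $cfg_ip is not an address of this host"
```

This check only makes sense on the master itself, where the configured address is expected to be one of the host's own.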
The same incorrect IP turned out to be in several config files, so we fixed it in all of them and rebooted the master node.
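For anyone hunting a similar typo, one way to list every file that mentions the wrong address is a recursive grep. This is a sketch, not the exact command we ran; files_with_ip is a hypothetical helper name, and /etc/kubernetes in the example is an assumption about where the affected files live.

```shell
#!/usr/bin/env bash
# files_with_ip is a hypothetical helper name.
# List every file under a directory that mentions a given address.
#   $1 = address to look for, $2 = directory to search
# grep options: -r recurse, -l print file names only, -F treat the
# address as a fixed string (dots are literal), -w require whole-word
# matches so 171.31.39.228 does not also match e.g. 171.31.39.2280.
files_with_ip() {
    grep -rlFw -- "$1" "$2"
}

# Example (on the master):
#   files_with_ip 171.31.39.228 /etc/kubernetes
```

Listing the files first, and editing them by hand, avoids a blind search-and-replace touching something unintended.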
With the addresses corrected, it works!
[fedora@ip-172-31-39-228 ~]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
www 1/1 Running 0 1m
[fedora@ip-172-31-39-228 ~]$
(Thank you Rafiqul Islam)
I saw nodes stuck in the Pending state when they were created. kubectl reported no events.
kubectl also times out when trying to delete a deployment, saying "timeout waiting for a condition".