Can you show me the kubelet logs?
Did you set up an HA master cluster first? There is no need to copy the /etc/kubernetes/ directory to k8s worker nodes.
The logs are complaining about the cert, which is only created in later steps, although the guide suggests the nodes should show up initially. I tried to continue on, but could not due to the other issue.
Yes, I went through all the prior steps. These are not worker nodes, they are masters.
Did you edit the kubelet.conf file? If you change the file's server setting to the current host's IP, it will show this problem; then you must create the certificates yourself.
I finished the section creating all the certificates, and the kubelets restart fine, but still only one node shows.
In the file /etc/kubernetes/kubelet.conf there are multiple references to the original master's hostname. Should I not adapt these to the second and third masters' hostnames?
    server: https://122.11.543.678:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:vps135257.vps.ovh.eu
  name: system:node:vps135257.vps.ovh.eu@kubernetes
current-context: system:node:vps135257.vps.ovh.eu@kubernetes
kind: Config
preferences: {}
users:
- name: system:node:vps135257.vps.ovh.eu
  user:
You may wish to try this with v1.7 when you get the chance. It was just released and there might be changes needed to your guide. I will blow this up and try again tomorrow with v1.6.4 to see if I can get it to succeed.
v1.7 is not stable yet, but I think I will try it. And make sure you turn off firewalld. Is there a hardware firewall from your cloud provider preventing communication between your masters?
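(For reference, a common way to do this on CentOS 7; generic commands, not specific to this guide:)
$ systemctl stop firewalld && systemctl disable firewalld
$ setenforce 0
$ sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config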
v1.7.0 is in release state. Yes, I have disabled firewalld & selinux. There is no additional firewall on the VPS. This is not a "cloud" provider, it's a standard VPS. I'm using OVH with a goal to have fully HA systems with their rented dedicated servers in the near future. Please let me know how you fare. I am ready to give it another go, but will wait for a response from you.
I was able to get the nodes to show using the node join command. Will continue with experimentation and update here after I have something solid to report.
Is node join a kubectl command or a kubeadm command? Or is it a v1.7.0-exclusive command?
My bad, it was kubeadm join --token.
That would join them as worker nodes, not masters. In v1.6.4, run kubeadm reset first on k8s-master2 and k8s-master3, then copy /etc/kubernetes/kubelet.conf and /etc/kubernetes/pki from k8s-master1 to k8s-master2 and k8s-master3. You will find that k8s-master2 and k8s-master3 have joined:
kubectl get nodes
NAME          STATUS    AGE       VERSION
k8s-master1   Ready     12m       v1.6.4
k8s-master2   Ready     3m        v1.6.4
k8s-master3   Ready     3m        v1.6.4
This means the kubelets on k8s-master2 and k8s-master3 are up and connected to the cluster. You can try it.
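As a sketch of that sequence (assuming root SSH access between the masters and that the hostnames resolve):
# on k8s-master2 and k8s-master3: clear any previous state
kubeadm reset
# on k8s-master1: copy the kubelet kubeconfig and the CA material to each new master
scp /etc/kubernetes/kubelet.conf root@k8s-master2:/etc/kubernetes/kubelet.conf
scp -r /etc/kubernetes/pki root@k8s-master2:/etc/kubernetes/
# on k8s-master2 and k8s-master3: restart so the kubelet picks up the copied config
systemctl restart docker kubelet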
Okay, I ran kubeadm reset on masters 2/3, then scp /etc/kubernetes/kubelet.conf and scp -r /etc/kubernetes/pki/*, then systemctl restart docker kubelet.
Now the output has changed from:
NAME                   STATUS    AGE       VERSION
vps135abc.vps.ovh.eu   Ready     4m        v1.7.0
vps135def.vps.ovh.eu   Ready     2m        v1.7.0
vps135ghi.vps.ovh.eu   Ready     2m        v1.7.0
to:
[root@vps135abc ~]# kubectl get nodes
NAME                   STATUS     AGE       VERSION
vps135abc.vps.ovh.eu   Ready      1h        v1.7.0
vps135def.vps.ovh.eu   NotReady   1h        v1.7.0
vps135ghi.vps.ovh.eu   NotReady   1h        v1.7.0
Just do it step by step. On k8s-master1, copy kube-apiserver.yaml to k8s-master2 and k8s-master3, then edit the kube-apiserver.yaml file on k8s-master2 and k8s-master3, replacing ${HOST_IP} with the current host's IP:
vi /etc/kubernetes/manifests/kube-apiserver.yaml
- --advertise-address=${HOST_IP}
Restart docker and kubelet on k8s-master2 and k8s-master3:
systemctl restart docker kubelet
Then you will find that the apiserver and kube-proxy have started up on k8s-master2 and k8s-master3:
kubectl get pods --all-namespaces -o wide
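Put together as commands, this is roughly (a sketch; run the sed on each of k8s-master2 and k8s-master3 with that host's own IP in place of ${HOST_IP}):
# on k8s-master1: ship the manifest to the other masters
scp /etc/kubernetes/manifests/kube-apiserver.yaml root@k8s-master2:/etc/kubernetes/manifests/
# on k8s-master2 (repeat on k8s-master3): point the apiserver at this host's IP
sed -i "s|--advertise-address=.*|--advertise-address=${HOST_IP}|" /etc/kubernetes/manifests/kube-apiserver.yaml
systemctl restart docker kubelet
# verify the static pods came up
kubectl get pods --all-namespaces -o wide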
Maybe you should reset all your master nodes first, then redo kubeadm init on your first master.
Yes, I am seeing "NodeLost" from master 1 right now. I will reset. Ouch, much is wrong now. I will remove etcd and try to start entirely fresh. I think it would be best to just reinstall my VPS; I will try the guide again in full order.
I went through the guide exactly as listed, with the same exact results at the same parts. I can't get masters 2/3 to ever show up like this. If you would be willing to take a look, I can give you access; email me your public key: webeindustry@gmail.com
I can confirm this is working with 1.6.4, so the issue is with 1.7.0. I will analyze the differences in the config files and report back. We should probably close these two issues; I will open another with details on how to get 1.7.0 working when successful.
So you can try these commands to install exact versions of the components:
$ yum search docker --showduplicates
$ yum install docker-1.12.6-16.el7.centos.x86_64
$ yum search kubelet --showduplicates
$ yum install kubelet-1.6.4-0.x86_64
$ yum search kubeadm --showduplicates
$ yum install kubeadm-1.6.4-0.x86_64
$ yum search kubernetes-cni --showduplicates
$ yum install kubernetes-cni-0.5.1-0.x86_64
$ systemctl enable docker && systemctl start docker
$ systemctl enable kubelet && systemctl start kubelet
I will try v1.7.0 later.
Yes, I am using 1.6.4 for kubelet, kubeadm, and kubectl. Everything works with the 1.6 branch; 1.7 is the issue. I'm now focused on taking a few steps back and learning other HA setups for 1.6+. I see some limitations with your setup. You should email me so we can chat about this.
My gmail: cookeem@gmail.com
You also need to specify kubectl-1.6.4, or else it will pull 1.7.0.
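For example, following the same naming pattern as the yum packages above:
$ yum search kubectl --showduplicates
$ yum install kubectl-1.6.4-0.x86_64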
You are right, I have updated the document.
v1.7.0 enhances security by adding the NodeRestriction admission controller, which will prevent the additional master nodes from joining the cluster. Remove it from the admission-control list:
$ vi /etc/kubernetes/manifests/kube-apiserver.yaml
# - --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota
- --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds
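Since kube-apiserver runs as a static pod, the kubelet should recreate it automatically once the manifest changes; if it does not, the same restart used elsewhere in this thread helps:
$ systemctl restart docker kubelet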
I have the same problem.
I use kubernetes 1.8:
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
my config: /etc/kubernetes/manifests/kube-apiserver.yaml
- --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ResourceQuota
After running this on master-2 and master-3, it still shows only:
root@backend-023:~# kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
backend-023   Ready     master    1h        v1.8.2
@cookeem @webeindustry could this issue be reopened? Thanks.
@mtchuyen can you show me the kubelet's log?
Thanks for the reply! Here is the kubelet's log (backend-052 is master-2):
Nov 13 14:34:33 backend-052 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 13 14:34:33 backend-052 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 13 14:34:33 backend-052 kubelet[2024]: I1113 14:34:33.822758 2024 feature_gate.go:156] feature gates: map[]
Nov 13 14:34:33 backend-052 kubelet[2024]: I1113 14:34:33.822842 2024 controller.go:114] kubelet config controller: starting controller
Nov 13 14:34:33 backend-052 kubelet[2024]: I1113 14:34:33.822848 2024 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.197458 2024 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.197509 2024 client.go:95] Start docker client with request timeout=2m0s
Nov 13 14:34:34 backend-052 kubelet[2024]: W1113 14:34:34.198548 2024 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.204017 2024 feature_gate.go:156] feature gates: map[]
Nov 13 14:34:34 backend-052 kubelet[2024]: W1113 14:34:34.204248 2024 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.230757 2024 certificate_manager.go:361] Requesting new certificate.
I copied /etc/cni/net.d/10-flannel.conf from master-1 (that step is sketched as commands after the logs below), then restarted the kubelet:
$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2017-11-13 14:45:58 +07; 7s ago
Docs: http://kubernetes.io/docs/
Main PID: 8997 (kubelet)
Tasks: 11
Memory: 12.2M
CPU: 196ms
CGroup: /system.slice/kubelet.service
└─8997 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.
Nov 13 14:45:58 backend-052 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779333 8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779398 8997 controller.go:114] kubelet config controller: startin
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779403 8997 controller.go:118] kubelet config controller: validat
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794076 8997 client.go:75] Connecting to docker on unix:///var/run
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794384 8997 client.go:95] Start docker client with request timeou
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.803127 8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: W1113 14:45:58.803332 8997 server.go:289] --cloud-provider=auto-detect is deprec
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.834492 8997 certificate_manager.go:361] Requesting new certifi
and the same entries in full:
Nov 13 14:45:58 backend-052 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 13 14:45:58 backend-052 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779333 8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779398 8997 controller.go:114] kubelet config controller: starting controller
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779403 8997 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794076 8997 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794384 8997 client.go:95] Start docker client with request timeout=2m0s
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.803127 8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: W1113 14:45:58.803332 8997 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.834492 8997 certificate_manager.go:361] Requesting new certificate.
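(To recap the copy step mentioned above as commands, a sketch assuming root SSH access to master-1:)
$ mkdir -p /etc/cni/net.d
$ scp root@master-1:/etc/cni/net.d/10-flannel.conf /etc/cni/net.d/
$ systemctl restart kubelet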
Make sure the apiServerCertSANs and endpoints settings in kubeadm-init-v1.7.x.yaml are right. Please show me your kubeadm-init-v1.7.x.yaml file; it should look like the one below:
$ vi /root/kubeadm-ha/kubeadm-init-v1.7.x.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.7.0
networking:
  podSubnet: 10.244.0.0/16
apiServerCertSANs:
- k8s-master1
- k8s-master2
- k8s-master3
- 192.168.60.71
- 192.168.60.72
- 192.168.60.73
- 192.168.60.80
etcd:
  endpoints:
  - http://192.168.60.71:2379
  - http://192.168.60.72:2379
  - http://192.168.60.73:2379
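This file is then passed to kubeadm on the first master (standard kubeadm usage):
$ kubeadm init --config=/root/kubeadm-ha/kubeadm-init-v1.7.x.yaml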
Because I use Kubernetes v1.8, I renamed the config file accordingly.
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
cat kubeadm-init-v1.8.x.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.8.0
networking:
  podSubnet: 10.244.0.0/16
apiServerCertSANs:
- backend-023
- backend-052
- backend-055
- <ip_master1>
- <ip_master2>
- <ip_master3>
- <VIRTUAL_IP: ip_master2>
etcd:
  endpoints:
  - http://<ip_master1>:2379
  - http://<ip_master2>:2379
  - http://<ip_master3>:2379
Here are my containers (some are newer versions: flanneld, kube-xxx):
docker images
REPOSITORY                                               TAG            IMAGE ID       CREATED         SIZE
nginx                                                    latest         40960efd7b8f   8 days ago      108 MB
gcr.io/google_containers/kube-apiserver-amd64            v1.8.2         6278a1092d08   2 weeks ago     194 MB
gcr.io/google_containers/kube-controller-manager-amd64   v1.8.2         5eabb0eae58b   2 weeks ago     129 MB
gcr.io/google_containers/kube-scheduler-amd64            v1.8.2         b48970f8473e   2 weeks ago     54.9 MB
gcr.io/google_containers/kube-proxy-amd64                v1.8.2         88e2c85d3d02   2 weeks ago     93.1 MB
gcr.io/google_containers/heapster-amd64                  v1.4.3         6450eba57f23   5 weeks ago     73.4 MB
gcr.io/google_containers/kubernetes-dashboard-amd64      v1.7.1         294879c6444e   5 weeks ago     128 MB
gcr.io/google_containers/k8s-dns-sidecar-amd64           1.14.5         fed89e8b4248   6 weeks ago     41.8 MB
gcr.io/google_containers/k8s-dns-kube-dns-amd64          1.14.5         512cd7425a73   6 weeks ago     49.4 MB
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64     1.14.5         459944ce8cc4   6 weeks ago     41.4 MB
quay.io/coreos/flannel                                   v0.9.0-amd64   4c600a64a18a   7 weeks ago     51.3 MB
gcr.io/google_containers/heapster-influxdb-amd64         v1.3.3         577260d221db   2 months ago    12.5 MB
gcr.io/google_containers/etcd-amd64                      3.0.17         243830dae7dd   8 months ago    169 MB
gcr.io/google_containers/heapster-grafana-amd64          v4.0.2         a1956d2a1a16   9 months ago    131 MB
gcr.io/google_containers/pause-amd64                     3.0            99e59f495ffa   18 months ago   747 kB
thanks.
Checking the logs on master-1:
kubectl logs -n kube-system kube-controller-manager-backend-023
E1113 12:42:41.061323 1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved
E1113 12:43:56.948538 1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:46:13.572616 1 certificate_controller.go:139] Sync csr-4wgsj failed with : recognized csr "csr-4wgsj" as [nodeclient] but subject access review was not approved
E1113 12:48:05.949856 1 certificate_controller.go:139] Sync csr-f24wn failed with : recognized csr "csr-f24wn" as [nodeclient] but subject access review was not approved
E1113 12:49:24.632010 1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:53:36.424552 1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved
kubectl logs -n kube-system kube-scheduler-backend-023
E1113 12:06:46.443935 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Node: Get https://10.1.0.23:6443/api/v1/nodes?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
E1113 12:06:46.444638 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.StatefulSet: Get https://10.1.0.23:6443/apis/apps/v1beta1/statefulsets?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
E1113 12:06:46.445821 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.ReplicaSet: Get https://10.1.0.23:6443/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
E1113 12:06:46.446870 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.ReplicationController: Get https://10.1.0.23:6443/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
Hi @cookeem, the problem is the livenessProbe in the apiserver. See the comment by pipejakob in https://github.com/kubernetes/kubeadm/issues/193. Thanks.
It seems the certs do not match, so the livenessProbe failed and the nodes could not join the cluster:
E1113 12:42:41.061323 1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved
E1113 12:43:56.948538 1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:46:13.572616 1 certificate_controller.go:139] Sync csr-4wgsj failed with : recognized csr "csr-4wgsj" as [nodeclient] but subject access review was not approved
E1113 12:48:05.949856 1 certificate_controller.go:139] Sync csr-f24wn failed with : recognized csr "csr-f24wn" as [nodeclient] but subject access review was not approved
E1113 12:49:24.632010 1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:53:36.424552 1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved
Check this document: https://kubernetes.io/docs/admin/kubeadm/ and make sure your certs are created correctly.
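When CSRs are stuck like this, it can also help to inspect and approve them by hand (generic kubectl commands; csr-6zgzk is one of the names from your log):
$ kubectl get csr
$ kubectl certificate approve csr-6zgzk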
I used the certificates generated automatically by kubeadm and changed nothing from your guide.
@mtchuyen If automatic certificate creation failed, you can try creating the certificates manually.
Just comment out the apiServerCertSANs settings in the kubeadm-init-v1.8.x.yaml file:
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.8.0
networking:
  podSubnet: 10.244.0.0/16
#apiServerCertSANs:
#- backend-023
#- backend-052
#- backend-055
#- <ip_master1>
#- <ip_master2>
#- <ip_master3>
#- <VIRTUAL_IP: ip_master2>
etcd:
  endpoints:
  - http://<ip_master1>:2379
  - http://<ip_master2>:2379
  - http://<ip_master3>:2379
On all master nodes, create the certificates manually.
Create the key file apiserver-manual.key:
openssl genrsa -out apiserver-manual.key 2048
Create the CSR file apiserver-manual.csr:
openssl req -new -key apiserver-manual.key -subj "/CN=kube-apiserver" -out apiserver-manual.csr
Create the ext file apiserver-manual.ext:
vi apiserver-manual.ext
subjectAltName = DNS:${CURRENT_HOSTNAME},DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,DNS:kubernetes.default.svc.cluster.local,IP:${MASTER1_IP},IP:${MASTER2_IP},IP:${MASTER3_IP},IP:${VIRTUAL_IP}
Use /etc/kubernetes/pki/ca.crt to sign the certificate apiserver-manual.crt:
openssl x509 -req -in apiserver-manual.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out apiserver-manual.crt -days 365 -extfile apiserver-manual.ext
Replace the kube-apiserver.yaml settings:
vi kube-apiserver.yaml
- --tls-cert-file=${YOUR_PATH}/apiserver-manual.crt
- --tls-private-key-file=${YOUR_PATH}/apiserver-manual.key
Restart your cluster and check it:
systemctl restart kubelet docker
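To confirm the new cert actually carries the expected SANs, a generic openssl check works:
$ openssl x509 -in apiserver-manual.crt -noout -text | grep -A1 'Subject Alternative Name'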
Thanks @cookeem!
Using v1.7, the nodes are not joining. I scp /etc/kubernetes to the other masters, then systemctl daemon-reload && systemctl restart kubelet, followed by systemctl status kubelet. It is running; however, only the initial node shows up. Should we not be using the kubeadm join command around this point?