cookeem / kubeadm-ha

Install a highly available Kubernetes cluster with kubeadm, using the docker/containerd container runtime; applies to v1.24.x and later
MIT License

nodes are not joined #2

Closed ghost closed 7 years ago

ghost commented 7 years ago

Using v1.7, nodes are not joined. I scp'd /etc/kubernetes to the other masters, then ran systemctl daemon-reload && systemctl restart kubelet, followed by systemctl status kubelet. It is running; however, only the initial node shows up. Should we not be using the kubeadm join command around this point?
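
For reference, the exact sequence I ran for each additional master was roughly this (the host name is a placeholder, not the real one):

$ scp -r /etc/kubernetes root@<master2>:/etc/
$ ssh root@<master2> 'systemctl daemon-reload && systemctl restart kubelet && systemctl status kubelet'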

cookeem commented 7 years ago

Can you show me the kubelet logs?

cookeem commented 7 years ago

Did you set up the HA master cluster first? For k8s worker nodes, there is no need to copy the /etc/kubernetes/ directory to them.

ghost commented 7 years ago

The logs are complaining about the cert, which is created in the later steps, yet the guide shows the nodes are supposed to appear at this earlier point. I tried to continue on, but could not due to the other issue.

Yes, I went through all prior steps. These are not worker nodes, they are masters.

cookeem commented 7 years ago

Did you edit the kubelet.conf file? If you change the file's server setting to the current host IP, it will show this problem, and then you must create the certificates yourself.

ghost commented 7 years ago

I finished the section creating all the certificates, and the kubelets restart fine, but still only one node shows.

In the file /etc/kubernetes/kubelet.conf there are multiple references to the original master's hostname; should I not adapt these to the second and third masters' hostnames?

    server: https://122.11.543.678:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:vps135257.vps.ovh.eu
  name: system:node:vps135257.vps.ovh.eu@kubernetes
current-context: system:node:vps135257.vps.ovh.eu@kubernetes
kind: Config
preferences: {}
users:
- name: system:node:vps135257.vps.ovh.eu
  user:

You may wish to try this with v1.7 when you get the chance. It was just released and there might be changes needed to your guide. I will blow this away and try again tomorrow with v1.6.4 to see if I can get success.

cookeem commented 7 years ago

v1.7 is not stable yet, but I think I will try it. Also make sure you turn off firewalld. Is there a hardware firewall from your cloud provider preventing communication between your masters?

ghost commented 7 years ago

v1.7.0 is in release state. Yes, I have disabled firewalld & selinux. There is no additional firewall on the VPS. This is not a "cloud" provider, it's a standard VPS. I'm using OVH with a goal to have fully HA systems with their rented dedicated servers in the near future. Please let me know how you fare. I am ready to give it another go, but will wait for a response from you.

I was able to get the nodes to show using the node join command. Will continue with experimentation and update here after I have something solid to report.

cookeem commented 7 years ago

Is node join a kubectl command or a kubeadm command? Or is that a v1.7.0-exclusive command?

ghost commented 7 years ago

My bad, kubeadm join --token
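
The full command was along these lines (token and endpoint are placeholders, not the real values):

$ kubeadm join --token <token> <master1-ip>:6443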

cookeem commented 7 years ago

That would join it as a worker node, not a master. In v1.6.4, run kubeadm reset first on k8s-master2 and k8s-master3, then copy /etc/kubernetes/kubelet.conf and /etc/kubernetes/pki to k8s-master2 and k8s-master3. You will find that k8s-master2 and k8s-master3 have joined (see the sketch after the output below):

kubectl get nodes
NAME          STATUS    AGE       VERSION
k8s-master1   Ready     12m       v1.6.4
k8s-master2   Ready     3m        v1.6.4
k8s-master3   Ready     3m        v1.6.4
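
In other words, the sequence for each of k8s-master2 and k8s-master3 is roughly this sketch (copies run from k8s-master1; adjust users and paths to your setup):

$ ssh k8s-master2 kubeadm reset
$ scp /etc/kubernetes/kubelet.conf root@k8s-master2:/etc/kubernetes/kubelet.conf
$ scp -r /etc/kubernetes/pki root@k8s-master2:/etc/kubernetes/
$ ssh k8s-master2 'systemctl daemon-reload && systemctl restart kubelet'
# repeat the same commands for k8s-master3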
cookeem commented 7 years ago

It means that the kubelets on k8s-master2 and k8s-master3 are up and connected to the cluster. You can try it.

ghost commented 7 years ago

Okay, I ran kubeadm reset on masters 2/3, then scp /etc/kubernetes/kubelet.conf and scp -r /etc/kubernetes/pki/*, then systemctl restart docker kubelet.

Now I have changed from:

NAME                   STATUS    AGE       VERSION
vps135abc.vps.ovh.eu   Ready     4m        v1.7.0
vps135def.vps.ovh.eu   Ready     2m        v1.7.0
vps135ghi.vps.ovh.eu   Ready     2m        v1.7.0

to:

[root@vps135abc ~]# kubectl get nodes
NAME                   STATUS     AGE       VERSION
vps135abc.vps.ovh.eu   Ready      1h        v1.7.0
vps135def.vps.ovh.eu   NotReady   1h        v1.7.0
vps135ghi.vps.ovh.eu   NotReady   1h        v1.7.0
cookeem commented 7 years ago

Just do it step by step:

  1. On k8s-master1, copy kube-apiserver.yaml to k8s-master2 and k8s-master3, then edit the kube-apiserver.yaml file on k8s-master2 and k8s-master3, replacing ${HOST_IP} with the current host IP (a one-line sketch follows after this list):

    vi /etc/kubernetes/manifests/kube-apiserver.yaml
    - --advertise-address=${HOST_IP}
  2. Restart docker and kubelet on k8s-master2 and k8s-master3:

    systemctl restart docker kubelet

Then you will find that the apiserver and kube-proxy start up on k8s-master2 and k8s-master3:

kubectl get pods --all-namespaces -o wide
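
For step 1, a minimal one-line sketch on each of k8s-master2/k8s-master3 (assuming the manifest has already been copied over, and HOST_IP holds that node's own IP):

$ HOST_IP=192.168.60.72   # this node's own IP (example value for k8s-master2)
$ sed -i "s|--advertise-address=.*|--advertise-address=${HOST_IP}|" /etc/kubernetes/manifests/kube-apiserver.yaml
$ systemctl restart docker kubelet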
cookeem commented 7 years ago

Maybe you should reset all your master nodes first, then redo kubeadm init on your first master.

ghost commented 7 years ago

Yes, I am having "nodelost" show from master 1 right now. I will reset. Ouch, much is wrong now. I will remove etcd and try to start entirely fresh. I think it would be best to just reinstall my VPSes; I will try the guide again in full order.

ghost commented 7 years ago

I went through the guide exactly as listed, with the same exact results at the same parts. I can't ever get masters 2/3 to show up like this. If you would be willing to take a look, I can give you access; email me your public key: webeindustry@gmail.com

ghost commented 7 years ago

I can confirm this is working with 1.6.4, so the issue is with 1.7.0. I will analyze the differences in the config files and report back. We should probably close these two issues. I will open another with details on how to get 1.7.0 working when successful.

cookeem commented 7 years ago

So you can try to use these commands to install exact versions of the components:

$ yum search docker --showduplicates
$ yum install docker-1.12.6-16.el7.centos.x86_64

$ yum search kubelet --showduplicates
$ yum install kubelet-1.6.4-0.x86_64

$ yum search kubeadm --showduplicates
$ yum install kubeadm-1.6.4-0.x86_64 

$ yum search kubernetes-cni --showduplicates
$ yum install kubernetes-cni-0.5.1-0.x86_64

$ systemctl enable docker && systemctl start docker
$ systemctl enable kubelet && systemctl start kubelet

I will try v1.7.0 later.

ghost commented 7 years ago

Yes, I am using 1.6.4 for kubelet, kubeadm, and kubectl. Everything works with the 1.6 branch; 1.7 is the issue. I'm now focused on taking a few steps back and learning other HA setups for 1.6+. I see some limitations with your setup. You should email me; we can chat about this.

cookeem commented 7 years ago

My gmail: cookeem@gmail.com

ghost commented 7 years ago

You also need to specify kubectl-1.6.4, else it will pull 1.7.0.

cookeem commented 7 years ago

You are right, I have updated the document

cookeem commented 7 years ago

v1.7.0 enhances security by adding the NodeRestriction admission controller, which will prevent the master nodes from joining the cluster. Remove it from the admission-control flag:

$ vi /etc/kubernetes/manifests/kube-apiserver.yaml
#    - --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota
    - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds
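
Not from the guide, but an equivalent way to drop only NodeRestriction from that flag (assuming it appears exactly once, as above) is:

$ sed -i 's/,NodeRestriction//' /etc/kubernetes/manifests/kube-apiserver.yaml

The kubelet watches the static pod manifest directory, so it will recreate the kube-apiserver pod with the new flags after the edit.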
mtchuyen commented 7 years ago

I have the same problem.

I use kubernetes 1.8:

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

my config: /etc/kubernetes/manifests/kube-apiserver.yaml

    - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ResourceQuota

I ran this on master-2 and master-3, and still:

root@backend-023:~# kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
backend-023   Ready     master    1h        v1.8.2
mtchuyen commented 7 years ago

@cookeem @webeindustry could this issue be reopened? Thanks.

cookeem commented 6 years ago

@mtchuyen can you show me the kubelet's log?

mtchuyen commented 6 years ago

Thanks for the reply!

Here is the kubelet's log:

backend-052 is master-2

Nov 13 14:34:33 backend-052 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 13 14:34:33 backend-052 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 13 14:34:33 backend-052 kubelet[2024]: I1113 14:34:33.822758    2024 feature_gate.go:156] feature gates: map[]
Nov 13 14:34:33 backend-052 kubelet[2024]: I1113 14:34:33.822842    2024 controller.go:114] kubelet config controller: starting controller
Nov 13 14:34:33 backend-052 kubelet[2024]: I1113 14:34:33.822848    2024 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.197458    2024 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.197509    2024 client.go:95] Start docker client with request timeout=2m0s
Nov 13 14:34:34 backend-052 kubelet[2024]: W1113 14:34:34.198548    2024 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.204017    2024 feature_gate.go:156] feature gates: map[]
Nov 13 14:34:34 backend-052 kubelet[2024]: W1113 14:34:34.204248    2024 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
Nov 13 14:34:34 backend-052 kubelet[2024]: I1113 14:34:34.230757    2024 certificate_manager.go:361] Requesting new certificat

I copied /etc/cni/net.d/10-flannel.conf from master-1, then restarted kubelet:

$ systemctl status kubelet

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2017-11-13 14:45:58 +07; 7s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 8997 (kubelet)
    Tasks: 11
   Memory: 12.2M
      CPU: 196ms
   CGroup: /system.slice/kubelet.service
           └─8997 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.

Nov 13 14:45:58 backend-052 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779333    8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779398    8997 controller.go:114] kubelet config controller: startin
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779403    8997 controller.go:118] kubelet config controller: validat
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794076    8997 client.go:75] Connecting to docker on unix:///var/run
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794384    8997 client.go:95] Start docker client with request timeou
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.803127    8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: W1113 14:45:58.803332    8997 server.go:289] --cloud-provider=auto-detect is deprec
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.834492    8997 certificate_manager.go:361] Requesting new certifi

and:

Nov 13 14:45:58 backend-052 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 13 14:45:58 backend-052 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779333    8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779398    8997 controller.go:114] kubelet config controller: starting controller
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.779403    8997 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794076    8997 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.794384    8997 client.go:95] Start docker client with request timeout=2m0s
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.803127    8997 feature_gate.go:156] feature gates: map[]
Nov 13 14:45:58 backend-052 kubelet[8997]: W1113 14:45:58.803332    8997 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
Nov 13 14:45:58 backend-052 kubelet[8997]: I1113 14:45:58.834492    8997 certificate_manager.go:361] Requesting new certificate.
cookeem commented 6 years ago

Make sure your apiServerCertSANs and endpoints settings in kubeadm-init-v1.7.x.yaml are right. Please show me your kubeadm-init-v1.7.x.yaml file; it should look like the one below:

$ vi /root/kubeadm-ha/kubeadm-init-v1.7.x.yaml 
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.7.0
networking:
  podSubnet: 10.244.0.0/16
apiServerCertSANs:
- k8s-master1
- k8s-master2
- k8s-master3
- 192.168.60.71
- 192.168.60.72
- 192.168.60.73
- 192.168.60.80
etcd:
  endpoints:
  - http://192.168.60.71:2379
  - http://192.168.60.72:2379
  - http://192.168.60.73:2379
mtchuyen commented 6 years ago

Because I use kube v1.8, I renamed the config file.

kubeadm version

kubeadm version: &version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

cat kubeadm-init-v1.8.x.yaml

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.8.0
networking:
  podSubnet: 10.244.0.0/16
apiServerCertSANs:
- backend-023
- backend-052
- backend-055
- <ip_master1>
- <ip_master2>
- <ip_master3>
- <VIRTUAL_IP: ip_master2>
etcd:
  endpoints:
  - http://<ip_master1>:2379
  - http://<ip_master2>:2379
  - http://<ip_master3>:2379
mtchuyen commented 6 years ago

Here are my containers (some are newer versions: flanneld, kube-xxx):

docker images

REPOSITORY                                                TAG            IMAGE ID       CREATED         SIZE
nginx                                                     latest         40960efd7b8f   8 days ago      108 MB
gcr.io/google_containers/kube-apiserver-amd64             v1.8.2         6278a1092d08   2 weeks ago     194 MB
gcr.io/google_containers/kube-controller-manager-amd64    v1.8.2         5eabb0eae58b   2 weeks ago     129 MB
gcr.io/google_containers/kube-scheduler-amd64             v1.8.2         b48970f8473e   2 weeks ago     54.9 MB
gcr.io/google_containers/kube-proxy-amd64                 v1.8.2         88e2c85d3d02   2 weeks ago     93.1 MB
gcr.io/google_containers/heapster-amd64                   v1.4.3         6450eba57f23   5 weeks ago     73.4 MB
gcr.io/google_containers/kubernetes-dashboard-amd64       v1.7.1         294879c6444e   5 weeks ago     128 MB
gcr.io/google_containers/k8s-dns-sidecar-amd64            1.14.5         fed89e8b4248   6 weeks ago     41.8 MB
gcr.io/google_containers/k8s-dns-kube-dns-amd64           1.14.5         512cd7425a73   6 weeks ago     49.4 MB
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64      1.14.5         459944ce8cc4   6 weeks ago     41.4 MB
quay.io/coreos/flannel                                    v0.9.0-amd64   4c600a64a18a   7 weeks ago     51.3 MB
gcr.io/google_containers/heapster-influxdb-amd64          v1.3.3         577260d221db   2 months ago    12.5 MB
gcr.io/google_containers/etcd-amd64                       3.0.17         243830dae7dd   8 months ago    169 MB
gcr.io/google_containers/heapster-grafana-amd64           v4.0.2         a1956d2a1a16   9 months ago    131 MB
gcr.io/google_containers/pause-amd64                      3.0            99e59f495ffa   18 months ago   747 kB



Thanks.
mtchuyen commented 6 years ago

Checking the logs on master-1:

kubectl logs -n kube-system kube-controller-manager-backend-023

E1113 12:42:41.061323       1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved
E1113 12:43:56.948538       1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:46:13.572616       1 certificate_controller.go:139] Sync csr-4wgsj failed with : recognized csr "csr-4wgsj" as [nodeclient] but subject access review was not approved
E1113 12:48:05.949856       1 certificate_controller.go:139] Sync csr-f24wn failed with : recognized csr "csr-f24wn" as [nodeclient] but subject access review was not approved
E1113 12:49:24.632010       1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:53:36.424552       1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved

kubectl logs -n kube-system kube-scheduler-backend-023

E1113 12:06:46.443935       1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Node: Get https://10.1.0.23:6443/api/v1/nodes?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
E1113 12:06:46.444638       1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.StatefulSet: Get https://10.1.0.23:6443/apis/apps/v1beta1/statefulsets?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
E1113 12:06:46.445821       1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.ReplicaSet: Get https://10.1.0.23:6443/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
E1113 12:06:46.446870       1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.ReplicationController: Get https://10.1.0.23:6443/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 10.1.0.23:6443: getsockopt: connection refused
mtchuyen commented 6 years ago

Hi @cookeem, the problem is the livenessProbe in the apiserver. See the comment by pipejakob in https://github.com/kubernetes/kubeadm/issues/193

thanks.

cookeem commented 6 years ago

It seems that the certs do not match, so the livenessProbe failed and the nodes cannot join the cluster.

E1113 12:42:41.061323       1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved
E1113 12:43:56.948538       1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:46:13.572616       1 certificate_controller.go:139] Sync csr-4wgsj failed with : recognized csr "csr-4wgsj" as [nodeclient] but subject access review was not approved
E1113 12:48:05.949856       1 certificate_controller.go:139] Sync csr-f24wn failed with : recognized csr "csr-f24wn" as [nodeclient] but subject access review was not approved
E1113 12:49:24.632010       1 certificate_controller.go:139] Sync csr-7v85l failed with : recognized csr "csr-7v85l" as [nodeclient] but subject access review was not approved
E1113 12:53:36.424552       1 certificate_controller.go:139] Sync csr-6zgzk failed with : recognized csr "csr-6zgzk" as [nodeclient] but subject access review was not approved

Check this document: https://kubernetes.io/docs/admin/kubeadm/, and make sure your certs are created correctly.
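
For example, you can check which SANs actually ended up in the kubeadm-generated API server certificate (default path /etc/kubernetes/pki/apiserver.crt) with something like:

$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'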

mtchuyen commented 6 years ago

I use the certificates that were generated by kubeadm (automatically), and nothing was changed from your guide.

cookeem commented 6 years ago

@mtchuyen If creating the certificates failed, you can try to create them manually.

Just comment out the apiServerCertSANs settings in the kubeadm-init-v1.8.x.yaml file:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.8.0
networking:
  podSubnet: 10.244.0.0/16
#apiServerCertSANs:
#- backend-023
#- backend-052
#- backend-055
#- <ip_master1>
#- <ip_master2>
#- <ip_master3>
#- <VIRTUAL_IP: ip_master2>
etcd:
  endpoints:
  - http://<ip_master1>:2379
  - http://<ip_master2>:2379
  - http://<ip_master3>:2379

On all master nodes, create the certificates manually:

  1. Create key file apiserver-manual.key:

    openssl genrsa -out apiserver-manual.key 2048
  2. Create csr file apiserver-manual.csr:

    openssl req -new -key apiserver-manual.key -subj "/CN=kube-apiserver" -out apiserver-manual.csr
  3. Create ext file apiserver-manual.ext:

    vi apiserver-manual.ext
    subjectAltName = DNS:${CURRENT_HOSTNAME},DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,DNS:kubernetes.default.svc.cluster.local,IP:${MASTER1_IP},IP:${MASTER2_IP},IP:${MASTER3_IP},IP:${VIRTUAL_IP}
  4. Use /etc/kubernetes/pki/ca.crt to create the certificate apiserver-manual.crt:

    openssl x509 -req -in apiserver-manual.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out apiserver-manual.crt -days 365 -extfile apiserver-manual.ext
  5. Replace kube-apiserver.yaml settings:

    vi kube-apiserver.yaml
    - --tls-cert-file=${YOUR_PATH}/apiserver-manual.crt
    - --tls-private-key-file=${YOUR_PATH}/apiserver-manual.key
  6. Restart your cluster and check it:

    systemctl restart kubelet docker
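
As a quick sanity check (not part of the original steps), you can confirm that apiserver-manual.crt was signed by the cluster CA and carries the SANs from the ext file before restarting:

    openssl verify -CAfile /etc/kubernetes/pki/ca.crt apiserver-manual.crt
    openssl x509 -in apiserver-manual.crt -noout -text | grep -A1 'Subject Alternative Name'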
mtchuyen commented 6 years ago

Thanks @cookeem!