cookeem / kubeadm-ha

Install a highly available kubernetes cluster with kubeadm, using the docker/containerd container runtime; applies to v1.24.x and above
MIT License

calico authentication error to access apiserver #21

Closed: iamaugustin closed this issue 6 years ago

iamaugustin commented 6 years ago

Hi, I just tried your newest updates based on canal. However, I got stuck at the canal deployment. I see calico-node trying to access 10.96.0.1:443 (I think that is the apiserver), and the apiserver then prints lots of authentication errors like "Unable to authenticate the request due to an error". I tried deleting all the secrets so they would be re-generated, but it doesn't work. Have you run into the same trouble, or do you have any experience handling this?
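In case it helps, a rough sketch of the checks around the token calico-node presents to the apiserver (the pod name is a placeholder, and the serviceaccount/secret names may differ by canal version):

# list the canal/calico pods
$ kubectl -n kube-system get pods -o wide | grep -E 'canal|calico'

# inspect the calico-node container logs in one of them (pod name is a placeholder)
$ kubectl -n kube-system logs canal-xxxxx -c calico-node

# check the serviceaccounts and token secrets that canal mounts
$ kubectl -n kube-system get serviceaccounts | grep -i canal
$ kubectl -n kube-system get secrets | grep -i canal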

Besides, I also want to ask about cleanly removing kubernetes. I ran "kubeadm reset" (after draining and deleting the node first) and then followed your commands to remove the files. However, I still see calico pods being created automatically after kubeadm init. Do you have any ideas about this?
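In case it matters, these are the places where I would expect leftover state after kubeadm reset (the etcd path comes from your readme; the CNI path is just the usual default):

# CNI configuration left behind by the previous canal deployment
$ ls /etc/cni/net.d/

# data of the external etcd cluster, which kubeadm reset does not touch
$ ls /var/lib/etcd-cluster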

Thanks, Augustin

cookeem commented 6 years ago

Remove the --apiserver-count=3 setting (it is not necessary), then restart docker and kubelet. Does it work?
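Roughly like this (the manifest path is the kubeadm default, double check it on your hosts before editing):

# drop the --apiserver-count=3 flag from the apiserver static pod manifest
$ sed -i '/--apiserver-count=3/d' /etc/kubernetes/manifests/kube-apiserver.yaml

# restart the container runtime and kubelet so the change takes effect
$ systemctl restart docker
$ systemctl restart kubelet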

iamaugustin commented 6 years ago

I don't see where your script sets --apiserver-count=3. Is the default value 1? Where exactly should I set it?

iamaugustin commented 6 years ago

OK! I got what you said. You mention this configuration in the go-ha section. However, I hit this trouble in the first section, in other words while initializing master01.

BTW, can I skip the --apiserver-count=3 step in the later section?

cookeem commented 6 years ago

At which step did you find this error, after deploying canal or after restarting kubelet? This canal deployment uses the kubernetes api as its datastore; later I will try using etcd as the datastore to test it.
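If you want to confirm which datastore your deployed manifests use, something like this should show it (the configmap and daemonset names are from the upstream canal manifests and may differ by version):

# look for etcd endpoints or datastore settings in the canal configmap
$ kubectl -n kube-system get configmap canal-config -o yaml | grep -iE 'etcd_endpoints|datastore'

# and in the calico-node container environment of the daemonset
$ kubectl -n kube-system get daemonset canal -o yaml | grep -iE 'DATASTORE_TYPE|ETCD_ENDPOINTS'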

iamaugustin commented 6 years ago

I found this error after deploying canal. Just now I got canal up. Here is what I did for this trouble (roughly the commands are sketched after the list).

  1. delete all pods
  2. kubectl drain node..
  3. kubectl delete node..
  4. kubeadm reset
  5. reset etcd, meaning all of etcd1, etcd2, etcd3
  6. keep using 10.244.0.0/16
  7. replay the jobs you mention in the readme (actually, I checked the logs at every step)
  8. Done.
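Roughly, the commands behind those steps (node names are placeholders from my setup, and the init flag just illustrates keeping the same pod network CIDR; the readme's own config applies):

# drain and remove each node from the cluster (node name is an example)
$ kubectl drain k8s-node1 --delete-local-data --force --ignore-daemonsets
$ kubectl delete node k8s-node1

# reset kubeadm state on every host
$ kubeadm reset

# reset the external etcd cluster on etcd1/etcd2/etcd3 (path from the readme)
$ rm -rf /var/lib/etcd-cluster

# re-run the readme steps, keeping the same pod network CIDR
$ kubeadm init --pod-network-cidr=10.244.0.0/16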

Thanks for your kind reply and nice sharing.

I will try to contribute something here if I can :)

cookeem commented 6 years ago

@iamaugustin Actually these commands will reset all kubernetes settings and clear all etcd data:

# reset kubernetes cluster
$ kubeadm reset

# clear etcd cluster data
$ rm -rf /var/lib/etcd-cluster

# reset and start etcd cluster
$ docker-compose --file etcd/docker-compose.yaml stop
$ docker-compose --file etcd/docker-compose.yaml rm -f
$ docker-compose --file etcd/docker-compose.yaml up -d
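After that, a quick sanity check that the etcd cluster is back and empty could look like this (the container name and the etcdctl command depend on the compose file and the etcd version, so adjust as needed):

# confirm the etcd containers are running again
$ docker-compose --file etcd/docker-compose.yaml ps

# check cluster health from inside one member (etcd v2-style command; use "endpoint health" on v3)
$ docker exec -it etcd1 etcdctl cluster-health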