TimeBye / kubeadm-ha

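kubeadm-ha uses kubeadm to build highly available Kubernetes clusters and automates the installation with ansible-playbook. It provides a one-click install script, and each component can also be installed step by step from the playbooks.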

After 81-add-worker, the existing nodes cannot reach pods on the new nodes over TCP #150

Closed wurenny closed 4 months ago

wurenny commented 6 months ago

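Bug description

After scaling out by two nodes with 81-add-worker:

Preliminary findings

Environment (please fill in the following information):

Run the command in parentheses below and submit the output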

CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"


- **Ansible version** (`ansible --version`):
```shell
ansible 2.7.5
  config file = /home/tempuser/install/kubeadm-ha/ansible.cfg
  configured module search path = ['/home/tempuser/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

$ git log -1
commit 1fa962253cb50d55597ac041618ecc17fe6d9fc7
Author: ChongmingDu <decodedcm@gmail.com>
Date:   Sat Jul 31 01:07:16 2021 +0800

    fs.inotify values were added to sysctl
```

- **Target kube version**
```shell
# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:17:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:09:48Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun 2 11:56:35 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```


- **Target flannel version**
```shell
# kubectl -n kube-system get ds kube-flannel-ds -o jsonpath='{range .spec.template.spec}{.containers[].image}{"\n"}{.initContainers[].image}{"\n"}{end}'
registry.aliyuncs.com/kubeadm-ha/coreos_flannel:v0.12.0
registry.aliyuncs.com/kubeadm-ha/coreos_flannel:v0.12.0
```

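How to reproduce

Steps to reproduce:

  1. Starting from the existing inventory, add the two new nodes to [all], [kube-worker] and [new-worker]
  2. Run the deployment command:
    ansible-playbook -i inventory-test.ini -e @variables.yaml 81-add-worker.yml
  3. The same problem reproduces 100% of the time; it was seen after scaling out two separate clusters
  4. Errors observed: none, the scale-out itself completes without any error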
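
For readers unfamiliar with the layout, steps 1 and 2 can be sketched roughly as follows. This is only an illustration: the host names `node4`/`node5`, their addresses, and the use of `ansible_host` are hypothetical placeholders, not values from this report, and the real inventory lines may carry more variables. The script writes the fragment to a temporary file and prints the playbook command instead of running it.

```shell
#!/bin/sh
# Sketch of the inventory additions described in step 1.
# node4/node5 and the 192.168.56.x addresses are hypothetical placeholders.
INVENTORY_FRAGMENT="$(mktemp)"

cat > "$INVENTORY_FRAGMENT" <<'EOF'
[all]
# existing hosts stay unchanged; the two lines below are the new nodes
node4 ansible_host=192.168.56.14
node5 ansible_host=192.168.56.15

[kube-worker]
# existing workers stay listed here as well
node4
node5

[new-worker]
# only the nodes being added
node4
node5
EOF

# Show the fragment, then the command from step 2 (printed, not executed).
cat "$INVENTORY_FRAGMENT"
echo "ansible-playbook -i inventory-test.ini -e @variables.yaml 81-add-worker.yml"
```

The fragment is meant to be merged by hand into the existing inventory file; it is not a complete inventory on its own.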

Additional notes

The problem is rather odd: I could not find out why vxlan does not forward TCP to the cni bridge along the established routes.
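
A rough sketch of how the overlay state could be inspected on an old node and a new node, for anyone trying to narrow this down. The device names `flannel.1` and `cni0` are flannel's usual defaults and are an assumption here, as are the pod IP and port passed to the second function; nothing in this report confirms them. The functions are only defined, not called, and `compare_icmp_tcp` assumes the target pod serves HTTP.

```shell
#!/bin/sh
# Assumed device names (flannel defaults, not confirmed by this report).
VXLAN_DEV="${VXLAN_DEV:-flannel.1}"
CNI_BRIDGE="${CNI_BRIDGE:-cni0}"

# Print the state the vxlan data path depends on. Run on both an old
# and a new node and compare the output.
show_overlay_state() {
    ip -d link show "$VXLAN_DEV"       # vxlan id, local address, UDP port
    ip -d link show "$CNI_BRIDGE"      # bridge that pods attach to
    ip route show                      # per-node pod subnet routes
    bridge fdb show dev "$VXLAN_DEV"   # remote MAC to node IP mappings
    ip neigh show dev "$VXLAN_DEV"     # neighbour entries for remote vxlan devices
    ethtool -k "$VXLAN_DEV"            # offload settings (a place to look, not a confirmed cause)
}

# Compare ICMP and TCP reachability of one pod on a new node.
# Usage: compare_icmp_tcp <pod-ip> <port>   (both values are placeholders)
compare_icmp_tcp() {
    pod_ip="$1"
    port="$2"
    ping -c 3 -W 2 "$pod_ip"
    if curl -sS -m 5 -o /dev/null "http://$pod_ip:$port/"; then
        echo "tcp ok"
    else
        echo "tcp failed"
    fi
}
```

Calling `show_overlay_state` on both node types and diffing the results would show whether the new nodes are missing routes or forwarding entries, or whether the tables match and the packets are lost elsewhere.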

wurenny commented 4 months ago

Done