TimeBye / kubeadm-ha

kubeadm-ha builds a highly available Kubernetes cluster with kubeadm, using ansible-playbook to automate the installation. It provides both a one-click install script and the option of running the playbook step by step to install individual components.

Why not use master01, or any other cluster node, as the deploy host instead of requiring a separate deployment machine? #34

Closed. li-sen closed this issue 3 years ago.

li-sen commented 3 years ago

As the title says: I tried running the install from master01 and it doesn't seem to work.

TimeBye commented 3 years ago

What do you mean it doesn't work? What problem did you hit?

li-sen commented 3 years ago

As soon as Docker was deployed, the network dropped and I couldn't SSH to any node.

li-sen commented 3 years ago

Also, in the earlier offline package the kubeadm version (1.20.1) didn't match the k8s components (1.20.2).

TimeBye commented 3 years ago

> As soon as Docker was deployed, the network dropped and I couldn't SSH to any node.

Can you describe in detail which command caused the network to drop?

TimeBye commented 3 years ago

> Also, in the earlier offline package the kubeadm version (1.20.1) didn't match the k8s components (1.20.2).

If you install following the script documentation there is no mismatch: the offline package does contain 1.20.2 components, but it also includes the 1.20.1 components.

li-sen commented 3 years ago

I'll retry in a bit.

li-sen commented 3 years ago

> > As soon as Docker was deployed, the network dropped and I couldn't SSH to any node.
>
> Can you describe in detail which command caused the network to drop?

It's probably the firewall or something.

TimeBye commented 3 years ago

> > As soon as Docker was deployed, the network dropped and I couldn't SSH to any node.
>
> Can you describe in detail which command caused the network to drop?
>
> It's probably the firewall or something.

Yes, the preparation phase includes a step that disables the firewall on all nodes:

https://github.com/TimeBye/kubeadm-ha/blob/956203ec316dea4af790fffadd462d8ab09c3851/roles/prepare/base/tasks/centos.yml#L15
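For context, the linked step amounts to stopping and permanently disabling firewalld on CentOS nodes, along the lines of the following sketch (the exact task in the repo may differ):

```yaml
# Sketch of the prepare/base firewall step for CentOS nodes.
# The actual task in roles/prepare/base/tasks/centos.yml may differ.
- name: 关闭 firewalld
  systemd:
    name: firewalld
    state: stopped
    enabled: no
  ignore_errors: true
```

When the deploy host is also a cluster node, preparation steps like this run against the very machine you are connected from.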

li-sen commented 3 years ago

Gave it a try today: the images in the offline package are all 1.20.2, but kubeadm is 1.20.1, so yum install kubeadm won't install.

li-sen commented 3 years ago

And the offline package itself is named kubeadm-ha-1.20.1 ...

li-sen commented 3 years ago

I see you pushed an update today; trying again now~

TimeBye commented 3 years ago

Thanks for the feedback. The kubelet yum package was indeed downloaded as 1.20.2; I'll look into why. All the other packages are 1.20.1.

li-sen commented 3 years ago

Once you've fixed it, I'll test it~

TimeBye commented 3 years ago

Fixed, thanks for helping test:

https://oss.choerodon.com.cn/kubeadm-ha/docker-ce-19.03.13-amd64.tar.gz
https://oss.choerodon.com.cn/kubeadm-ha/kubeadm-ha-1.20.1-amd64.tar

li-sen commented 3 years ago

OK, I'll give it a go in my VMs.

li-sen commented 3 years ago

TASK [prepare/kubernetes : 安装 kubeadm kubelet kubectl] *****
fatal: [jy-master01]: FAILED! => {"changed": false, "msg": "No package matching 'kubeadm-1.20.2' found available, installed or updated", "rc": 126, "results": ["No package matching 'kubeadm-1.20.2' found available, installed or updated"]}
(the identical failure is reported for jy-master02, jy-master03, jy-worker01, jy-worker02 and jy-worker03)

li-sen commented 3 years ago

The version problem is still there.

TimeBye commented 3 years ago

Could you share your config files? variables.yaml and inventory.ini

li-sen commented 3 years ago

My mistake, I forgot to change the version back to 1.20.1.
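For anyone hitting the same "No package matching" error: the component version is pinned in variables.yaml and has to match the packages bundled in the offline package. A minimal sketch, assuming a hypothetical kube_version key (check the repo's variables.yaml for the actual name):

```yaml
# variables.yaml (sketch): pin the version to what the offline bundle ships.
# "kube_version" is a hypothetical key name; the real key may differ.
kube_version: 1.20.1
```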

li-sen commented 3 years ago

TASK [kube-master : 初始化第一个 master 节点] *****
fatal: [jy-master01]: FAILED! => "kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml" returned rc 1 after 0:04:26

stderr:
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

stdout (abridged; the certificate, kubeconfig and static Pod manifest phases completed normally):
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
...
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.

Here is one example how you may list all Kubernetes containers running in docker:
    - 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
    - 'docker logs CONTAINERID'

li-sen commented 3 years ago

On the second run it failed at master initialization.

li-sen commented 3 years ago

TASK [kube-master : 确认 kubelet 已停止运行] *****
ok: [jy-master01]

TASK [kube-master : 获取 master 节点需要拉取的镜像列表] *****
changed: [jy-master01]

TASK [kube-master : 初始化第一个 master 节点] *****
fatal: [jy-master01]: FAILED! => (the same "kubeadm init" wait-control-plane timeout as above)

NO MORE HOSTS LEFT *****

PLAY RECAP *****
jy-master01 : ok=197 changed=93 unreachable=0 failed=1 skipped=47 rescued=0 ignored=0
jy-master02 : ok=113 changed=47 unreachable=0 failed=0 skipped=44 rescued=0 ignored=0
jy-master03 : ok=113 changed=47 unreachable=0 failed=0 skipped=44 rescued=0 ignored=0
jy-worker01 : ok=78  changed=31 unreachable=0 failed=0 skipped=31 rescued=0 ignored=0
jy-worker02 : ok=78  changed=31 unreachable=0 failed=0 skipped=31 rescued=0 ignored=0
jy-worker03 : ok=78  changed=31 unreachable=0 failed=0 skipped=31 rescued=0 ignored=0

li-sen commented 3 years ago

Do I need to reset the cluster?

TimeBye commented 3 years ago

It works fine in my tests; you can try resetting the cluster and running it again.

li-sen commented 3 years ago

Adding the parameter as kube_kubeadm_apiserver_extra_args: {--runtime-config=api/all=true} doesn't work?

li-sen commented 3 years ago

[root@jy-master01 ~]# docker logs -f 125872cd24ce
Error: bad flag syntax: ----runtime-config=api/all=true=

The api-server is reporting the error above.

TimeBye commented 3 years ago

kube_kubeadm_apiserver_extra_args:
  runtime-config: api/all=true
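For reference, these extra-args variables are YAML maps from flag name (without the leading --) to value, not flow-style entries, which is why the {--runtime-config=api/all=true} form produced the bad-flag-syntax error above. A sketch with a second, purely illustrative flag:

```yaml
# variables.yaml (sketch): extra kube-apiserver flags as a map.
# Keys are flag names without "--"; values are the flag values.
kube_kubeadm_apiserver_extra_args:
  runtime-config: api/all=true
  audit-log-maxage: "30"   # hypothetical second flag, for illustration only
```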
li-sen commented 3 years ago

Please also add an example of how to pass extra parameters to the docs.

TimeBye commented 3 years ago

PRs welcome.

li-sen commented 3 years ago

Is there any plan to add this? Requiring a separate deployment machine really isn't very friendly, and supporting this should be entirely feasible technically.

TimeBye commented 3 years ago

It is possible; I only wrote the docs that way to keep them clearer. I also replied to you about this at the time: https://github.com/TimeBye/kubeadm-ha/issues/34#issuecomment-764608095

li-sen commented 3 years ago

Remove the firewall-disable step?

TimeBye commented 3 years ago

> Remove the firewall-disable step?

Yes, I'll add a variable here to control whether the firewall gets disabled.
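Such a switch usually takes the form of a boolean variable gating the task with a when condition. A minimal sketch, assuming a hypothetical variable name firewalld_disabled (the name actually chosen in the commit may differ):

```yaml
# variables.yaml (sketch): opt-in switch; "firewalld_disabled" is hypothetical
firewalld_disabled: true

# roles/prepare/base/tasks/centos.yml (sketch): only touch firewalld
# when the operator has opted in via the variable above
- name: 关闭 firewalld
  systemd:
    name: firewalld
    state: stopped
    enabled: no
  when: firewalld_disabled | bool
  ignore_errors: true
```

With the switch set to false, operators who run the play from a cluster node can keep their firewall rules and open the required ports themselves.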

TimeBye commented 3 years ago

Feature implemented: https://github.com/TimeBye/kubeadm-ha/commit/d7e6481fb9df1e45cdd0e277858b6f7d3d79467a