CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

Unstable and restarting pods coredns-* and local-path-provisioner-* after deployment #545

Closed: Hong-Chang closed this issue 2 years ago

Hong-Chang commented 2 years ago

Issue: I have observed this intermittently. After Mizar is deployed (in a kind environment), sometimes both coredns pods are running, sometimes only one coredns pod is running, and sometimes both coredns pods are failing and restarting. The local-path-provisioner pod is failing and restarting most of the time.

Investigating:

  1. When running kubectl logs mizar-operator-*, I see the following error: [2021-10-01 18:06:30,481] kopf.objects [ERROR ] [default/net0-b-2ee81836-9b60-4e7e-838a-5a4c0c9a45fb] Handler 'bouncer_opr_on_bouncer_init' failed temporarily: Temporary Error: Task: EndpointOperator Endpoint: coredns-66bff467f8-z2gwm-kube-system--eth0 Droplet Object not ready. This is a temporary error, and later the task succeeds once the droplet is ready. However, even though the task succeeds, it does not seem to actually take effect: the pods keep failing and restarting.

Another finding: letting the pod restart does not fix it. But if the failing pod is deleted with kubectl delete pod, it is recreated under a new random name, and the new pod runs without issue.
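The manual workaround above (deleting the failing pods so they are recreated under new names) could be scripted roughly as follows. This is a minimal sketch, not part of Mizar: the helper names `pods_to_recreate` and `delete_commands` are hypothetical, and it assumes the input is the parsed JSON from `kubectl get pods -A -o json` and that "failing" can be approximated as "at least one container restart". It only builds the `kubectl delete pod` commands; it does not address the underlying timing issue.

```python
from typing import Iterable

# Name prefixes of the pods reported as unstable in this issue.
UNSTABLE_PREFIXES = ("coredns-", "local-path-provisioner-")


def pods_to_recreate(pod_list: dict,
                     prefixes: Iterable[str] = UNSTABLE_PREFIXES,
                     min_restarts: int = 1) -> list:
    """Return (namespace, name) pairs for matching pods that have restarted.

    `pod_list` is assumed to be the parsed output of
    `kubectl get pods -A -o json`. The restart threshold is an assumption
    used as a rough proxy for "failing and restarting".
    """
    prefixes = tuple(prefixes)
    selected = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        name = meta.get("name", "")
        if not name.startswith(prefixes):
            continue
        # Sum restarts across all containers in the pod.
        statuses = pod.get("status", {}).get("containerStatuses", [])
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        if restarts >= min_restarts:
            selected.append((meta.get("namespace", "default"), name))
    return selected


def delete_commands(pods: list) -> list:
    """Build one `kubectl delete pod` argv list per (namespace, name) pair."""
    return [["kubectl", "delete", "pod", name, "-n", ns] for ns, name in pods]
```

Each returned argv list could then be passed to `subprocess.run(cmd, check=True)`. Because the pod's controller creates a replacement with a new random name, this matches the observed workaround, but it only treats the symptom.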

So it looks to me like a timing issue that occurs during deployment and makes these pods unstable.