CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

System coredns stuck at containerCreating due to handler endpoint_opr_on_endpoint_provisioned failed temporarily #639

Open Sindica opened 2 years ago

Sindica commented 2 years ago

What happened: In a 2x2 local env, the coredns pods for both TPs are stuck at ContainerCreating.

Operator log:

27477:[2022-03-02 17:33:54,017] kopf.objects         [ERROR   ] [default/coredns-default-ip-172-30-0-14-78dd67d496-jzfpj-kube-system-system-eth0] Handler 'endpoint_opr_on_endpoint_provisioned' failed temporarily: Temporary Error: Update_agent_substrate ep returned ERROR! Retrying as agent may have not yet been loaded.
...
41852:[2022-03-02 17:48:18,874] kopf.objects         [ERROR   ] [default/coredns-default-ip-172-30-0-14-78dd67d496-jzfpj-kube-system-system-eth0] Handler 'endpoint_opr_on_endpoint_provisioned' failed temporarily: Temporary Error: update ep returned ERROR! Retrying as agent may have not yet been loaded.
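
To see how long a handler has been retrying, the "failed temporarily" lines can be grouped by object and handler. The following is a minimal sketch, not part of Mizar: `summarize_retries` and `LINE_RE` are hypothetical names, and the regex assumes the log layout shown above, including the leading `27477:` style prefix, which looks like a line number added by a tool such as `grep -n`.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: [timestamp] logger [LEVEL] [namespace/object] Handler '<name>' failed temporarily: ...
LINE_RE = re.compile(
    r"^(?:\d+:)?"                    # optional line-number prefix (assumption: grep -n style)
    r"\[(?P<ts>[^\]]+)\]\s+"         # timestamp, e.g. 2022-03-02 17:33:54,017
    r"\S+\s+"                        # logger name, e.g. kopf.objects
    r"\[(?P<level>\w+)\s*\]\s+"      # padded log level, e.g. [ERROR   ]
    r"\[(?P<obj>[^\]]+)\]\s+"        # namespace/object name
    r"Handler '(?P<handler>[^']+)' failed temporarily"
)


def summarize_retries(lines):
    """Count temporary failures per (object, handler) and track first/last timestamps."""
    stats = defaultdict(lambda: {"count": 0, "first": None, "last": None})
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a "failed temporarily" line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        entry = stats[(m["obj"], m["handler"])]
        entry["count"] += 1
        entry["first"] = ts if entry["first"] is None else min(entry["first"], ts)
        entry["last"] = ts if entry["last"] is None else max(entry["last"], ts)
    return dict(stats)


# The two operator log lines quoted in this issue.
sample = [
    "27477:[2022-03-02 17:33:54,017] kopf.objects         [ERROR   ] "
    "[default/coredns-default-ip-172-30-0-14-78dd67d496-jzfpj-kube-system-system-eth0] "
    "Handler 'endpoint_opr_on_endpoint_provisioned' failed temporarily: Temporary Error: "
    "Update_agent_substrate ep returned ERROR! Retrying as agent may have not yet been loaded.",
    "41852:[2022-03-02 17:48:18,874] kopf.objects         [ERROR   ] "
    "[default/coredns-default-ip-172-30-0-14-78dd67d496-jzfpj-kube-system-system-eth0] "
    "Handler 'endpoint_opr_on_endpoint_provisioned' failed temporarily: Temporary Error: "
    "update ep returned ERROR! Retrying as agent may have not yet been loaded.",
]

summary = summarize_retries(sample)
for (obj, handler), entry in summary.items():
    span = entry["last"] - entry["first"]
    print(f"{handler} on {obj}: {entry['count']} failures over {span}")
```

For the two lines above, the same endpoint is still failing more than 14 minutes after the first error, so this does not look like a short startup race that clears on its own.
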
Sindica commented 2 years ago

Was able to reproduce (1/3):

  1. Start TP1 and TP2, then wait for the message "Waiting for Mizar CRDs to reach 'Provisioned' state".
  2. Start RP1 and RP2. TP2 then has a coredns pod stuck at ContainerCreating.

kubelet log:

E0302 21:54:22.318200   21678 kuberuntime_sandbox.go:86] CreatePodSandbox for pod "coredns-default-ip-172-30-0-156-857d6d684d-kpjff_kube-system_system(e5a6ee6e-9996-41d1-a04d-a599cd207e10)" failed: rpc error: code = Unknown desc = failed to setup network for sandbox "47077e2217a9603945fb2886496c78ad100b47809a1d91d593d35fd262d8b868": netplugin failed but error parsing its diagnostic message "{\n    \"dns\": {}\n}{\n    \"code\": 999,\n    \"msg\": \"Link not found\"\n}": invalid character '{' after top-level value

The operator shows the same log as above.
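
The kubelet error suggests the network plugin wrote two JSON objects back to back on stdout: first `{"dns": {}}`, then what looks like a CNI-style error with `code` 999 and `msg` "Link not found". A strict JSON parser rejects the second object, which is why kubelet reports a parse failure instead of the real message. The sketch below only illustrates how to pull the hidden error out of such output when debugging. `split_json_objects` is a hypothetical helper, not part of Mizar or kubelet, and the input string is copied from the log above.

```python
import json


def split_json_objects(text):
    """Decode every top-level JSON value that appears back to back in `text`."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


# Plugin output as quoted (unescaped) in the kubelet log line above.
plugin_stdout = '{\n    "dns": {}\n}{\n    "code": 999,\n    "msg": "Link not found"\n}'

objects = split_json_objects(plugin_stdout)

# Assumption: an object carrying a "code" key is the error report.
errors = [o for o in objects if isinstance(o, dict) and "code" in o]
for err in errors:
    print(f"plugin error {err['code']}: {err.get('msg')}")
```

Read this way, the failure behind the ContainerCreating state is "Link not found" from the plugin, and the JSON parse error in kubelet is only a side effect of the two objects being concatenated.
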