CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

After ConsumingInterface, pod-interface map shall delay pop until pod is running #628

Closed: Hong-Chang closed this issue 2 years ago.

Hong-Chang commented 2 years ago

This is a code sequence logic issue.

  1. ProducingInterface runs only once, then puts the produced interface into a map.
  2. The CNI plugin calls ConsumeInterface.
  3. Once the interface is consumed, it is popped out of the map.
  4. The CNI plugin then performs the remaining setup based on the returned interface.

The problem is that step 4 may fail. Because the produced interface was already popped out of the map in step 3, the system never gets a second chance to retry and bring the interface online.

We need a way to pop the interface from the map only after the interface is up.

The issue is more likely to occur right after the cluster starts and Mizar is deployed. At that point the system is not yet fully up, so step 4 may fail. The issue described above was detected in mizarcni.log:

    I0219 04:25:28.371432 765815 mizarcni.go:60] CNI_ADD: Tracelog: 'CNI_ADD: Args: '{"Command":"ADD","ContainerID":"6cc5602a1f56984a38da9f19cf0e71b799b060b3b6315ce9576dafd55dffd52f","NetNS":"/var/run/netns/cni-4927cbf2-3df6-30f3-1770-44e974c7f9d8","IfName":"eth0","CniPath":"/opt/cni/bin","K8sPodNamespace":"kube-system","K8sPodName":"coredns-default-ip-172-31-21-250-5759fc5b8b-vfzsg","K8sPodTenant":"system","CniVersion":"0.3.1","NetworkName":"mizarcni","Plugin":"mizarcni"}'
    CNI_ADD: Activating interface: 'interface_id:{pod_id:{k8s_pod_name:"coredns-default-ip-172-31-21-250-5759fc5b8b-vfzsg" k8s_namespace:"kube-system" k8s_pod_tenant:"system"} interface:"eth0"} veth:{name:"eth-9b922eab" peer:"veth-9b922eab"} address:{version:"4" ip_address:"228.240.0.2" ip_prefix:"16" gateway_ip:"228.240.0.1" mac:"ca:87:f0:a1:68:af" tunnel_id:"802685"} droplet:{version:"4" ip_address:"172.31.25.12" mac:"02:79:32:72:9f:f1"} bouncers:{version:"4" ip_address:"172.31.21.250" mac:"02:0d:b5:d0:98:cf"} status:consumed pod_label_value:"1" namespace_label_value:"0" egress_bandwidth_bytes_per_sec:"0" pod_network_priority:"High" pod_network_class:"Premium" subnet_ip:"228.240.0.0" subnet_prefix:"16"'
    CNI_ADD: Activate interface result: '[Move interface 'eth-9b922eab/667' to netns 'cni-4927cbf2-3df6-30f3-1770-44e974c7f9d8'] + [Rename interface 'eth-9b922eab' to 'eth0'] + [Retrieve loopback interface] + [Set loopback interface UP] + [Set interface 'eth0' UP] + [Set ip addr '228.240.0.2' on interface 'eth0'] + [Set gateway '228.240.0.1' for interface 'eth0']' '
    E0219 04:25:28.371470 765815 mizarcni.go:63] CNI_ADD: Error: 'network is unreachable'
    I0219 04:25:28.371509 765815 mizarcni.go:68] CNI_ADD: <<<<