cybozu-go / coil

CNI plugin for Kubernetes designed for scalability and extensibility
Apache License 2.0

Egress: There is downtime on startup of new NAT clients #287

Closed terassyi closed 1 month ago

terassyi commented 1 month ago

What

When a new NAT client is created, there is a brief period of downtime caused by the time lag between configuring the link and configuring the route between the client and the NAT pods.

The downtime is caused by the following flow.

  1. A new NAT client pod is created.
  2. CNI ADD is called, and coild on the scheduled node configures the pod’s address and routes.
  3. pod_watcher on the NAT pod configures a FoU link for the new client, but has not yet configured the route.
  4. The client side finishes its configuration and starts sending traffic.
  5. The traffic is dropped because the NAT pod has not finished configuring yet.
  6. The NAT pod finishes configuring the route to the client.

How

To minimize downtime, we change the NAT pod to set the FoU link up just before adding the route.
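
The intended ordering on the NAT pod can be sketched roughly as below. This is only an illustration, not coil's actual code: `linkOps`, `recorder`, `attachClient`, the link name, and the client address are all hypothetical, and the fake implementation only records call order instead of issuing real netlink calls.

```go
package main

import (
	"fmt"
	"strings"
)

// linkOps is a hypothetical abstraction over the network operations
// the NAT pod performs for each client. It is not part of coil.
type linkOps interface {
	AddLink(name string) error       // create the FoU link, leaving it down
	SetLinkUp(name string) error     // bring the link up
	AddRoute(dst, link string) error // route client traffic via the link
}

// recorder is a fake linkOps that only records the order of calls.
type recorder struct{ ops []string }

func (r *recorder) AddLink(name string) error {
	r.ops = append(r.ops, "add-link "+name)
	return nil
}

func (r *recorder) SetLinkUp(name string) error {
	r.ops = append(r.ops, "link-up "+name)
	return nil
}

func (r *recorder) AddRoute(dst, link string) error {
	r.ops = append(r.ops, "add-route "+dst+" dev "+link)
	return nil
}

// attachClient configures the NAT pod side for one client.
// The link is brought up immediately before the route is added,
// so the gap between "link up" and "route present" stays small.
func attachClient(ops linkOps, link, clientAddr string) error {
	if err := ops.AddLink(link); err != nil {
		return fmt.Errorf("add link %s: %w", link, err)
	}
	// Any other per-client setup would happen here, while the link is still down.
	if err := ops.SetLinkUp(link); err != nil {
		return fmt.Errorf("set link %s up: %w", link, err)
	}
	if err := ops.AddRoute(clientAddr, link); err != nil {
		return fmt.Errorf("add route to %s: %w", clientAddr, err)
	}
	return nil
}

func main() {
	rec := &recorder{}
	// The link name and client address are made-up example values.
	if err := attachClient(rec, "fou-client", "10.64.0.5/32"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(rec.ops, "\n"))
}
```

Keeping the operations behind a small interface is only a choice for this sketch, so the ordering can be checked without root privileges or a real network namespace.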

Checklist