foundation-model-stack / multi-nic-cni

https://foundation-model-stack.github.io/multi-nic-cni/
Apache License 2.0

PR for handling scalability issue #2

Closed sunya-ch closed 2 years ago

sunya-ch commented 2 years ago

This PR mitigates two bottlenecks that can arise from dynamic changes caused by auto-scaling.

  1. Sending one request per route operation (add/delete) overloaded the daemon port. Solution:
    1. Manage the CIDR on each host as a table instead of a simple list of routes, so that a single request is enough when the CIDR changes or on a periodic update.
    2. To reduce requests while still synchronising in case of node failure, separate the synchronisation job from the reconcile loop.
  2. Multiple resource get/list calls (for daemon pods, HostInterfaces, and CIDR) every time (i) a request is sent to the daemon, (ii) the CIDR is updated, etc. caused client-side throttling against the API server. Solution: cache the daemon pods, HostInterfaces, and CIDR, update the cache at the watch point, and read from the cache instead of calling the API server.
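
The two mitigations above can be pictured with a minimal Go sketch. It is illustrative only and is not the code in this PR: the types `HostInterface`, `HostInterfaceCache`, `Route`, and `RouteTableRequest`, and the function `BuildRouteTableRequest`, are hypothetical names chosen for the example, and the addresses are made up. It shows a cache that watch handlers would fill, and a single request that carries the whole route table for a host.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// HostInterface is a hypothetical, simplified stand-in for the real resource.
type HostInterface struct {
	HostName string
	NICs     []string
}

// HostInterfaceCache keeps the latest objects seen at the watch point so that
// other code paths can read them without calling the API server.
type HostInterfaceCache struct {
	mu    sync.RWMutex
	items map[string]HostInterface
}

func NewHostInterfaceCache() *HostInterfaceCache {
	return &HostInterfaceCache{items: make(map[string]HostInterface)}
}

// Set is meant to be called from an add/update watch event.
func (c *HostInterfaceCache) Set(h HostInterface) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[h.HostName] = h
}

// Unset is meant to be called from a delete watch event.
func (c *HostInterfaceCache) Unset(hostName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.items, hostName)
}

// Get reads from the cache only; it never contacts the API server.
func (c *HostInterfaceCache) Get(hostName string) (HostInterface, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	h, ok := c.items[hostName]
	return h, ok
}

// Route is one entry of the per-host route table (hypothetical shape).
type Route struct {
	Dest string // destination CIDR
	Via  string // next-hop address
}

// RouteTableRequest is a hypothetical payload holding the full desired table
// for one host, so one request replaces many per-route add/delete requests.
type RouteTableRequest struct {
	HostName string
	Routes   []Route
}

// BuildRouteTableRequest collects every route for a host into one request.
// Routes are sorted by destination to make the payload deterministic.
func BuildRouteTableRequest(hostName string, destToVia map[string]string) RouteTableRequest {
	routes := make([]Route, 0, len(destToVia))
	for dest, via := range destToVia {
		routes = append(routes, Route{Dest: dest, Via: via})
	}
	sort.Slice(routes, func(i, j int) bool { return routes[i].Dest < routes[j].Dest })
	return RouteTableRequest{HostName: hostName, Routes: routes}
}

func main() {
	cache := NewHostInterfaceCache()
	// In a real controller this call would sit in a watch event handler.
	cache.Set(HostInterface{HostName: "node-a", NICs: []string{"eth1", "eth2"}})

	if h, ok := cache.Get("node-a"); ok {
		req := BuildRouteTableRequest(h.HostName, map[string]string{
			"192.168.0.64/26":  "10.0.0.2",
			"192.168.0.128/26": "10.0.0.3",
		})
		fmt.Printf("%s: %d routes in one request\n", req.HostName, len(req.Routes))
	}
}
```

With this shape, the number of daemon requests grows with the number of hosts rather than the number of routes, and API server reads happen only at watch events.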

Note: the version is changed to 1.0.1-alpha.

Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>