foundation-model-stack / multi-nic-cni

https://foundation-model-stack.github.io/multi-nic-cni/
Apache License 2.0

Fix bugs causing unsynced IPPool #27

Closed: sunya-ch closed this pull request 1 year ago

sunya-ch commented 1 year ago

This PR fixes the bug where the IPPool is not in sync with the CIDR.

The issue is a cascading effect of the HostInterface cache not yet being populated when the CIDR is updated after the controller restarts with a MultiNicNetwork already deployed. Because the HostInterface list is empty at that point, all previous CIDR entries are removed and then recomputed. PodCIDRs are assigned to host interfaces in FIFO order, which can differ from the previous assignment, so a node that is already running can unexpectedly be given a new PodCIDR.
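To make the FIFO effect concrete, here is a minimal sketch that models PodCIDR assignment as "the i-th host interface seen gets the i-th block". The function `assignPodCIDRs`, the node names, and the CIDR blocks are all hypothetical and are not the operator's actual allocation code; the sketch only shows how recomputing from an empty cache in a different host order swaps the blocks of nodes that are already running.

```go
package main

import "fmt"

// assignPodCIDRs is a HYPOTHETICAL, simplified model of FIFO PodCIDR
// assignment: the i-th host interface seen gets the i-th block.
// It is not the operator's real allocation logic.
func assignPodCIDRs(hostOrder []string, blocks []string) map[string]string {
	assigned := make(map[string]string, len(hostOrder))
	for i, host := range hostOrder {
		if i >= len(blocks) {
			break // no free block left for the remaining hosts
		}
		assigned[host] = blocks[i]
	}
	return assigned
}

func main() {
	blocks := []string{"192.168.0.0/26", "192.168.0.64/26"}

	// Initial deployment: host interfaces were discovered in this order.
	before := assignPodCIDRs([]string{"node-a", "node-b"}, blocks)

	// After a controller restart the cache starts empty, all entries are
	// dropped, and hosts may be re-added in a different order.
	after := assignPodCIDRs([]string{"node-b", "node-a"}, blocks)

	for _, host := range []string{"node-a", "node-b"} {
		fmt.Printf("%s: %s -> %s\n", host, before[host], after[host])
	}
}
```

Running it prints that `node-a` moves from `192.168.0.0/26` to `192.168.0.64/26` and `node-b` moves the other way, even though neither node changed.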

What makes things worse is that the previously assigned IPPool is not removed. When the daemon assigns an IP, it picks the first IPPool it finds that matches the host, so it may allocate from the stale pool.
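The "first match wins" behavior can be sketched as follows. The `IPPool` struct here is a hypothetical, trimmed-down stand-in for the custom resource (only host and CIDR), and `firstPoolForHost` and the pool names are invented for illustration; the point is that a stale pool that was never deleted, if listed first, shadows the up-to-date one.

```go
package main

import "fmt"

// IPPool is a HYPOTHETICAL, trimmed-down stand-in for the IPPool custom
// resource, keeping only the fields needed for this illustration.
type IPPool struct {
	Name string
	Host string
	CIDR string
}

// firstPoolForHost mimics "pick the first-found IPPool that matches the
// host": it returns the first pool whose Host equals host.
func firstPoolForHost(pools []IPPool, host string) (IPPool, bool) {
	for _, p := range pools {
		if p.Host == host {
			return p, true
		}
	}
	return IPPool{}, false
}

func main() {
	pools := []IPPool{
		// Stale: left over from before the restart; this CIDR now belongs to another node.
		{Name: "pool-old", Host: "node-a", CIDR: "192.168.0.0/26"},
		// Current: what node-a should use after recomputation.
		{Name: "pool-new", Host: "node-a", CIDR: "192.168.0.64/26"},
	}
	p, _ := firstPoolForHost(pools, "node-a")
	fmt.Println("daemon allocates from", p.CIDR) // the stale 192.168.0.0/26
}
```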

To fix the issue, I made two significant changes:

  1. Define SyncAllPendingCustomCR to initialize the cache of operator-managed custom resources and sync each IPPool to its corresponding CIDR. This function is called once, from the reconcile loop of the Config CR, which runs after the manager has started. UpdateCIDR is skipped until this function has been called.
  2. Update the logic of CleanPendingIPPools. Instead of referring to the old CIDR (on update), clean up IPPools based on the current CIDR.
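The two changes can be sketched together as a small state machine: a `synced` flag that gates UpdateCIDR until SyncAllPendingCustomCR has loaded the caches, and a cleanup that keeps only pools whose host/CIDR pair appears in the current CIDR. Only the three function names come from this PR; their signatures, the `cidrHandler` type, and the map-based caches are hypothetical simplifications, and the real code reads existing resources from the API server rather than taking them as arguments.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// IPPool is a HYPOTHETICAL, simplified IPPool record.
type IPPool struct {
	Host string
	CIDR string
}

// cidrHandler is a HYPOTHETICAL sketch; the real handler differs.
type cidrHandler struct {
	mu      sync.Mutex
	synced  bool              // set once SyncAllPendingCustomCR has run
	hostIfs map[string]bool   // stand-in for the HostInterface cache
	pools   map[string]IPPool // stand-in for IPPoolCache, keyed by pool name
}

// SyncAllPendingCustomCR loads existing resources into the caches and only
// then allows UpdateCIDR to proceed. Resources are passed in directly here.
func (h *cidrHandler) SyncAllPendingCustomCR(hosts []string, pools map[string]IPPool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, host := range hosts {
		h.hostIfs[host] = true
	}
	for name, p := range pools {
		h.pools[name] = p
	}
	h.synced = true
}

// UpdateCIDR is skipped (returns false) until the caches are initialized,
// so an empty HostInterface cache cannot wipe the existing assignment.
func (h *cidrHandler) UpdateCIDR() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if !h.synced {
		return false
	}
	// Recomputing the CIDR from h.hostIfs is omitted in this sketch.
	return true
}

// CleanPendingIPPools deletes every cached pool whose host/CIDR pair is not
// in the current CIDR (host -> CIDR) and returns the removed names, sorted.
func (h *cidrHandler) CleanPendingIPPools(current map[string]string) []string {
	h.mu.Lock()
	defer h.mu.Unlock()
	removed := []string{}
	for name, p := range h.pools {
		if current[p.Host] != p.CIDR {
			delete(h.pools, name) // deleting during range is safe in Go
			removed = append(removed, name)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	h := &cidrHandler{hostIfs: map[string]bool{}, pools: map[string]IPPool{}}
	fmt.Println("update before sync:", h.UpdateCIDR())

	h.SyncAllPendingCustomCR([]string{"node-a", "node-b"}, map[string]IPPool{
		"pool-old": {Host: "node-a", CIDR: "192.168.0.0/26"},
		"pool-new": {Host: "node-a", CIDR: "192.168.0.64/26"},
		"pool-b":   {Host: "node-b", CIDR: "192.168.0.0/26"},
	})
	fmt.Println("update after sync:", h.UpdateCIDR())

	current := map[string]string{"node-a": "192.168.0.64/26", "node-b": "192.168.0.0/26"}
	fmt.Println("removed:", h.CleanPendingIPPools(current))
}
```

This prints `false`, then `true`, then `removed: [pool-old]`: only the pool that no longer matches the current CIDR is cleaned up, regardless of what the old CIDR looked like.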

Side update: use IPPoolCache instead of calling the List API.
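As a rough illustration of the side update, the sketch below serves IPPool lookups from a mutex-guarded in-memory map instead of a List call. `ipPoolCache` is a hypothetical stand-in for the repository's IPPoolCache, whose actual fields and methods may differ; it assumes the cache is kept current by create/update/delete events from reconciliation.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ipPoolCache is a HYPOTHETICAL stand-in for IPPoolCache: pool name -> CIDR.
// Assumption: it is updated from reconcile events, so reads never hit the
// API server.
type ipPoolCache struct {
	mu    sync.RWMutex
	pools map[string]string
}

func newIPPoolCache() *ipPoolCache {
	return &ipPoolCache{pools: make(map[string]string)}
}

// Set records or updates a pool (e.g. on a create/update event).
func (c *ipPoolCache) Set(name, cidr string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pools[name] = cidr
}

// Delete drops a pool (e.g. on a delete event).
func (c *ipPoolCache) Delete(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pools, name)
}

// Get returns the CIDR of a pool from memory.
func (c *ipPoolCache) Get(name string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	cidr, ok := c.pools[name]
	return cidr, ok
}

// Names returns all cached pool names in sorted order, without a List call.
func (c *ipPoolCache) Names() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	names := make([]string, 0, len(c.pools))
	for n := range c.pools {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	c := newIPPoolCache()
	c.Set("pool-a", "192.168.0.0/26")
	c.Set("pool-b", "192.168.0.64/26")
	c.Delete("pool-a")
	fmt.Println(c.Names()) // [pool-b]
}
```

A read-write mutex fits here because lookups are far more frequent than updates in this pattern.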

Signed-off-by: Sunyanan Choochotkaew sunyanan.choochotkaew1@ibm.com