canonical / cluster-api-control-plane-provider-microk8s

This project offers a cluster API control plane controller that manages the control plane of a MicroK8s cluster. It is expected to be used along with the respective MicroK8s specific machine bootstrap provider.
https://microk8s.io

MAAS Cilium RollingUpgrade 1.27 to 1.28 makes old machine deleted before pods in old-version node are scheduled in different node #62

Closed Kun483 closed 4 days ago

Kun483 commented 4 months ago

Create the folder .cluster-api/overrides/infrastructure-maas/v0.5.0 and, under that v0.5.0 folder, create the files cluster-template.yaml, infrastructure-components.yaml, and metadata.yaml. They are taken from our repo: https://github.com/spectrocloud/cluster-api-provider-maas/blob/main/templates/cluster-template.yaml https://github.com/spectrocloud/cluster-api-provider-maas/blob/main/spectro/generated/core-global.yaml https://github.com/spectrocloud/cluster-api-provider-maas/blob/main/metadata.yaml
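
A minimal sketch of that layout, assuming the override directory lives under the home directory and that the linked files are fetched from their raw.githubusercontent.com counterparts, with core-global.yaml saved as infrastructure-components.yaml:

# prepare the local clusterctl override layer for the MAAS provider
mkdir -p ~/.cluster-api/overrides/infrastructure-maas/v0.5.0
cd ~/.cluster-api/overrides/infrastructure-maas/v0.5.0
curl -fsSL -o cluster-template.yaml \
    https://raw.githubusercontent.com/spectrocloud/cluster-api-provider-maas/main/templates/cluster-template.yaml
curl -fsSL -o infrastructure-components.yaml \
    https://raw.githubusercontent.com/spectrocloud/cluster-api-provider-maas/main/spectro/generated/core-global.yaml
curl -fsSL -o metadata.yaml \
    https://raw.githubusercontent.com/spectrocloud/cluster-api-provider-maas/main/metadata.yaml

Then run: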

kind create cluster
clusterctl init --infrastructure maas:v0.5.0 --bootstrap microk8s --control-plane microk8s

Then, kubectl apply the manifest (please replace the variables): maas_microk8s_cilium_share.yaml.zip. Then, in the target cluster, install Cilium:

helm install cilium cilium/cilium  \
    --namespace kube-system \
    --set cni.confPath=/var/snap/microk8s/current/args/cni-network \
    --set cni.binPath=/var/snap/microk8s/current/opt/cni/bin \
    --set daemon.runPath=/var/snap/microk8s/current/var/run/cilium \
    --set operator.replicas=1 \
    --set ipam.operator.clusterPoolIPv4PodCIDRList="10.1.0.0/16" \
    --set nodePort.enabled=true
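
For reference, a minimal sketch of how the target (workload) cluster is reached before running the helm install above; the cluster name and namespace are placeholders for whatever the applied manifest defines:

# fetch the workload cluster kubeconfig from the management cluster
clusterctl get kubeconfig <cluster-name> -n <namespace> > target.kubeconfig
export KUBECONFIG=$PWD/target.kubeconfig
kubectl get nodes    # confirm the control plane node has registered before installing Cilium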

Please execute clusterctl init --infrastructure maas:v0.5.0 --bootstrap microk8s --control-plane microk8s in the target cluster after all pods in the first-launched cluster are running.

To trigger a RollingUpgrade of the control plane nodes, I change 1.27.13 to 1.28.9, and 1.27 to 1.28 in - /capi-scripts/00-install-microk8s.sh '--channel 1.27/stable --classic' in the preRunCommands of the mcp (MicroK8sControlPlane); a sketch of that change is shown below. I observed that the new node running 1.28 joins the cluster, and then the Cilium pod on the old node is force-deleted. After that, the old machine itself is deleted. However, the pods on that old machine have not yet been rescheduled to a different node.
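
A minimal sketch of that change, assuming the manifest extracted from the attached archive is named maas_microk8s_cilium_share.yaml (only the version strings are rewritten before re-applying):

# bump the Kubernetes version and the MicroK8s snap channel in the control plane manifest
sed -i 's/1\.27\.13/1.28.9/g; s#1\.27/stable#1.28/stable#g' maas_microk8s_cilium_share.yaml
kubectl apply -f maas_microk8s_cilium_share.yaml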

For example, after executing clusterctl init --infrastructure maas:v0.5.0 --bootstrap microk8s --control-plane microk8s, the capi-microk8s-bootstrap-controller-manager, capi-microk8s-control-plane-controller-manager, and capi-controller-manager pods are on the node called 07. During the RollingUpgrade, a new machine (named 08) comes up to replace 07, and these pods on 07 disappear from the cluster because machine 07 is deleted before they are rescheduled to a different node. However, the Deployments for those pods still report them as READY 1/1. Then, after SSHing into machine 08, journalctl -u snap.microk8s.daemon-kubelite shows the error below:

microk8s.daemon-kubelite[10219]: E0710 02:40:27.853000   10219 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
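
A short sketch of the commands used to observe this stale state (node names and the SSH user are illustrative):

kubectl get nodes -o wide                                # the old node 07 is gone; only the new node 08 remains
kubectl get deploy -A | grep capi                        # deployments still report READY 1/1 even though their pods are gone
kubectl get pods -A -o wide | grep -E 'capi|cilium'      # pods that lived on node 07 are missing and not rescheduled
ssh ubuntu@<machine-08-address> 'journalctl -u snap.microk8s.daemon-kubelite --no-pager | tail -n 20'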

Environment:
infrastructure-maas: v0.5.0
Kernel: 5.15.0-113-generic
CAPI: v1.7.4
MicroK8s Bootstrap: v0.6.6
MicroK8s Control Plane: v0.6.6
Container Runtime: containerd://1.6.28
OS: Ubuntu 22.04.3

HomayoonAlimohammadi commented 4 days ago

Hey! I think that with the v0.6.11 release this issue should be resolved. We added a node removal step during scale-down, which removes the node's entry from the underlying dqlite datastore. Please let us know if you're still experiencing this issue.
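
A minimal sketch of moving to that release with clusterctl, assuming v0.6.11 is published for both the MicroK8s bootstrap and control plane providers:

clusterctl upgrade apply \
    --bootstrap microk8s:v0.6.11 \
    --control-plane microk8s:v0.6.11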