Open slapcat opened 6 months ago
Additional Details
Pod errors on other nodes after enabling cilium:
Normal Scheduled 3m2s default-scheduler Successfully assigned default/web-cilium-5f668dd859-mm5d8 to juju-9c0265-microk8s-1
Warning FailedCreatePodSandBox 3m2s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5b702f9572b97ec1f34b3e43691f1f0c5422326a3bfe5a799a96d70f0f913ea9": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Normal SandboxChanged 3s (x15 over 3m1s) kubelet Pod sandbox changed, it will be killed and re-created.
Working node /var/snap/microk8s/current/args/cni-network/
contents:
05-cilium-cni.conf 10-calico.conflist calico-kubeconfig cni.yaml.disabled
Broken node /var/snap/microk8s/current/args/cni-network/
contents:
10-calico.conflist calico-kubeconfig cni.yaml cni.yaml.backup
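The difference between the two listings above can be checked on each node with a small script. This is only a sketch: treating the presence of 05-cilium-cni.conf as the sign that a node was switched to cilium is an assumption drawn from these two listings, and the function name check_cni_dir is made up for illustration.

```shell
#!/usr/bin/env bash
# Heuristic check (an assumption based on the two listings above):
# the working node has 05-cilium-cni.conf in cni-network, the broken one does not.
CNI_DIR="${CNI_DIR:-/var/snap/microk8s/current/args/cni-network}"

# check_cni_dir is a hypothetical helper, not part of microk8s.
check_cni_dir() {
    local dir="$1"
    if [ -e "$dir/05-cilium-cni.conf" ]; then
        echo "cilium config present"
    else
        echo "cilium config missing"
    fi
}

# Report on this node's directory (or on whatever CNI_DIR points at).
check_cni_dir "$CNI_DIR"
```

Running it on every node in turn should show which ones still only carry the calico configuration.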
I am getting the exact same errors as you, but weirdly not always. I have a daemonset that can be launched into the other node, but the regular deployments cannot.
I am also confused about why calico is mentioned when it should have been removed from the system (I guess?)
Did you find any solution? I am connecting my nodes via tailscale.
With calico it worked, but it was a bit flaky, which is why I'm trying out cilium.
Thanks
Summary
Enabling the cilium addon on an existing multinode cluster only takes effect on the node where the command is run. The change does not propagate to the other nodes, leading to two issues:
The root cause seems to be that the cilium addon depends on the community addon being enabled, but this is not done automatically on the other nodes when enabling cilium. As a result, the other nodes are still configured for the calico CNI even though calico is no longer there.
What Should Happen Instead?
Cilium should be correctly configured on all nodes after running microk8s enable cilium.
Reproduction Steps
I used juju when testing the issue:
You should now see a pod running on the microk8s/leader node, but pending on all others. You can also see that the contents of /var/snap/microk8s/current/args/cni-network on the microk8s nodes are different.
Introspection Report
N/A
Can you suggest a fix?
There is currently a workaround where you copy the contents of /var/snap/microk8s/current/args/cni-network on the working node and transfer it to the other nodes. Then run snap restart microk8s.
If you are building the cluster from scratch, or moving from single node to multinode, you can also prepare new nodes by enabling community and cilium addons before running add-node.
Are you interested in contributing with a fix?
@ktsakalozos This regards an issue I asked you about earlier this week.
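For anyone who wants to script the copy-and-restart workaround described under "Can you suggest a fix?", here is a rough sketch. It assumes SSH access from your machine to every node and passwordless sudo there; the function name sync_cni and the node names in the usage line are hypothetical. It defaults to a dry run that only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: copy cni-network from the working node to the
# other nodes, then restart microk8s on each of them.
# Assumptions: SSH access to all nodes and passwordless sudo on them.
set -euo pipefail

CNI_DIR="/var/snap/microk8s/current/args/cni-network"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the planned actions

# sync_cni is a hypothetical helper: sync_cni WORKING_NODE OTHER_NODE...
sync_cni() {
    local src="$1"
    shift
    local dst
    for dst in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "would copy ${src}:${CNI_DIR} -> ${dst}:${CNI_DIR}"
            echo "would run on ${dst}: snap restart microk8s"
            continue
        fi
        # Stream the directory contents from the working node to the target.
        ssh "$src" "sudo tar -C '$CNI_DIR' -cf - ." \
            | ssh "$dst" "sudo tar -C '$CNI_DIR' -xf -"
        ssh "$dst" "sudo snap restart microk8s"
    done
}

# Only act when called with a working node and at least one target.
if [ "$#" -ge 2 ]; then
    sync_cni "$@"
fi
```

Saved under any name, it would be called with the working node first and the broken nodes after it, for example `DRY_RUN=0 ./sync-cni.sh working-node broken-node-1 broken-node-2`. Note that the copy only adds or overwrites files; files that exist only on the target (such as cni.yaml on the broken node above) are left in place, and I have not verified whether that matters.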