canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

cilium issue on multi-node cluster #2165

Closed joerocklin closed 8 months ago

joerocklin commented 3 years ago

I have a three-node cluster and enabled the cilium addon. With this, cilium appears to deploy as expected, but any pods scheduled on nodes other than the 'primary' node (where I ran microk8s enable cilium) have an error of the following form:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d5ce4743fbb062e78d7bd418c7983ad28b72546a0d2469d9ba4b76cf2253bd19": error getting ClusterInformation: connection is unauthorized: Unauthorized

I'm not sure specifically what is happening, but it seems like some piece of the CNI configuration isn't getting adjusted on all of the nodes in the cluster. Here's the CNI dir from the primary node:

$ ls -l /var/snap/microk8s/current/args/cni-network/
total 60
-rw-r--r-- 1 root root       104 Apr 10 19:44 05-cilium-cni.conf
-rw-r--r-- 1 root root       674 Apr 10 18:32 10-calico.conflist
-rw------- 1 root root      2730 Apr 10 18:32 calico-kubeconfig
-rw-rw---- 1 root microk8s 21578 Apr  6 02:25 cni.yaml.backup
-rw-rw---- 1 root microk8s 21578 Apr  6 02:25 cni.yaml.disabled

Compared to one of the other nodes:

$ ls -l /var/snap/microk8s/current/args/cni-network/
total 56
-rw-r--r-- 1 root root       674 Apr 10 18:32 10-calico.conflist
-rw------- 1 root root      2730 Apr 10 18:32 calico-kubeconfig
-rw-rw---- 1 root microk8s 21578 Apr  6 02:38 cni.yaml
-rw-rw---- 1 root microk8s 21578 Apr  6 02:38 cni.yaml.backup

Is this the expected behavior? I can't find any information about cilium being supported or not on a multi-node microk8s cluster.
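To make the comparison between nodes easier to repeat, a small check like the one below could be run on each node. It is only a sketch: the file names are taken from the two listings above, the function name check_cni_dir is made up for illustration, and reading the directory may need sudo depending on its permissions.

```shell
#!/bin/sh
# Report which of the CNI files seen in the listings above exist in a directory.
# check_cni_dir is a hypothetical helper name, not part of MicroK8s.
check_cni_dir() {
  dir="$1"
  for f in 05-cilium-cni.conf 10-calico.conflist calico-kubeconfig cni.yaml; do
    if [ -e "$dir/$f" ]; then
      echo "$f: present"
    else
      echo "$f: missing"
    fi
  done
}

# Default to the MicroK8s CNI directory; pass another path as the first argument.
check_cni_dir "${1:-/var/snap/microk8s/current/args/cni-network}"
```

A node that reports 05-cilium-cni.conf as missing while the calico files are present matches the second listing above.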

inspection-report-20210411_150216.tar.gz

ktsakalozos commented 3 years ago

Thank you for reporting this @joerocklin . I am afraid the cilium addon is not suited for multi-node clusters.

ktsakalozos commented 3 years ago

See also https://github.com/ubuntu/microk8s/issues/2164

joerocklin commented 3 years ago

I think I was able to get cilium deployed on the cluster with some manual steps:

  1. Cordon all but one 'primary' node
  2. microk8s enable cilium
  3. Remove the calico and cni.yaml files from /var/snap/microk8s/current/args/cni-network
  4. Restart microk8s on each node one at a time
  5. Uncordon all nodes

At this point, when I did a deployment, pods could spin up on all nodes. I have no idea how stable/maintainable this path is yet. Also, I'm not familiar enough with the microk8s code yet to know how to make these steps happen with the enable scripts on all nodes (any pointers there are welcome).
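The manual steps above could be written down roughly as follows. This is a dry-run sketch that only prints the commands and executes nothing. The node names are placeholders, the exact list of files to remove is an assumption based on the directory listings earlier in this thread, and using microk8s stop followed by microk8s start is just one way to restart a node.

```shell
#!/bin/sh
# Dry run: prints the commands for the manual steps, does not run them.
PRIMARY="${PRIMARY:-node-1}"         # placeholder name of the 'primary' node
OTHERS="${OTHERS:-node-2 node-3}"    # placeholder names of the other nodes
CNI_DIR=/var/snap/microk8s/current/args/cni-network

print_plan() {
  # 1. Cordon all but the primary node
  for n in $OTHERS; do
    echo "microk8s kubectl cordon $n"
  done
  # 2. Enable the addon (run on the primary node)
  echo "microk8s enable cilium"
  # 3 and 4. On each node in turn: remove the calico and cni.yaml files, then restart
  for n in $PRIMARY $OTHERS; do
    echo "# on $n:"
    echo "sudo rm -f $CNI_DIR/10-calico.conflist $CNI_DIR/calico-kubeconfig $CNI_DIR/cni.yaml"
    echo "microk8s stop"
    echo "microk8s start"
  done
  # 5. Uncordon the nodes again
  for n in $OTHERS; do
    echo "microk8s kubectl uncordon $n"
  done
}

print_plan
```

Setting PRIMARY and OTHERS in the environment before running the script prints the plan for a different set of nodes.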

PRNDA commented 2 years ago

But you can't join new nodes anymore, right?

joerocklin commented 2 years ago

Sorry, I stopped using microk8s for the cilium work I was looking at and I can't recall what I was and was not able to do.

tfmark commented 2 years ago

Is this a limitation of Cilium, or a limitation of microk8s?

ktsakalozos commented 2 years ago

This is a limitation of the MicroK8s Cilium addon. We are actively looking for a maintainer of it. Anyone interested?

tfmark commented 2 years ago

Most of the K8s inner workings are a form of dark magic to me. I installed calicoctl and set up some ingress/egress rules on our cluster instead and, so far, that appears to be sufficient for what we were originally doing with cilium on a non-clustered setup. So I think we'll stick with that :)

DANW999 commented 1 year ago

I spent a while setting up my master node and Cilium network policies, only to find out when trying to add a worker node that Cilium does not support a multi-node setup. This is a big deal, as clustering and HA are the whole point of K8s, and at the point where I am using a one-node cluster I might as well just use Docker or Podman. A firewall is a first line of defence, so I would rather not have a setup without Network Policies. Are there any alternatives for setting Network Policies in MicroK8s, or are there plans to rectify this issue any time soon?

findmyname666 commented 1 year ago

Thank you for reporting this @joerocklin . I am afraid the cilium addon is not suited for multi-node clusters.

I'm just starting to work with cilium but willing to learn more. Could you provide more info on why it isn't suitable for multi-node clusters?

ktsakalozos commented 8 months ago

Multi-node support is available on the Cilium addon. Sorry for not closing this sooner. Please open a new issue if you have any problems. Thank you.