canonical / cluster-api-control-plane-provider-microk8s

This project offers a Cluster API control plane controller that manages the control plane of a MicroK8s cluster. It is expected to be used along with the corresponding MicroK8s-specific machine bootstrap provider.
https://microk8s.io

Cilium v1.15.3 connectivity failed for Multi-CP Microk8s v1.27 in AWS #61

Closed: Kun483 closed this issue 2 months ago

Kun483 commented 3 months ago

I used a kind cluster as the bootstrap cluster and clusterctl to launch an AWS cluster with 3 control plane nodes and 1 worker node. I specified the providers below in .cluster-api/clusterctl.yaml:

providers:
  - name: "microk8s"
    url: "https://github.com/canonical/cluster-api-bootstrap-provider-microk8s/releases/v0.6.6/bootstrap-components.yaml"
    type: "BootstrapProvider"
  - name: "microk8s"
    url: "https://github.com/canonical/cluster-api-control-plane-provider-microk8s/releases/v0.6.6/control-plane-components.yaml"
    type: "ControlPlaneProvider"

I basically followed this guide to export the required variables: https://cluster-api.sigs.k8s.io/user/quick-start.html. Then I ran:

clusterctl init --bootstrap microk8s --control-plane microk8s -i aws
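
For reference, the variables exported in that step look roughly like the following; the values here are illustrative placeholders from the quick start, not the exact ones used in this cluster:

# Illustrative AWS quick-start variables; adjust region, SSH key name, and instance types.
export AWS_REGION=us-east-1
export AWS_SSH_KEY_NAME=default
export AWS_CONTROL_PLANE_MACHINE_TYPE=t3.large
export AWS_NODE_MACHINE_TYPE=t3.large
# AWS credentials for the infrastructure provider, encoded by clusterawsadm.
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)
# Node counts matching the 3 CP + 1 worker topology described above.
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=1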

Then I applied the attached files: aws_multi-cp_cilium_yamls_sharable.zip. Next, I applied the following manifests in the workload cluster to test Cilium connectivity:

kubectl create ns cilium-test
kubectl apply -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/1.15.3/examples/kubernetes/connectivity-check/connectivity-check.yaml

I observed that multiple pods were stuck in the CrashLoopBackOff state:

➜  ~ k get po -n cilium-test
NAME                                                     READY   STATUS             RESTARTS         AGE
echo-a-6585f5d5fc-hp498                                  1/1     Running            0                85m
pod-to-external-1111-84cb665f46-zmkff                    1/1     Running            0                85m
echo-b-5b5bfbc975-vq49s                                  1/1     Running            0                85m
echo-b-host-5c4fd65696-6l28d                             1/1     Running            0                85m
pod-to-a-denied-cnp-6b57948c44-5x8f4                     1/1     Running            0                85m
pod-to-a-allowed-cnp-64f6687dfd-6n55g                    0/1     CrashLoopBackOff   27 (2m21s ago)   85m
pod-to-b-multi-node-clusterip-69588b6dd8-hg5g2           0/1     CrashLoopBackOff   27 (119s ago)    85m
host-to-b-multi-node-clusterip-7bc6485898-4sdz5          0/1     CrashLoopBackOff   27 (117s ago)    85m
pod-to-b-multi-node-headless-755d64c9b6-w6g4g            0/1     CrashLoopBackOff   27 (108s ago)    85m
host-to-b-multi-node-headless-64fdb84dbb-cs4l6           0/1     CrashLoopBackOff   27 (106s ago)    85m
pod-to-b-intra-node-nodeport-76c6964bf4-r92nn            0/1     CrashLoopBackOff   27 (105s ago)    85m
pod-to-b-multi-node-nodeport-98d4c4894-rlvhp             0/1     CrashLoopBackOff   27 (96s ago)     85m
pod-to-external-fqdn-allow-google-cnp-779cff995b-jbfpd   0/1     CrashLoopBackOff   27 (90s ago)     85m
pod-to-a-55b5f79c9b-kmz94                                0/1     Running            29 (33s ago)     85m
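
Many of the failing checks are the multi-node variants, which suggests cross-node pod traffic is being dropped somewhere. To dig into why the checks fail, a first pass is to look at the logs and events of one of the crash-looping deployments and at the Cilium agents themselves; the commands below are a sketch and assume Cilium runs as the default cilium DaemonSet in kube-system:

# Logs from one of the failing connectivity-check deployments
kubectl -n cilium-test logs deploy/pod-to-b-multi-node-clusterip
# Recent events in the namespace (probe failures, restarts)
kubectl -n cilium-test get events --sort-by=.lastTimestamp
# Node-to-node datapath health as seen by the Cilium agents
kubectl -n kube-system exec ds/cilium -- cilium-health status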

Environment:
- CAPA: v1.5.2
- CAPI: v1.3.2
- MicroK8s Bootstrap: v0.6.6
- MicroK8s Control Plane: v0.6.6
- Kernel version: 6.2.0-1009-aws
- Container runtime: containerd://1.6.28
- OS: Ubuntu 22.04.3

eaudetcobello commented 2 months ago

This is resolved by adding port 8472/UDP (the VXLAN overlay port used by Cilium) to the node security group in AWS. Anyone hitting this kind of error in the future should check their firewall rules and make sure the necessary ports are open (see https://docs.cilium.io/en/stable/operations/system_requirements/#firewall-rules). With Cluster API AWS this can be done by adding a cniIngressRules entry to the AWSCluster spec:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: ${CLUSTER_NAME}
spec:
  network:
    vpc:
      availabilityZoneUsageLimit: 1
    cni:
      cniIngressRules:
      - description: vxlan-overlay
        fromPort: 8472
        protocol: udp
        toPort: 8472
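
After the rule is in place, one quick way to confirm the fix (a sketch, assuming the connectivity checks are still deployed in the cilium-test namespace) is to recreate the crash-looping pods and watch them become Ready:

# Recreate the connectivity-check pods so they retry with the new security group rule.
kubectl -n cilium-test delete pods --all
kubectl -n cilium-test get pods -w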