canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Upgrading calico to 3.26 #4673

Open shundezhang opened 1 week ago

shundezhang commented 1 week ago

Summary

A bug was found in Calico 3.25 and is believed to be fixed in 3.26. MicroK8s 1.28, 1.29, and 1.30 still ship Calico 3.25, and a user hit this bug after running MicroK8s 1.28 for 120 days.

What Should Happen Instead?

MicroK8s 1.28 should bundle Calico 3.26. In addition, as the Calico requirements pages suggest, MicroK8s 1.29 should bundle Calico 3.27 [1] and MicroK8s 1.30 should bundle Calico 3.28 [2].

[1] https://docs.tigera.io/calico/3.27/getting-started/kubernetes/requirements
[2] https://docs.tigera.io/calico/latest/getting-started/kubernetes/requirements

Reproduction Steps

  1. Deploy microk8s
  2. Wait. The issue seems to occur after the token expires; a user reported it after the cluster had been running for 120 days.

Introspection Report

Can you suggest a fix?

Upgrade calico to 3.26.5.

Are you interested in contributing with a fix?

Update the /var/snap/microk8s/current/args/cni-network/cni.yaml file: change the Calico image version from 3.25.1 to 3.26.5, and add a new ServiceAccount, ClusterRole, and ClusterRoleBinding:

---
# Source: calico/templates/calico-node.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-cni-plugin
  namespace: kube-system
---
# CNI cluster role 
# Source: calico/templates/calico-node-rbac.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-cni-plugin
rules:
  - apiGroups: [""]
    resources:
      - pods
      - nodes
      - namespaces
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - pods/status
    verbs:
      - patch
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
      - ipamblocks
      - ipamhandles
      - clusterinformations
      - ippools
      - ipreservations
      - ipamconfigs
    verbs:
      - get
      - list
      - create
      - update
      - delete
---
# Source: calico/templates/calico-node-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-cni-plugin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-cni-plugin
subjects:
- kind: ServiceAccount
  name: calico-cni-plugin
  namespace: kube-system
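
Once the manifest has been applied, the permissions granted above can be spot-checked by impersonating the new service account with kubectl auth can-i. This is a minimal sketch, assuming the account lives in kube-system as in the manifest. The function name check_cni_rbac and the KUBECTL override variable are illustrative and not part of MicroK8s or Calico, and the verb/resource pairs are only a sample of the rules listed above.

```shell
# Hypothetical helper: asks the API server whether the calico-cni-plugin
# service account may perform a few of the verbs granted by the ClusterRole.
# KUBECTL is an assumed override hook; it defaults to the MicroK8s wrapper.
check_cni_rbac() {
  local kubectl="${KUBECTL:-microk8s kubectl}"
  local sa="system:serviceaccount:kube-system:calico-cni-plugin"
  local failed=0
  local verb resource
  while read -r verb resource; do
    # `auth can-i` exits non-zero when the answer is "no".
    if $kubectl auth can-i "$verb" "$resource" --as="$sa" </dev/null >/dev/null 2>&1; then
      echo "ok: $verb $resource"
    else
      echo "denied: $verb $resource"
      failed=1
    fi
  done <<'EOF'
get pods
get nodes
get namespaces
get ippools.crd.projectcalico.org
EOF
  return "$failed"
}

# Usage (requires a running cluster and permission to impersonate):
#   check_cni_rbac
```

A non-zero return from the function means at least one of the sampled permissions is missing, which would suggest the new ClusterRole or ClusterRoleBinding was not applied.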

Update the existing calico-node ClusterRole:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node
rules:
  # Used for creating service account tokens to be used by the CNI plugin
  - apiGroups: [""]
    resources:
      - serviceaccounts/token
    resourceNames:
      - calico-cni-plugin  # changed from calico-node
    verbs:
      - create

Apply the YAML file:

microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml
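
The image version change described above could be scripted rather than edited by hand. The following is a rough sketch, not a tested upgrade procedure: it assumes the manifest references images in the form calico/<name>:v3.25.1 (the leading "v" and the exact tag layout are assumptions about cni.yaml), it relies on GNU sed, and the function name bump_calico_tag is hypothetical. It does not add the RBAC objects, which still need to be edited in separately.

```shell
# Hypothetical helper: rewrites Calico image tags in a manifest in place,
# keeping a backup copy next to it as <manifest>.bak.
# Arguments: manifest path, old tag (default v3.25.1), new tag (default v3.26.5).
bump_calico_tag() {
  local manifest="$1"
  local old="${2:-v3.25.1}"
  local new="${3:-v3.26.5}"
  local old_re

  # Keep the original so the change can be reverted.
  cp -p "$manifest" "$manifest.bak" || return 1

  # Escape dots so the old version is matched literally.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"

  # Only touch tags that follow a calico/<image-name> reference.
  sed -i "s|\(calico/[a-z-]*\):${old_re}|\1:${new}|g" "$manifest"
}

# Usage (path taken from the issue):
#   bump_calico_tag /var/snap/microk8s/current/args/cni-network/cni.yaml
```

After running it, comparing the file against the .bak copy with diff shows exactly which image lines changed before the manifest is applied.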

pedrofragola commented 1 week ago

Following internal discussion, this change will land in version 1.32 via PR https://github.com/canonical/microk8s/pull/4638

ClaudZen commented 1 week ago

Is there a quick way to trigger this bug, so we can verify whether upgrading the Calico version and updating the permissions actually resolves the issue?