Azure / azure-container-networking

Azure Container Networking Solutions for Linux and Windows Containers
MIT License

User "system:serviceaccount:kube-system:azure-cns" cannot list resource "clustersubnetstates" after update to 1.30.4 control plane #3063

Status: Open · opened by petrkr 1 day ago

petrkr commented 1 day ago

It seems the new azure-cns is missing some roles/permissions. After updating the control plane to Kubernetes 1.30.4, the API server rejects CNS's requests to list ClusterSubnetState resources as forbidden:

W1010 10:54:07.200809       1 reflector.go:547] pkg/mod/k8s.io/client-go@v0.30.4/tools/cache/reflector.go:232: failed to list *v1alpha1.ClusterSubnetState: clustersubnetstates.acn.azure.com is forbidden: User "system:serviceaccount:kube-system:azure-cns" cannot list resource "clustersubnetstates" in API group "acn.azure.com" at the cluster scope

E1010 10:54:07.200850       1 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.4/tools/cache/reflector.go:232: Failed to watch *v1alpha1.ClusterSubnetState: failed to list *v1alpha1.ClusterSubnetState: clustersubnetstates.acn.azure.com is forbidden: User "system:serviceaccount:kube-system:azure-cns" cannot list resource "clustersubnetstates" in API group "acn.azure.com" at the cluster scope

As a result, no new IP addresses can be assigned to pods, and pod creation fails with this error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b009a12325af4202a9094c60d3971fb6c562bfa975cb18845cdacc64ff527199": plugin type="azure-vnet" failed (add): IPAM Invoker Add failed with error: failed to add ipam invoker: Failed to get IP address from CNS: AllocateIPConfig failed: not enough IPs available for 82ca83b3-f5da-44f2-a766-2aefd70f192e, waiting on Azure CNS to allocate more with NC Status:

Could a role binding be missing from the manifest? See https://github.com/Azure/azure-container-networking/blob/master/cns/azure-cns.yaml

As a result, the cluster is stuck.

petrkr commented 1 day ago

As a workaround, I had to add this ClusterRole and its binding:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # ClusterRole is cluster-scoped, so no namespace is set.
  name: nodeTempClusterSubnetByHand
rules:
- apiGroups: ["acn.azure.com"]
  resources: ["clustersubnetstates"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # ClusterRoleBinding is cluster-scoped; only the subject below lives in kube-system.
  name: nodeTempClusterSubnetByHandRoleBinding
subjects:
- kind: ServiceAccount
  name: azure-cns
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: nodeTempClusterSubnetByHand
  apiGroup: rbac.authorization.k8s.io
---
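To confirm that a binding like this actually grants the access CNS needs, one option is `kubectl auth can-i` with impersonation. This is a sketch, not part of the original workaround: it assumes you run it with credentials that are allowed to impersonate service accounts (for example, a cluster admin), and it only checks the three verbs granted by the ClusterRole above, at cluster scope (`--all-namespaces`), which is the scope named in the error.

```shell
#!/bin/sh
# Ask the API server whether the azure-cns service account may perform each
# verb CNS needs on ClusterSubnetState objects across the whole cluster.
# Requires impersonation rights for the current kubectl user.
SA="system:serviceaccount:kube-system:azure-cns"
for verb in get list watch; do
  printf '%s: ' "$verb"
  # Prints "yes" or "no"; a non-zero exit status means "no" or an error.
  kubectl auth can-i "$verb" clustersubnetstates.acn.azure.com \
    --all-namespaces --as="$SA"
done
```

Once the ClusterRoleBinding is in place, each verb should report "yes".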
rbtr commented 13 hours ago

@petrkr you can mitigate this by deleting the CRD:

kubectl delete crd clustersubnetstates.acn.azure.com

CNS will then log a slightly different error about the CRD not being found, but that one is benign and CNS will operate normally. This has been fixed in https://github.com/Azure/azure-container-networking/pull/3029, and the latest CNS, 1.6.13, is rolling out to AKS imminently.
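To check whether the mitigation has been applied and whether the fixed CNS has reached a cluster, a sketch along these lines may help. It assumes CNS runs as a DaemonSet named `azure-cns` in `kube-system`, as in the repository's `cns/azure-cns.yaml`; adjust the name if your cluster differs. The image tag printed by the second command should show whether the CNS 1.6.13 rollout has arrived.

```shell
#!/bin/sh
# After the mitigation, this lookup should fail with NotFound.
# CRDs are cluster-scoped and named <plural>.<group>, so no -n flag is needed.
kubectl get crd clustersubnetstates.acn.azure.com
# Print the CNS container image(s); the tag carries the CNS version.
# Assumption: the DaemonSet is named azure-cns in kube-system.
kubectl -n kube-system get daemonset azure-cns \
  -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'
```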