elastisys / compliantkubernetes-kubespray

OpenStack Cloud provider init failure on new clusters v2.24.0 #350

Closed: anders-elastisys closed this issue 3 months ago

anders-elastisys commented 4 months ago

Describe the bug There seem to be issues when creating new v2.24.0 clusters on OpenStack cloud providers. The nodes come up with the node.cloudprovider.kubernetes.io/uninitialized taint before coredns can start, which leaves the coredns pods in a Pending state, and the openstack cloud provider pods in turn crash because they fail to resolve the openstack endpoint:

```
Cloud provider could not be initialized: could not init cloud provider "openstack": Post "https://<openstack-endpoint>": dial tcp: lookup <openstack-endpoint> on 10.233.0.3:53: write udp ...->10.233.0.3:53: write: operation not permitted
```

Related upstream Kubespray issue: https://github.com/kubernetes-sigs/kubespray/issues/10914
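
To confirm the stuck state, something along these lines should show the uninitialized taint still present on the nodes and the coredns pods stuck in Pending (a rough sketch; node and pod names will differ per cluster):

```bash
# Show which taints are present on each node; node.cloudprovider.kubernetes.io/uninitialized
# should still be listed on the affected nodes
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

# coredns pods should be Pending, the openstack cloud provider pods crash-looping
kubectl -n kube-system get pods -o wide
```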

To Reproduce Steps to reproduce the behavior:

  1. On an OpenStack cloud, create a cluster with v2.24.0; Kubespray will finish without errors
  2. Check the kube-system namespace and see the openstack pods crashing with logs similar to the output above (see the commands sketched below)
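
For step 2, commands like the following should surface the crash loops and the DNS error (the pod name is a placeholder; use whatever kubectl get pods shows):

```bash
# List the cloud provider pods in kube-system; expect CrashLoopBackOff
kubectl -n kube-system get pods | grep -i openstack

# Tail the logs of a crashing pod; expect the "could not init cloud provider" error quoted above
kubectl -n kube-system logs <openstack-cloud-controller-manager-pod> --previous
```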

Expected behavior Creating new clusters with Kubespray should work on all cloud providers.

Version (add all relevant versions):

Additional context

A workaround for now is to add tolerations to the coredns pods. For example, create a file tolerations.yaml:

```yaml
# tolerations.yaml
spec:
  template:
    spec:
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
```
Then patch the coredns deployment with the tolerations from the file:

```bash
kubectl patch deployment coredns -n kube-system --patch "$(cat tolerations.yaml)"
```

Once the openstack pods run without crashing, you can remove the node.cloudprovider.kubernetes.io/uninitialized taint.
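
As a rough sketch (assuming the taint has the usual NoSchedule effect, and using --all to cover every node), removing it can look like this:

```bash
# The trailing '-' removes the taint; run this only once the openstack pods are healthy
kubectl taint nodes --all node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
```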