Mirantis / hmc


Nodes bootstrapped via aws-hosted-cp get stuck with `node.cloudprovider.kubernetes.io/uninitialized` taint #290

Closed: squizzi closed this 7 hours ago

squizzi commented 1 week ago

While testing aws-hosted-cp for our e2e test work in #280, I found that nodes deployed with the aws-hosted-cp template cannot use CCM (cloud-controller-manager) resources because they get stuck with the following taint:

    taints:
    - effect: NoSchedule
      key: node.cloudprovider.kubernetes.io/uninitialized
      value: "true"

Relates to: https://github.com/kubernetes-sigs/cluster-api/issues/9858 and https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4618

Stripping the taint allows the test to progress.
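
For reference, one way to strip the taint by hand is `kubectl taint nodes <node-name> node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-` (the trailing `-` removes the taint). When the kubelet runs with an external cloud provider, it registers the node with this taint and the CCM removes it once it has initialized the node, so a taint that never goes away points at the CCM. The Go sketch below only models the removal step on a taint list; `Taint` is a hypothetical, simplified stand-in for the Kubernetes `corev1.Taint` type, and `stripTaint` is an illustrative helper, not part of any library.

```go
package main

import "fmt"

// Taint is a hypothetical, simplified stand-in for corev1.Taint;
// only the fields shown in the node spec above are modeled.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// uninitializedTaintKey is the taint the kubelet adds when running with an
// external cloud provider; the CCM normally removes it after initializing the node.
const uninitializedTaintKey = "node.cloudprovider.kubernetes.io/uninitialized"

// stripTaint returns a new slice containing every taint whose key does not match.
// The input slice is left unmodified.
func stripTaint(taints []Taint, key string) []Taint {
	out := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if t.Key != key {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// The taint from the stuck node, as reported above.
	taints := []Taint{
		{Key: uninitializedTaintKey, Value: "true", Effect: "NoSchedule"},
	}
	remaining := stripTaint(taints, uninitializedTaintKey)
	fmt.Println(len(remaining)) // 0
}
```

Building a new slice rather than filtering in place keeps the original list intact, which mirrors how a patch to the Node object would be computed from a copy.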

squizzi commented 1 week ago

LoadBalancer Services don't get an external IP assigned either, and stripping the taint has no effect on that. Because CCM isn't running, konnectivity won't start up, failing with:

    konnectivity-agent-l4m8m                   0/1     CreateContainerConfigError   0          88s
    konnectivity-agent-v9lpb                   0/1     CreateContainerConfigError   0          88s

    ...

      Warning  Failed     2s (x5 over 28s)  kubelet            Error: host IP unknown; known addresses: []

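
The kubelet event suggests the node had no entries in `status.addresses` at all. With an external cloud provider, filling in a node's addresses is normally the CCM's job, which fits with CCM not running here. Assuming the konnectivity-agent pod spec reads its host IP through the downward API (a `fieldRef` on `status.hostIP`; the manifest isn't shown in this issue), the kubelet cannot resolve that value and the container fails with CreateContainerConfigError. The Go sketch below is a simplified, hypothetical illustration of that kind of lookup, not the kubelet's actual code: `NodeAddress` stands in for `corev1.NodeAddress`, `hostIP` is an illustrative helper, and the InternalIP-then-ExternalIP preference is an assumption of the sketch.

```go
package main

import "fmt"

// NodeAddress is a hypothetical, simplified stand-in for corev1.NodeAddress.
type NodeAddress struct {
	Type    string // e.g. "InternalIP", "ExternalIP", "Hostname"
	Address string
}

// hostIP returns the first InternalIP, falling back to the first ExternalIP.
// If neither exists it returns an error worded like the kubelet event above.
// (Illustrative only; the preference order is an assumption.)
func hostIP(addrs []NodeAddress) (string, error) {
	for _, want := range []string{"InternalIP", "ExternalIP"} {
		for _, a := range addrs {
			if a.Type == want {
				return a.Address, nil
			}
		}
	}
	return "", fmt.Errorf("host IP unknown; known addresses: %v", addrs)
}

func main() {
	// A node the CCM never initialized: no addresses at all.
	if _, err := hostIP(nil); err != nil {
		fmt.Println(err) // host IP unknown; known addresses: []
	}

	// Once addresses are populated, the lookup succeeds.
	ip, _ := hostIP([]NodeAddress{{Type: "InternalIP", Address: "10.0.1.23"}})
	fmt.Println(ip) // 10.0.1.23
}
```
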
squizzi commented 1 week ago

This seems related to CCM not running correctly on workload clusters, possibly due to some kind of permissions issue. Looking into it further.

squizzi commented 1 week ago

For whatever reason this is intermittent: when I tried to debug it on another run, everything went fine.

squizzi commented 5 days ago

I can now reproduce this fairly reliably. I'm going to reproduce it outside of CI to see if I can debug it and get to the bottom of this.