Azure / karpenter-provider-azure

AKS Karpenter Provider
Apache License 2.0

feat: Add support for clusters with a custom CNI #380

Open moredatapls opened 1 month ago

moredatapls commented 1 month ago

Tell us about your request

We are using AKS with Calico Enterprise (networkPlugin: none). It would be great if the Karpenter provider for Azure supported this configuration, as we would love to use the provider in our clusters.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Bring your own CNI is not supported by the Karpenter provider for Azure. We tried deploying Karpenter with NETWORK_PLUGIN="", NETWORK_PLUGIN="calico", and NETWORK_PLUGIN="none". In all three cases Karpenter deploys nodes and schedules pods without issues, but the pods have (unsurprisingly) no network connection.

Are you currently working around this issue?

By not using Karpenter, sadly.

Additional Context

I would happily contribute to this provider to get support for our use case if someone could point me in the right direction.

We tested this with version 0.4.0 of the provider.

Attachments

No response

thiagorider commented 4 weeks ago

Hi @moredatapls, could you share your setup and controller logs so we can reproduce this?

moredatapls commented 3 weeks ago

Hi @thiagorider, I can't really share my setup, since it's a very big internal Pulumi stack that deploys our clusters. However, we basically just tested the default installation of Calico Enterprise together with Karpenter:

  1. Create an AKS cluster with bring your own CNI: https://learn.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli
  2. Install Calico Enterprise as described here: https://docs.tigera.io/calico-enterprise/latest/getting-started/install-on-clusters/aks
  3. Follow the self-hosted installation of Karpenter as described here (I tried NETWORK_PLUGIN="", NETWORK_PLUGIN="calico", NETWORK_PLUGIN="none" - same result in all cases): https://github.com/Azure/karpenter-provider-azure?tab=readme-ov-file#installation-self-hosted
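For anyone trying to reproduce this, step 1 above can be sketched roughly as follows. This is only a minimal sketch, assuming the Azure CLI is installed and logged in. The resource group, cluster name, and location are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is introduced here purely for illustration (it prints the command by default instead of executing it). The `--network-plugin none` flag is the one described in the linked AKS bring-your-own-CNI documentation.

```shell
#!/bin/sh
# Sketch: create an AKS cluster without a preinstalled CNI plugin (BYO CNI).
# RESOURCE_GROUP, CLUSTER_NAME and LOCATION are hypothetical placeholders.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-byocni-cluster}"
LOCATION="${LOCATION:-westeurope}"

# Illustration-only helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_byo_cni_cluster() {
  # --network-plugin none tells AKS not to install a CNI plugin,
  # so that one (here: Calico Enterprise) can be installed afterwards.
  run az aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --location "$LOCATION" \
    --network-plugin none \
    --generate-ssh-keys
}

create_byo_cni_cluster
```

Running the script with `DRY_RUN=0` would execute the `az` command for real. Steps 2 and 3 (installing Calico Enterprise and Karpenter) follow the linked guides and are not covered by this sketch.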

Unfortunately, I don't have the controller logs anymore. However, the controller had no issues provisioning new nodes and did not print any errors. It detected that pods needed to be scheduled and provisioned new nodes accordingly. The container images were pulled just fine. However, once the pods were scheduled on the nodes, they had no connectivity to the rest of the cluster and, as far as I remember, no internet connection either.

I hope that helps. Sorry for not being able to share any code.

flbla commented 3 weeks ago

Hi, same issue: #270