aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

defaulting.webhook.karpenter.sh not found #3625

Closed: kovaxur closed this issue 1 year ago

kovaxur commented 1 year ago

Version

Karpenter Version: v0.27.0
Kubernetes Version: v1.23.0

Expected Behavior

Karpenter should be able to remove nodes when they are no longer needed.

Actual Behavior

Karpenter is unable to remove nodes, and a reconcile error is logged periodically.

Steps to Reproduce the Problem

Install via the 'CAS migration guide'.

Resource Specs and Logs

2023-03-21T15:28:04.207Z    ERROR   webhook.DefaultingWebhook   Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "eb59c790-9fa3-4fe7-84ec-8a6f3031677f", "knative.dev/key": "karpenter/cluster1-scaling-karpenter-cert", "duration": "151.372µs", "error": "error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\" not found"}

It seems that the mutating webhook does not exist:

kubectl get mutatingwebhookconfiguration.admissionregistration.k8s.io
NAME                                   WEBHOOKS   AGE
defaulting.webhook.karpenter.k8s.aws   1          5h34m
pod-identity-webhook                   1          6h50m
vpc-resource-mutating-webhook          1          6h50m

njtran commented 1 year ago

hey @kovaxur, can you share the helm command you used? Sometimes the installation process fails because of unset variable names. Also, did you migrate from CAS, or are you doing a fresh install on a new cluster with this guide?
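
For comparison, the install command from the migration guide looks roughly like the following (the cluster name, endpoint, and role ARN are placeholders you would substitute with your own values):

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version v0.27.0 \
  --namespace karpenter --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
  --set settings.aws.clusterName=${CLUSTER_NAME} \
  --set settings.aws.clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --wait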

kovaxur commented 1 year ago

Hi @njtran, it's a fresh install; I'm just more familiar with CAS, so it was easier to follow that guide.

I'm not using Helm directly; I use Pulumi to create the whole infrastructure, and Karpenter is part of it. These are the values passed to the chart, basically the same as with Helm:

values: {
  settings: {
    aws: {
      clusterName: options.eksCluster.name,
      clusterEndpoint: options.eksCluster.eksCluster.endpoint,
      tags: {
        [`kubernetes.io/cluster/${options.eksCluster.name}`]: 'owned',
      },
      defaultInstanceProfile: options.defaultInstanceProfile.name,
    },
  },
  serviceAccount: {
    annotations: {
      'eks.amazonaws.com/role-arn': `arn:aws:iam::${getStackConfig().accountId}:role/${prefixWithStackName(options.eksCluster.name)}-karpenter-controller-role`,
    },
    name: 'karpenter',
  },
  nodeSelector: {
    'runSystem': 'yes',
  },
  tolerations: [{
    effect: 'NoSchedule',
    key: 'dedicated',
    operator: 'Equal',
    value: 'system',
  }],
},

I tried looking into the code, but I can't figure out where the defaulting.webhook.karpenter.sh webhook comes from.

njtran commented 1 year ago

This is where the webhooks are defined for your installation, including the one the error says you don't have: https://github.com/aws/karpenter/blob/v0.27.0/charts/karpenter/templates/webhooks-core.yaml

It's odd that this isn't in your cluster. I just did a fresh install of v0.27.0 and see the webhook in my cluster.

Additionally, I used the helm template command here and see the webhook in the generated file.
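
Roughly something like this, with the chart version pinned to the release (the cluster name is a placeholder):

helm template karpenter oci://public.ecr.aws/karpenter/karpenter --version v0.27.0 \
  --namespace karpenter --set settings.aws.clusterName=my-cluster | grep "defaulting.webhook"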

I'm not familiar with Pulumi, so is it possible you removed this webhook at some point? We've also removed this webhook at HEAD; is your Pulumi install somehow pointed at HEAD? Maybe it's deleting the webhook because it doesn't see it at the latest commit?

kovaxur commented 1 year ago

Ah yes, the Pulumi Helm plugin does not support OCI registries, so I had to pull the project locally, and it turns out I have to manually check out the proper branch.
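
In case anyone else hits this, the fix was roughly the following (the tag is assumed to match the release you install, and the chart path is the one linked above):

git clone https://github.com/aws/karpenter.git
cd karpenter
git checkout v0.27.0    # pin the local chart to the installed release instead of HEAD
# then point the Pulumi chart resource at ./charts/karpenter instead of the OCI registry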

It's working now, thanks! Closing the issue.