aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Conversion webhook hardcoded at kube-system namespace #6818

Open marcofranssen opened 1 month ago

marcofranssen commented 1 month ago

Description

Observed Behavior:

When upgrading our Karpenter to the v1.0.0 chart, it fails at the conversion webhook, which targets the kube-system namespace. Our Karpenter is deployed in the karpenter namespace.

controller logs

{"level":"ERROR","time":"2024-08-21T09:02:09.801Z","logger":"controller","message":"k8s.io/client-go@v0.30.3/tools/cache/reflector.go:232: Failed to watch *v1.NodeCl │
│ aim: failed to list *v1.NodeClaim: conversion webhook for karpenter.sh/v1beta1, Kind=NodeClaim failed: Post \"https://karpenter.kube-system.svc:8443/conversion/karpenter.sh?timeout=30s\": service  │
│ \"karpenter\" not found","commit":"5bdf9c3"}

Expected Behavior:

The conversion webhook should target the Helm release namespace.

Reproduction Steps (Please include YAML):

Install Karpenter in the karpenter namespace using a release prior to v1.0.0, then upgrade Karpenter to v1.0.0.
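
A rough sketch of the reproduction with plain Helm; the exact prior version and how the CRDs are managed are placeholders/assumptions:

# Install a pre-v1 release into the karpenter namespace (0.37.2 as a placeholder;
# the CRDs are assumed to be managed alongside, e.g. via the karpenter-crd chart).
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 0.37.2 --namespace karpenter --create-namespace

# Upgrade everything to v1.0.0. The v1 CRDs ship a conversion webhook whose
# service defaults to the kube-system namespace, so the controller fails with
# 'service "karpenter" not found' (see the log above).
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.0 --namespace karpenter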

Versions:

marcofranssen commented 1 month ago

Turns out this is controlled by

https://github.com/aws/karpenter-provider-aws/blob/main/charts/karpenter-crd/values.yaml#L5

This is something that should be made very clear in the documentation or migration guide.

It isn't mentioned here https://aws.amazon.com/blogs/containers/announcing-karpenter-1-0/
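
For reference, a sketch of pinning the webhook service to Karpenter's own namespace when installing the karpenter-crd chart; the webhook.* value names below are my assumption and should be verified against that values.yaml for the chart version you pin:

KARPENTER_NAMESPACE=karpenter
KARPENTER_VERSION=1.0.0

# Install/upgrade the CRD chart with the conversion webhook service
# overridden to the namespace Karpenter actually runs in.
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set webhook.enabled=true \
  --set webhook.serviceName=karpenter \
  --set webhook.serviceNamespace="${KARPENTER_NAMESPACE}" \
  --set webhook.port=8443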

woehrl01 commented 1 month ago

@marcofranssen We just stumbled over this as well. Ideally, this would default to the namespace of the Helm chart deployment.

marcofranssen commented 1 month ago

Yes, indeed. That would be even better.

denniszag commented 1 month ago

I encountered the same problem using ArgoCD. The karpenter-crd chart is installed correctly in the karpenter namespace, but the karpenter chart (via ArgoCD) also installs the same CRDs and wants to change them to the kube-system namespace. An option to disable installing CRDs via the karpenter chart would help.

marcofranssen commented 1 month ago

> I encountered the same problem using ArgoCD. The karpenter-crd chart is installed correctly in the karpenter namespace, but the karpenter chart (via ArgoCD) also installs the same CRDs and wants to change them to the kube-system namespace. An option to disable installing CRDs via the karpenter chart would help.

You can already do so with --skip-crds.
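
For illustration, a minimal sketch with plain Helm (chart reference and version are placeholders; ArgoCD has an equivalent skipCrds option on the Application's Helm source, if I remember correctly):

# Install/upgrade the main chart without installing the CRDs it bundles,
# leaving CRD management to the separate karpenter-crd chart.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.0 --namespace karpenter --skip-crds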

woehrl01 commented 1 month ago

@marcofranssen Using that flag only works if you install the "karpenter-crd" Helm chart; if you use the "normal" one, the CRDs are symlinked and the namespace replacement does not happen. (https://github.com/aws/karpenter-provider-aws/tree/28da0b96b6086679f75e656d31ac65bd7fca2bc0/charts/karpenter)
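
A quick way to see that the bundled CRDs are not templated (a diagnostic sketch; version and namespace are placeholders):

# Render the main chart including its bundled CRDs and inspect the
# conversion webhook's service reference; the namespace stays
# kube-system no matter what --namespace is passed.
helm template karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.0 --namespace karpenter --include-crds \
  | grep -A 8 'conversion:'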

woehrl01 commented 1 month ago

(Attached screenshot, 2024-08-21 14:34)

It looks like the modification is made with a post-job hook, and the referenced image digest is not compatible with ARM.

woehrl01 commented 1 month ago

Related to #6819 #6765

Vinaum8 commented 1 month ago

Related to #6544

marcofranssen commented 4 weeks ago

Seems to be resolved in #6827

jmdeal commented 4 weeks ago

There's some additional justification given here, but if you need to template the CRDs (required if installing Karpenter outside of kube-system or customizing the service), you will need to use the karpenter-crd chart. This is why the karpenter-crd chart is used in the v1 migration guide, with an alternative of manually patching the CRDs.
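
For reference, a hedged sketch of what that manual patch can look like (CRD names are those from this thread; the target namespace and port are assumptions to adjust for your install):

# Point each CRD's conversion webhook at the service in the namespace
# where Karpenter actually runs (karpenter here).
for crd in nodepools.karpenter.sh nodeclaims.karpenter.sh ec2nodeclasses.karpenter.k8s.aws; do
  kubectl patch customresourcedefinition "${crd}" --type merge -p '{
    "spec": {
      "conversion": {
        "webhook": {
          "clientConfig": {
            "service": {"name": "karpenter", "namespace": "karpenter", "port": 8443}
          }
        }
      }
    }
  }'
done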

As far as what's missing from our docs, I do think we can be a little more explicit. We do instruct users to install the CRD chart as part of the installation guide, but without a justification of why, I can understand why users would continue to just use the standard chart, since it already includes the CRDs, just without templating.

booleanbetrayal commented 3 weeks ago

> As far as what's missing from our docs, I do think we can be a little more explicit. We do instruct users to install the CRD chart as part of the installation guide, but without a justification of why, I can understand why users would continue to just use the standard chart, since it already includes the CRDs, just without templating.

@jmdeal - In our case, in an ArgoCD environment, it's not the lack of clear instructions or justifications for using the karpenter-crd chart ... it's that you cannot deploy the CRDs from the karpenter-crd chart at all over ArgoCD-managed resources, because of .Release.Service / managed-by and release-name stamping. You get fun errors like:

Error: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "ec2nodeclasses.karpenter.k8s.aws" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "karpenter-crd"

So you're stuck having to figure it out yourself, potentially dealing with the Application construction we are discussing in #6847.
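
One workaround often used for this class of Helm ownership errors (not Karpenter-specific, and to be applied with care) is to adopt the pre-existing CRDs into the karpenter-crd release by adding the labels and annotations Helm checks for; the release namespace below is an assumption matching the setup in this thread:

# Adopt the already-existing CRDs into the karpenter-crd Helm release
# so Helm/ArgoCD stops rejecting them as foreign resources.
for crd in nodepools.karpenter.sh nodeclaims.karpenter.sh ec2nodeclasses.karpenter.k8s.aws; do
  kubectl label crd "${crd}" app.kubernetes.io/managed-by=Helm --overwrite
  kubectl annotate crd "${crd}" \
    meta.helm.sh/release-name=karpenter-crd \
    meta.helm.sh/release-namespace=karpenter --overwrite
done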

frimik commented 1 week ago

For now, "fixing" it with an override on top of the helm chart. Because the validation in the webhook is still useful for me on older Kubernetes versions.

From Tanka with ❤️.

{
  local conversionWebhookSpecMixin = {
    spec+: {
      conversion+: {
        webhook+: {
          clientConfig+: {
            service+: {
              namespace: c.karpenter.namespace,
            },
          },
        },
      },
    },
  },

  karpenter: helm.template(releaseName, './vendor/charts/karpenter', {
    apiVersions: apiVersions,
    includeCrds: includeCrds,
    kubeVersion: kubeVersion,
    namespace: c.karpenter.namespace,
    noHooks: noHooks,
    values: values,
  }) + {
    // https://github.com/aws/karpenter-provider-aws/issues/6818
    'custom_resource_definition_ec_2nodeclasses.karpenter.k_8s.aws'+: conversionWebhookSpecMixin,
    'custom_resource_definition_nodeclaims.karpenter.sh'+: conversionWebhookSpecMixin,
    'custom_resource_definition_nodepools.karpenter.sh'+: conversionWebhookSpecMixin,
  },
}

frimik commented 1 week ago

Seems to be backported to v1.0.2; see https://github.com/aws/karpenter-provider-aws/pull/6855 and https://github.com/aws/karpenter-provider-aws/pull/6849. There are possibly several other open issues at least partially related to this issue and fix. Right?