aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Clarify v1 migration #6808

Open morremeyer opened 4 weeks ago

morremeyer commented 4 weeks ago

Description

How can the docs be improved?

The v1 migration documentation explains in great detail how to manually perform all the upgrade steps.

However, we use ArgoCD to deploy karpenter and the karpenter CRDs, and terraform to manage the IAM policies. Because of that, and because we need the process to be repeatable, performing all these steps manually is not an option for us.

I read through the upgrade guide and concluded that, for any setup that is not managed or upgraded manually from a terminal, the required steps would be:

  1. Perform the required steps in https://karpenter.sh/docs/upgrading/v1-migration/#changes-required-before-upgrading-to-v100
  2. Update the IAM role used by Karpenter (step 8 in the upgrade procedure)
  3. Merge the PR that contains the version upgrades (this depends on how you update your versions; we use PRs for that)
  4. Update the CRDs
  5. Update Karpenter
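For an ArgoCD setup, steps 4 and 5 could be expressed as two Applications ordered with the `argocd.argoproj.io/sync-wave` annotation, so the CRDs sync before the controller. The sketch below is a minimal, hypothetical illustration, not the documented procedure: it assumes an app-of-apps parent that syncs these child Applications, `public.ecr.aws/karpenter` registered in ArgoCD as an OCI-enabled Helm repository, and `kube-system` as the install namespace. Depending on your ArgoCD version, the parent may also need a health check for `argoproj.io/Application` resources before waves actually wait for the CRD app to become healthy. Steps 1 to 3 (pre-upgrade changes, IAM via terraform, the version PR) stay outside this sketch.

```python
import json

# Hypothetical values: adjust to your own cluster and ArgoCD setup.
CHART_REPO = "public.ecr.aws/karpenter"  # assumed OCI Helm repo registered in ArgoCD
NAMESPACE = "kube-system"                # assumed install namespace
DEST_SERVER = "https://kubernetes.default.svc"


def application(name, chart, version, wave, skip_crds=False):
    """Build one ArgoCD Application manifest as a plain dict."""
    helm = {"releaseName": chart}
    if skip_crds:
        # Leave the CRDs to the dedicated karpenter-crd Application.
        helm["skipCrds"] = True
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": name,
            "namespace": "argocd",
            # The parent app-of-apps syncs lower waves first.
            "annotations": {"argocd.argoproj.io/sync-wave": str(wave)},
        },
        "spec": {
            "project": "default",
            "source": {
                "repoURL": CHART_REPO,
                "chart": chart,
                "targetRevision": version,
                "helm": helm,
            },
            "destination": {"server": DEST_SERVER, "namespace": NAMESPACE},
        },
    }


def build_upgrade(version):
    """Steps 4 and 5: CRDs in wave 0, the controller (without its CRDs) in wave 1."""
    return [
        application("karpenter-crd", "karpenter-crd", version, wave=0),
        application("karpenter", "karpenter", version, wave=1, skip_crds=True),
    ]


if __name__ == "__main__":
    # ArgoCD directory sources read .json manifests as well as YAML.
    for app in build_upgrade("1.0.1"):
        print(json.dumps(app, indent=2))
```

Both Applications take the same `version`, so a single PR (step 3) bumps the CRDs and the controller together.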

Can someone please confirm this or correct me where I'm wrong? Thanks!

Notes

jetersen commented 4 weeks ago

Here are some lessons we learned as early adopters. Here be dragons :sweat: https://github.com/aws/karpenter-provider-aws/issues/6765

Vinaum8 commented 3 weeks ago

@morremeyer I am following the issues about the hook so I can test karpenter again afterwards, and I am also checking whether I can change the way I install the CRDs.

I use ArgoCD and everything went wrong here, hahahaha, but that was my mistake.

booleanbetrayal commented 2 weeks ago

There is more discussion around this issue @ #6847

TL;DR - We ended up having to re-package the karpenter chart (minus crds/) and include the karpenter-crd chart in our ArgoCD Application. We also hit issues with the validating and mutating webhooks after install, presumably because we had to sync the CRDs manually with a selective sync (ArgoCD won't run hooks on selective syncs), and there may have been a post-sync hook that cleans them up. Regardless, it was a very trying upgrade. Good luck!

tvandinther commented 1 week ago

Why is the procedure not just:

At a later time of convenience...

In the future...

I thought this was the whole reason we versioned APIs in the first place.

Has anyone managed to do this upgrade through Argo CD without making custom charts, running conversion webhooks, reinstalling Karpenter or breaking and redeploying their clusters? I'd love to know. Until then, this upgrade is going to the backlog, v0.36 has been working well enough.

adrianmiron commented 1 week ago

@tvandinther Yeah, this looks like a mess for now.

Using ArgoCD, I managed to get 0.37.2 installed after switching from the CRDs bundled in the main chart (disabled with the Helm skip-CRDs option) to the karpenter-crd chart.

But going from 0.37.2 to 1.0.1 did not work.

I will be back in a few months, when this is all sorted.
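For readers driving plain Helm (for example from CI) rather than ArgoCD, the same split can be sketched as two `helm upgrade --install` calls: the karpenter-crd chart first, then the main chart with Helm's `--skip-crds` flag, so only one release owns the CRDs. Helm installs CRDs from a chart's `crds/` directory but never upgrades them on `helm upgrade`, which is why a separate CRD chart is used for upgrades. This is a hypothetical sketch: the `oci://public.ecr.aws/karpenter` chart paths and the `kube-system` namespace are assumptions to check against the Karpenter docs for your version.

```python
import shlex

# Assumed values; verify the chart location and namespace for your setup.
CHART_BASE = "oci://public.ecr.aws/karpenter"
NAMESPACE = "kube-system"


def helm_commands(version):
    """Return argv lists: the CRD chart first, then the main chart without its CRDs."""
    crds = [
        "helm", "upgrade", "--install", "karpenter-crd",
        f"{CHART_BASE}/karpenter-crd",
        "--version", version, "--namespace", NAMESPACE,
    ]
    controller = [
        "helm", "upgrade", "--install", "karpenter",
        f"{CHART_BASE}/karpenter",
        "--version", version, "--namespace", NAMESPACE,
        "--skip-crds",  # the karpenter-crd release owns the CRDs
    ]
    return [crds, controller]


if __name__ == "__main__":
    for argv in helm_commands("0.37.2"):
        # Print a copy-pasteable shell line; pass argv to subprocess.run to execute it.
        print(shlex.join(argv))
```

Building argv lists instead of shell strings keeps the version safely quoted when the commands are later passed to `subprocess.run`.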

Vinaum8 commented 1 week ago


This is clearly a problem, even when using two Helm charts. In the "karpenter" chart, it is not possible to change the service name. The "karpenter-crd" chart has the correct name, but it conflicts with the "karpenter" chart.

The configurations are in conflict.