aws-quickstart / cdk-eks-blueprints

AWS Quick Start Team
Apache License 2.0

Karpenter Addon: On upgrade to 0.33+, new CRDs are not installed by blueprints #962

Open jsamuel1 opened 6 months ago

jsamuel1 commented 6 months ago

Describe the bug

Upgrading from the alpha to the beta CRDs requires the new CRDs to be installed in the cluster before the Helm addon is upgraded. When installing versions >0.32, the blueprints should install these CRDs, if they don't yet exist, as a pre-install step ahead of the Helm chart. Currently, the cluster upgrade fails with a Helm error:

UPDATE_FAILED        | Custom::AWSCDK-EKS-HelmChart          | <stackname>/chart-karpenter/Resource/Default (<biglongstackname>chartkarpenter<hash>) Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: client rate limiter Wait returned an error: context deadline exceeded\n' 
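
The version condition described above could be checked with a small helper like the following. This is only a sketch: `parseVersion` and `needsBetaCrds` are hypothetical names, not part of cdk-eks-blueprints, and the default cutoff of `0.33.0` is an assumption taken from the issue title ("0.33+").

```typescript
// Hypothetical helpers, not part of cdk-eks-blueprints.

// Parses "0.34.1" or "v0.34.1" into [major, minor, patch].
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (!match) {
    throw new Error(`Unrecognized Karpenter version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Returns true when `version` is at or above the first release that is
// assumed to need the beta CRDs (default cutoff is an assumption).
function needsBetaCrds(version: string, firstBeta: string = "0.33.0"): boolean {
  const [major, minor, patch] = parseVersion(version);
  const [bMajor, bMinor, bPatch] = parseVersion(firstBeta);
  if (major !== bMajor) return major > bMajor;
  if (minor !== bMinor) return minor > bMinor;
  return patch >= bPatch;
}
```

An addon could call `needsBetaCrds(chartVersion)` before deploying the Helm chart and only add the CRD pre-install step when it returns `true`.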

Expected Behavior

I expected the blueprints to handle installing the new CRDs. For new installs, the easiest approach would be to use the CRD Helm chart of the same version. Note that this won't work when upgrading clusters where the CRDs were already added manually to work around the issue, because Helm refuses to take over existing resources that it did not create. Otherwise, just install the three manifests, as per: https://karpenter.sh/docs/upgrading/upgrade-guide/#crd-upgrades
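
The choice between the two approaches above could be sketched as a small planning function. `planCrdInstall` and the `CrdPlan` type are hypothetical and not part of the blueprints. The function assumes the caller has already listed the CRD names present in the cluster, and that the three beta CRDs are `nodepools.karpenter.sh`, `nodeclaims.karpenter.sh` and `ec2nodeclasses.karpenter.k8s.aws`.

```typescript
// The three beta CRDs (names assumed from the Karpenter upgrade guide).
const BETA_CRDS: string[] = [
  "nodepools.karpenter.sh",
  "nodeclaims.karpenter.sh",
  "ec2nodeclasses.karpenter.k8s.aws",
];

// Hypothetical plan type: use the CRD chart, or apply raw manifests.
type CrdPlan =
  | { action: "install-crd-chart" }
  | { action: "apply-manifests"; crds: string[] };

// If none of the beta CRDs exist yet, the CRD Helm chart can own them.
// If any already exist (for example, added manually), fall back to
// applying all three manifests, since Helm would not take them over.
function planCrdInstall(existingCrds: string[]): CrdPlan {
  const existing = new Set(existingCrds);
  const anyPresent = BETA_CRDS.some((name) => existing.has(name));
  if (!anyPresent) {
    return { action: "install-crd-chart" };
  }
  return { action: "apply-manifests", crds: BETA_CRDS.slice() };
}
```

Applying all three manifests in the fallback case, rather than only the missing ones, also refreshes CRDs that exist but come from an older version.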

Current Behavior

The upgrade fails and rolls back. (And if the cluster itself is being upgraded at the same time, the rollback fails.)

Reproduction Steps

1. Install an EKS blueprint using December version + Karpenter 0.31.x
2. Upgrade cluster + Karpenter to 0.34.x

Possible Solution

https://karpenter.sh/docs/upgrading/upgrade-guide/#crd-upgrades

Add the CRDs to the addon code.

Additional Information/Context

No response

CDK CLI Version

2.131.0

EKS Blueprints Version

1.14.0

Node.js Version

20.11.0

Environment details (OS name and version, etc.)

gitlab ci runners

Other information

No response

jsamuel1 commented 6 months ago

Further: if we go down the CRD Helm chart path, then https://karpenter.sh/docs/troubleshooting/#helm-error-when-installing-the-karpenter-crd-chart provides upgrade troubleshooting.
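
For context, Helm decides whether it may take over an existing resource by looking at one label and two annotations on it. The sketch below builds that metadata and checks an existing CRD against it. `helmOwnershipMetadata` and `isAdoptableBy` are hypothetical helper names, and the release name and namespace in the usage note are assumptions, not values confirmed by this issue.

```typescript
// Minimal shape of the metadata we care about (hypothetical type).
interface ObjectMeta {
  labels?: Record<string, string>;
  annotations?: Record<string, string>;
}

// Metadata Helm expects on a resource before it will adopt it
// into the given release.
function helmOwnershipMetadata(
  releaseName: string,
  releaseNamespace: string
): { labels: Record<string, string>; annotations: Record<string, string> } {
  return {
    labels: { "app.kubernetes.io/managed-by": "Helm" },
    annotations: {
      "meta.helm.sh/release-name": releaseName,
      "meta.helm.sh/release-namespace": releaseNamespace,
    },
  };
}

// True when an existing resource already carries the ownership metadata
// for the given release, so a chart install should not be rejected for it.
function isAdoptableBy(
  meta: ObjectMeta,
  releaseName: string,
  releaseNamespace: string
): boolean {
  const want = helmOwnershipMetadata(releaseName, releaseNamespace);
  const labels = meta.labels || {};
  const annotations = meta.annotations || {};
  const labelsOk = Object.keys(want.labels).every(
    (key) => labels[key] === want.labels[key]
  );
  const annotationsOk = Object.keys(want.annotations).every(
    (key) => annotations[key] === want.annotations[key]
  );
  return labelsOk && annotationsOk;
}
```

For example, `isAdoptableBy(crd.metadata, "karpenter-crd", "karpenter")` returning `false` would indicate that the CRD needs this label and these annotations patched in before the CRD chart can manage it.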

Feder1co5oave commented 3 months ago

I want to remind everyone that even when not changing the apiVersion, CRDs can get updated with new fields. For example, support for mounting instance store volumes was added in 0.34.0: https://github.com/aws/karpenter-provider-aws/commit/571e0fb471f04f873c290287e2ff03a6210410f6#diff-f564782b6203376bd2904fb153e560aceea1ef793c2b2e5621eb9a0886e48afe