Closed ManuelMueller1st closed 1 month ago
My understanding from the reproduction steps is that I should be able to reproduce this by applying the provided NodePool on 0.37.1, upgrading Karpenter to 1.0.0, and reapplying the same NodePool after the upgrade has completed. I've been unable to replicate this with the provided NodePool. Are you able to elaborate on the order of events? Specifically, could you elaborate on what you did to upgrade to 1.0.0 and whether there were any other changes to resources in the cluster as part of that upgrade process?
I noticed that the error only occurs if we use kubectl apply --server-side. We followed the https://karpenter.sh/preview/upgrading/v1-migration/ instructions to upgrade to Karpenter 1.0.0.
Using client-side apply mitigated the issue for us. It's not perfect for our GitOps solution though.
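For reference, the only difference between the failing path and the working path is the apply mode. A minimal sketch, where nodepool.yaml is a placeholder for the NodePool manifest from this issue:

```sh
# Server-side apply: this is the mode that trips the conversion webhook error for us
kubectl apply --server-side -f nodepool.yaml

# Plain client-side apply: works as a temporary workaround, but doesn't fit
# GitOps controllers that reconcile with server-side apply (Flux, for example)
kubectl apply -f nodepool.yaml
```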
Hi! Same setup on our side: an upgrade from 0.37.1 to 1.0.0, with the post-upgrade webhooks passing successfully. We are trying to apply the following NodePool:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    compatibility.karpenter.sh/v1beta1-kubelet-conversion: '{"clusterDNS":["x.x.x.x"]}'
    compatibility.karpenter.sh/v1beta1-nodeclass-reference: '{"kind":"EC2NodeClass","name":"bottlerocket","apiVersion":"karpenter.k8s.aws/v1beta1"}'
  labels:
    kustomize.toolkit.fluxcd.io/name: karpenter-node-pool
    kustomize.toolkit.fluxcd.io/namespace: karpenter
  name: default-ondemand-amd64
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "100"
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: bottlerocket
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - c
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c5a
            - c6a
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values:
            - "4"
            - "8"
            - "16"
      startupTaints:
        - effect: NoExecute
          key: node.cilium.io/agent-not-ready
This results in the following error during apply:
NodePool/arm-ondemand dry-run failed, error: failed to prune fields: failed add back owned items: failed to convert pruned object at version karpenter.sh/v1: conversion webhook for karpenter.sh/v1beta1, Kind=NodePool failed: Post "https://karpenter.karpenter.svc:8443/conversion/karpenter.sh?timeout=30s": EOF
And the following traceback on the karpenter controller:
karpenter-6b4bd4c96c-nb2lf controller {"level":"ERROR","time":"2024-08-28T15:10:58.539Z","logger":"webhook","message":"http: panic serving 172.23.219.89:52172: runtime error: invalid memory address or nil pointer dereference\ngoroutine 34311 [running]:\nnet/http.(*conn).serve.func1()\n\tnet/http/server.go:1903 +0xb0\npanic({0x2225100?, 0x4734a10?})\n\truntime/panic.go:770 +0x124\nsigs.k8s.io/karpenter/pkg/apis/v1.(*NodeClaimTemplate).convertFrom(0x4005cdd310, {0x2f1bb28, 0x40070d5470}, 0x4001832b08)\n\tsigs.k8s.io/karpenter@v1.0.0/pkg/apis/v1/nodepool_conversion.go:181 +0x188\nsigs.k8s.io/karpenter/pkg/apis/v1.(*NodePoolSpec).convertFrom(0x4005cdd310, {0x2f1bb28, 0x40070d5470}, 0x4001832b08)\n\tsigs.k8s.io/karpenter@v1.0.0/pkg/apis/v1/nodepool_conversion.go:145 +0xe8\nsigs.k8s.io/karpenter/pkg/apis/v1.(*NodePool).ConvertFrom(0x4005cdd208, {0x2f1bb28?, 0x40070d5470?}, {0x2efd390?, 0x4001832a00})\n\tsigs.k8s.io/karpenter@v1.0.0/pkg/apis/v1/nodepool_conversion.go:121 +0x124\nknative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).convert(0x40005c8d80, {0x2f1bb28, 0x40070d5320}, {{0x4005e9e6c0, 0x214, 0x240}, {0x0, 0x0}}, {0x40046fa8b0, 0xf})\n\tknative.dev/pkg@v0.0.0-20231010144348-ca8c009405dd/webhook/resourcesemantics/conversion/conversion.go:137 +0x119c\nknative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).Convert(0x40005c8d80, {0x2f1bb28?, 0x40070d52c0?}, 0x40088e69c0)\n\tknative.dev/pkg@v0.0.0-20231010144348-ca8c009405dd/webhook/resourcesemantics/conversion/conversion.go:57 +0x174\nknative.dev/pkg/webhook.New.conversionHandler.func5({0x2f0c658, 0x40047736c0}, 0x4005e25200)\n\tknative.dev/pkg@v0.0.0-20231010144348-ca8c009405dd/webhook/conversion.go:66 +0x24c\nnet/http.HandlerFunc.ServeHTTP(0x4000d18080?, {0x2f0c658?, 0x40047736c0?}, 0x1d01d10?)\n\tnet/http/server.go:2171 +0x38\nnet/http.(*ServeMux).ServeHTTP(0x40070d5170?, {0x2f0c658, 0x40047736c0}, 0x4005e25200)\n\tnet/http/server.go:2688 +0x1a4\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP(0x4000d18000, {0x2f0c658, 0x40047736c0}, 0x4005e25200)\n\tknative.dev/pkg@v0.0.0-20231010144348-ca8c009405dd/webhook/webhook.go:310 +0xc4\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP(0x40004743f0, {0x2f0c658, 0x40047736c0}, 0x4005e25200)\n\tknative.dev/pkg@v0.0.0-20231010144348-ca8c009405dd/network/handlers/drain.go:113 +0x158\nnet/http.serverHandler.ServeHTTP({0x2ef71d0?}, {0x2f0c658?, 0x40047736c0?}, 0x6?)\n\tnet/http/server.go:3142 +0xbc\nnet/http.(*conn).serve(0x400a3701b0, {0x2f1bb28, 0x4000a21290})\n\tnet/http/server.go:2044 +0x508\ncreated by net/http.(*Server).Serve in goroutine 364\n\tnet/http/server.go:3290 +0x3f0\n","commit":"5bdf9c3"}
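As a side note for anyone debugging this, the CRD's conversion configuration and the controller's webhook logs can be checked with something like the following. This is only a sketch assuming the default chart install (service karpenter in namespace karpenter); adjust names to your cluster:

```sh
# Show where the NodePool CRD sends conversion requests
kubectl get crd nodepools.karpenter.sh \
  -o jsonpath='{.spec.conversion.webhook.clientConfig}{"\n"}'

# Follow the controller logs while reapplying the NodePool to catch the panic above
kubectl logs -n karpenter deploy/karpenter -f | grep -i -e conversion -e panic
```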
We'll take a look at client-side apply, but this would not be ideal for the same reason @sherifabdlnaby mentioned.
What should be done?
Closing this issue as a duplicate of https://github.com/aws/karpenter-provider-aws/issues/6867. Please follow that issue for updates on progress.
Description
Observed Behavior: We've migrated from Karpenter 0.37.1 to 1.0.0. Now if I apply a NodePool, the Karpenter pod logs the following error:
Kubectl logs the following error:
Here is the NodePool I want to apply:
Expected Behavior:
The NodePool gets applied without an error.
Reproduction Steps (Please include YAML):
Apply the yaml from above with 0.37.1, and reapply it with 1.0.0.
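A condensed sketch of those steps, assuming the chart comes from the standard public ECR location and that cluster-specific Helm values (cluster name, IAM role, interruption queue, etc.) are supplied separately; the full upgrade itself should follow the v1 migration guide linked above. Per the comments, the error only reproduces with server-side apply:

```sh
# 1. Run Karpenter 0.37.1 and apply the NodePool above
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --version 0.37.1 --wait
kubectl apply --server-side -f nodepool.yaml

# 2. Upgrade to 1.0.0 (including the CRD updates from the v1 migration guide), then reapply
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --version 1.0.0 --wait
kubectl apply --server-side -f nodepool.yaml   # fails with the conversion webhook EOF error
```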
Versions:
Chart Version: 1.0.0
Kubernetes Version (kubectl version): 1.0.0

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment