Are the Karpenter pods on your cluster up and running?
No, actually, but I'm not totally sure why at this point. They had been running earlier, but now they are in a crash loop.
I'd suspect that's why you are getting timeouts from the webhooks: with no pods running, the backend for the webhook service is down.
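For anyone else hitting this, a quick way to confirm that's what is happening is to check whether the webhook Service has any endpoints behind it (a sketch, assuming the default `karpenter` namespace and release name from the getting-started guide):

```sh
# Are the Karpenter pods Ready?
kubectl get pods -n karpenter -l app.kubernetes.io/name=karpenter
# Does the webhook Service have backends? ENDPOINTS of <none> means the API
# server has nothing to forward webhook calls to, so admission requests fail.
kubectl get endpoints karpenter -n karpenter
```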
Got the pod running and am now seeing this:
{"level":"ERROR","time":"2023-11-21T23:42:17.823Z","logger":"webhook.ConfigMapWebhook","message":"Reconcile error","commit":"1072d3b","knative.dev/traceid":"92378627-1dc2-4135-9094-c492b94b8ece","knative.dev/key":"kube-system/karpenter-cert","duration":"71.170423ms","error":"failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.config.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
AFAIK, that error is expected because there can be conflicts with reconciling the certificate on startup. Is there something else on the pod that is causing the CrashLoopBackoff?
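If it's useful, one way to confirm the certificate did reconcile despite those startup conflicts is to check that the webhook's caBundle got populated (a sketch; the webhook name is taken from the log line above):

```sh
# "the object has been modified" is the API server's optimistic-concurrency
# check firing while several reconcilers race at startup; it normally settles.
# A non-zero byte count here means the cert reconciler eventually succeeded.
kubectl get validatingwebhookconfiguration validation.webhook.config.karpenter.sh \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | wc -c
```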
As stated previously, I was able to get Karpenter up and running in a good state (including the controller container); I got the above error after the pods were running. After scaling the Karpenter deployment to 0 and back to 2, the error no longer appears. Currently Karpenter is deployed and creates instances, but they never join the cluster (and it keeps attempting to boot up instances).
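Instances that launch but never join are usually a node-registration problem rather than a webhook one. A couple of things commonly worth checking first (a sketch, assuming an EKS cluster using the aws-auth ConfigMap; the role name is whatever you created during the migration):

```sh
# The node IAM role must be mapped in aws-auth, or instances boot but can
# never register. Look for a rolearn entry like KarpenterNodeRole-<cluster>
# with the groups system:bootstrappers and system:nodes.
kubectl get configmap aws-auth -n kube-system -o yaml
# Watch whether freshly launched instances ever show up as nodes.
kubectl get nodes -w
```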
Sounds good. So it sounds like the webhook issue that you were seeing originally is resolved? Can I close this issue?
Sure!
Hi @jonathan-innis, I am continuously seeing these reconcile errors after upgrading from 0.27 to 0.31.2.
2024-03-26T08:08:12.562Z DEBUG Successfully created the logger.
2024-03-26T08:08:12.563Z DEBUG Logging level set to: debug
{"level":"info","ts":1711440492.5673463,"logger":"fallback","caller":"injection/injection.go:63","msg":"Starting informers..."}
2024-03-26T08:08:12.668Z DEBUG controller waiting for configmaps {"commit": "dc3af1a"}
2024-03-26T08:08:13.495Z DEBUG controller waiting for configmaps {"commit": "dc3af1a"}
2024-03-26T08:08:14.003Z DEBUG controller waiting for configmaps {"commit": "dc3af1a"}
2024-03-26T08:08:14.511Z DEBUG controller waiting for configmaps {"commit": "dc3af1a"}
2024-03-26T08:08:15.012Z DEBUG controller waiting for configmaps {"commit": "dc3af1a"}
2024-03-26T08:08:15.650Z DEBUG controller.aws discovered region {"commit": "dc3af1a", "region": "ap-south-1"}
2024-03-26T08:08:15.830Z DEBUG controller.aws discovered cluster endpoint {"commit": "dc3af1a", "cluster-endpoint": "https://1D1B4752D1D078AEFEF3A0CC32AD79FE.gr7.ap-south-1.eks.amazonaws.com"}
2024-03-26T08:08:15.834Z DEBUG controller.aws discovered kube dns {"commit": "dc3af1a", "kube-dns-ip": "10.100.0.10"}
2024-03-26T08:08:15.835Z DEBUG controller.aws discovered version {"commit": "dc3af1a", "version": "v0.27.0"}
2024/03/26 08:08:15 Registering 2 clients
2024/03/26 08:08:15 Registering 2 informer factories
2024/03/26 08:08:15 Registering 3 informers
2024/03/26 08:08:15 Registering 6 controllers
2024-03-26T08:08:15.837Z INFO controller Starting server {"commit": "dc3af1a", "path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2024-03-26T08:08:15.837Z INFO controller Starting server {"commit": "dc3af1a", "kind": "health probe", "addr": "[::]:8081"}
I0326 08:08:16.259816 1 leaderelection.go:248] attempting to acquire leader lease internal-system/karpenter-leader-election...
2024-03-26T08:08:16.264Z INFO controller Starting informers... {"commit": "dc3af1a"}
2024-03-26T08:08:16.591Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "fa205b03-8d6a-4b58-91d5-292e0795815b", "knative.dev/key": "internal-system/karpenter-cert", "duration": "77.261µs", "error": "error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\" not found"}
2024-03-26T08:08:16.609Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "303a0141-1370-4bb0-80e3-288f88b1203c", "knative.dev/key": "internal-system/karpenter-cert", "duration": "111.351µs", "error": "error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\" not found"}
2024-03-26T08:08:16.682Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "500a5306-ffcc-40f6-9fdd-2af36ddf3c7a", "knative.dev/key": "internal-system/karpenter-cert", "duration": "82.431µs", "error": "error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\" not found"}
2024-03-26T08:08:16.702Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "b2c597ee-d0a2-4a19-9321-3f44c2f0ffea", "knative.dev/key": "internal-system/karpenter-cert", "duration": "62.511µs", "error": "error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\" not found"}
2024-03-26T08:08:16.774Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "a21e716b-2881-4364-8382-37a4e1d52ad6", "knative.dev/key": "internal-system/karpenter-cert", "duration": "81.16µs", "error": "error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\" not found"}
2024-03-26T08:08:16.874Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "cfbfd6cd-5045-43ef-aafa-f5c3655f7521", "knative.dev/key": "internal-system/karpenter-cert", "duration": "70.85µs", "error": "error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\" not found"}
2024-03-26T08:08:16.886Z INFO controller.aws.pricing updated spot pricing with instance types and offerings {"commit": "dc3af1a", "instance-type-count": 631, "offering-count": 1424}
2024-03-26T08:08:16.909Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "9827833b-f9d8-4ef1-94e8-9030b1d87b4e", "knative.dev/key": "defaulting.webhook.karpenter.k8s.aws", "duration": "299.030455ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2024-03-26T08:08:16.909Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "0cb35cfc-ad01-44ef-ab1e-5e1e3f66a93c", "knative.dev/key": "internal-system/karpenter-cert", "duration": "318.031512ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2024-03-26T08:08:16.914Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "48cde0f7-fb43-4c9f-9160-2d826fe3c60f", "knative.dev/key": "validation.webhook.karpenter.k8s.aws", "duration": "284.505677ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2024-03-26T08:08:16.915Z ERROR webhook.ConfigMapWebhook Reconcile error {"commit": "dc3af1a", "knative.dev/traceid": "f4eb39e9-4857-462b-bd47-69023e5dbb09", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "305.705028ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.config.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
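One thing worth checking after an upgrade like this: the log above still reports `"version": "v0.27.0"`, which suggests the old controller may still be what's running, while the `defaulting.webhook.karpenter.sh` configuration it is trying to reconcile is reported as not found. A quick comparison (a sketch; the `internal-system` namespace and deployment name are taken from your logs and the chart defaults):

```sh
# Which Karpenter webhook configurations actually exist on the cluster?
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations | grep karpenter
# Which image is the deployment actually running after the upgrade?
kubectl get deployment karpenter -n internal-system \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```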
Description

Observed Behavior: When attempting to create a node pool, it fails with a webhook error:

```
Error from server (InternalError): error when creating "../Buckeye(us-east-2)provisioner/default-nodepool.yaml": Internal error occurred: failed calling webhook "validation.webhook.karpenter.sh": failed to call webhook: Post "https://karpenter.karpenter.svc:8443/?timeout=10s": no endpoints available for service "karpenter"
Error from server (InternalError): error when creating "../Buckeye(us-east-2)provisioner/default-nodepool.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.karpenter.k8s.aws": failed to call webhook: Post "https://karpenter.karpenter.svc:8443/?timeout=10s": no endpoints available for service "karpenter"
```

Expected Behavior: Webhooks to work.

Reproduction Steps (Please include YAML): After following the instructions at https://karpenter.sh/preview/getting-started/migrating-from-cas/, it is impossible to deploy any node pools.
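For context, the failing manifest is a NodePool. A minimal example of the v1beta1 shape that chart v0.32.x accepts (hypothetical — the actual default-nodepool.yaml isn't shown here, and the EC2NodeClass name is an assumption) would be applied like this; the apply itself is what trips the failing webhooks. The full rendered chart manifests follow.

```sh
# Hypothetical minimal NodePool; real requirements and limits will differ.
kubectl apply -f - <<'EOF'
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        name: default   # hypothetical EC2NodeClass name
EOF
```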
```yaml
---
# Source: karpenter/templates/poddisruptionbudget.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: karpenter
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: karpenter
      app.kubernetes.io/instance: karpenter
---
# Source: karpenter/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: karpenter
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::{HIDDENACCOUNT}:role/KarpenterControllerRole-buckeye
---
# Source: karpenter/templates/secret-webhook-cert.yaml
apiVersion: v1
kind: Secret
metadata:
  name: karpenter-cert
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
data: {} # Injected by karpenter-webhook
---
# Source: karpenter/templates/configmap-logging.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-logging
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
data:
  # https://github.com/uber-go/zap/blob/aa3e73ec0896f8b066ddf668597a02f89628ee50/config.go
  zap-logger-config: |
    {
      "level": "debug",
      "development": false,
      "disableStacktrace": true,
      "disableCaller": true,
      "sampling": {
        "initial": 100,
        "thereafter": 100
      },
      "outputPaths": ["stdout"],
      "errorOutputPaths": ["stderr"],
      "encoding": "json",
      "encoderConfig": {
        "timeKey": "time",
        "levelKey": "level",
        "nameKey": "logger",
        "callerKey": "caller",
        "messageKey": "message",
        "stacktraceKey": "stacktrace",
        "levelEncoder": "capital",
        "timeEncoder": "iso8601"
      }
    }
  loglevel.controller: debug
  loglevel.webhook: error
---
# Source: karpenter/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-global-settings
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
data:
  batchMaxDuration: "10s"
  batchIdleDuration: "1s"
---
# Source: karpenter/templates/aggregate-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: karpenter-admin
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
rules:
  - apiGroups: ["karpenter.k8s.aws"]
    resources: ["ec2nodeclasses"]
    verbs: ["get", "list", "watch", "create", "delete", "patch"]
---
# Source: karpenter/templates/clusterrole-core.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: karpenter-core
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
rules:
  # Read
  # Write
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["validatingwebhookconfigurations"]
    verbs: ["update"]
    resourceNames: ["validation.webhook.karpenter.sh", "validation.webhook.config.karpenter.sh"]
---
# Source: karpenter/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
rules:
  # Read
  # Write
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["mutatingwebhookconfigurations"]
    verbs: ["update"]
    resourceNames: ["defaulting.webhook.karpenter.k8s.aws"]
---
# Source: karpenter/templates/clusterrole-core.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: karpenter-core
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: karpenter-core
subjects:
  - kind: ServiceAccount
    name: karpenter
    namespace: karpenter
---
# Source: karpenter/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: karpenter
subjects:
  - kind: ServiceAccount
    name: karpenter
    namespace: karpenter
---
# Source: karpenter/templates/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: karpenter
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
rules:
  # Read
  # Write
  # Cannot specify resourceNames on create
  # https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-resources
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create"]
---
# Source: karpenter/templates/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: karpenter-dns
  namespace: kube-system
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
rules:
  # Read
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["kube-dns"]
    verbs: ["get"]
---
# Source: karpenter/templates/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: karpenter-lease
  namespace: kube-node-lease
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
rules:
  # Read
  # Write
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["delete"]
---
# Source: karpenter/templates/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: karpenter
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: karpenter
subjects:
  - kind: ServiceAccount
    name: karpenter
    namespace: karpenter
---
# Source: karpenter/templates/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: karpenter-dns
  namespace: kube-system
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: karpenter-dns
subjects:
  - kind: ServiceAccount
    name: karpenter
    namespace: karpenter
---
# Source: karpenter/templates/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: karpenter-lease
  namespace: kube-node-lease
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: karpenter-lease
subjects:
  - kind: ServiceAccount
    name: karpenter
    namespace: karpenter
---
# Source: karpenter/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: karpenter
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - name: https-webhook
      port: 8443
      targetPort: https-webhook
      protocol: TCP
  selector:
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
---
# Source: karpenter/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter
  namespace: karpenter
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 2
  revisionHistoryLimit: 10
  strategy:
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: karpenter
      app.kubernetes.io/instance: karpenter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: karpenter
        app.kubernetes.io/instance: karpenter
      annotations:
        checksum/settings: a2974026095d23629cb420e1654f09593727575a9cffe3eea9d747c1a62bd2cf
    spec:
      serviceAccountName: karpenter
      priorityClassName: "system-cluster-critical"
      dnsPolicy: Default
      containers:
      # The template below patches the .Values.affinity to add a default label selector where not specificed
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
      # The template below patches the .Values.topologySpreadConstraints to add a default label selector where not specificed
      topologySpreadConstraints:
      volumes:
        - name: config-logging
          configMap:
            name: config-logging
---
# Source: karpenter/templates/webhooks.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: defaulting.webhook.karpenter.k8s.aws
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
webhooks:
  - provisioners/status
    scope: '*'
---
# Source: karpenter/templates/webhooks-core.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validation.webhook.karpenter.sh
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
webhooks:
  - nodepools/status
    scope: '*'
---
# Source: karpenter/templates/webhooks-core.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validation.webhook.config.karpenter.sh
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
webhooks:
  - name: validation.webhook.config.karpenter.sh
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: karpenter
        namespace: karpenter
        port: 8443
    failurePolicy: Fail
    sideEffects: None
    objectSelector:
      matchLabels:
        app.kubernetes.io/part-of: karpenter
---
# Source: karpenter/templates/webhooks.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validation.webhook.karpenter.k8s.aws
  labels:
    helm.sh/chart: karpenter-v0.32.1
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/version: "0.32.1"
    app.kubernetes.io/managed-by: Helm
webhooks:
```
Versions:

- Chart Version: v0.32.1
- Kubernetes Version (`kubectl version`): 1.27 (EKS)

- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment