Open sknmi opened 1 month ago
fixed with
webhook:
  enabled: false
I don't think this issue should be closed. I am seeing a similar error in my log messages and require the webhook to remain enabled to facilitate the conversion to the latest api version for my resources.
I agree with @levinedaniel. What is the reason for marking this as solved with
webhook:
  enabled: false
The webhook is broken.
Same on v1.0.2. Please re-open.
Is disabling the webhook an acceptable solution, or will some functionality stop working?
cc @sknmi message above
@Hronom reopened :)
Also seeing this issue after upgrading to v0.37.3.
Saw this issue on 0.37.3 and 1.0.1.
Seeing same in 1.0.2
Below findings are incorrect
Here is my observation. Please let me know if this is incorrect:
Karpenter does not provide a ca-client bundle as we can see from here.
When I look at the CRD in my cluster, I can see that it has been injected with a caBundle:
    webhook:
      clientConfig:
        caBundle: Redacted...
        service:
          name: karpenter
          namespace: karpenter
          path: /conversion/karpenter.sh
          port: 8443
      conversionReviewVersions:
      - v1beta1
      - v1
  group: karpenter.sh
I believe this is happening through ca-injector. This means the client config for this webhook has a CA bundle specified, but Karpenter uses knative to inject certificate data into the karpenter-cert secret, which comes from here.
So this means that the CAs for the CRD and the webhooks do not match, hence the error. If this is correct, then maybe we can look at possible solutions.
I am still not sure how CA bundle is injected in CRD and I did see at one point that the CA bundle in secret vs CRD was different.
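For anyone who wants to check whether the two CAs really differ, here is a minimal sketch of the comparison. It assumes you have already copied the base64 value of `spec.conversion.webhook.clientConfig.caBundle` from the CRD and the base64 value of the CA entry from the `karpenter-cert` secret. The secret key name `ca-cert.pem` mentioned in the comments is an assumption, so check the keys actually present in your secret. The function name `same_ca` is made up for this example.

```python
import base64


def same_ca(crd_ca_bundle_b64: str, secret_ca_b64: str) -> bool:
    """Return True if both base64 values decode to the same PEM bytes.

    crd_ca_bundle_b64: value of spec.conversion.webhook.clientConfig.caBundle
                       as shown by `kubectl get crd <name> -o json`.
    secret_ca_b64:     value of the CA entry in the secret's `data` map
                       (assumed key: ca-cert.pem; verify in your cluster).
    """
    crd_pem = base64.b64decode(crd_ca_bundle_b64).strip()
    secret_pem = base64.b64decode(secret_ca_b64).strip()
    # Surrounding whitespace is stripped so a trailing newline in one copy
    # does not count as a mismatch.
    return crd_pem == secret_pem
```

A `False` result only says the two values differed at the moment you read them.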
This appears to be the same issue we saw with our defaulting / validating webhooks previously. The original issue was closed out when those webhooks were disabled by default: https://github.com/kubernetes-sigs/karpenter/issues/718. I've been able to reproduce, and as with that issue there does not appear to be any actual impact to Karpenter's operation and the errors can be safely ignored.
From the original issue:
These TLS errors appear to be related to https://github.com/kubernetes/kubernetes/issues/109022 which states that these handshake errors may be generated by some caching mechanism that is happening in the standard library that causes TLS errors on a cert rotation.
@liafizan are you still running into this? The cert is injected by knative, and I've been unable to reproduce. If you're still encountering this, I'd recommend opening a separate issue. I don't think it's related to the TLS errors we're seeing here.
I am still not sure how CA bundle is injected in CRD and I did see at one point that the CA bundle in secret vs CRD was different.
I'm going to mark this issue as solved for now, but let us know if any of you believe this issue is impacting Karpenter's ability to operate.
Hello @jmdeal,
After upgrading to minor 0.37.5 to enable the deleting of webhooks when deployed with ArgoCD I see two things:
kubectl get crd nodeclaims.karpenter.sh -o jsonpath='{.spec.versions[*].name}'
=> v1 v1beta1
So both versions exist in the cluster.
Therefore, in my case the TLS handshake error seems to prevent the validating webhook from performing the v1 migration. I checked the logs inside the controller, and that is all I got from the webhook. The second thing is that my CRDs are not in version v1 and are still in v1beta1, so IMO the TLS handshake error is causing the conversion webhook to fail.
This doesn't indicate any issue with the conversion webhook. If you're on any pre-1.0 version with the conversion webhooks, the storage version is still v1beta1. The conversion webhooks only exist on those versions to enable rollback from v1.0. Also, once you upgrade to v1, both versions will still be present on the CRD; one isn't automatically removed once all stored resources are converted. Instead, you want to look at .status.storedVersions on the CRDs. On Karpenter v1.0.5+, Karpenter will remove v1beta1 from the stored versions once all CRs have been successfully migrated.
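To make the distinction between `.spec.versions` and `.status.storedVersions` concrete, here is a small sketch that summarizes both for one CRD. It takes the parsed JSON of a CRD, for example the output of `kubectl get crd nodeclaims.karpenter.sh -o json` loaded with `json.loads`. The function name `migration_status` and the shape of the returned dict are made up for this example.

```python
def migration_status(crd: dict, old_version: str = "v1beta1") -> dict:
    """Summarize served and stored versions for one parsed CRD object."""
    versions = crd.get("spec", {}).get("versions", [])
    # Versions listed under spec.versions with served=true are still offered
    # by the API server; this says nothing about how objects are stored.
    served = [v["name"] for v in versions if v.get("served")]
    # status.storedVersions lists every version that objects may still be
    # persisted as; this is the field to watch for migration progress.
    stored = crd.get("status", {}).get("storedVersions", [])
    return {
        "served": served,
        "stored": stored,
        "migrated": old_version not in stored,
    }
```

With both `v1` and `v1beta1` served and `storedVersions` equal to `["v1beta1", "v1"]`, `migrated` is `False`. It becomes `True` only once `v1beta1` no longer appears in `storedVersions`.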
@jmdeal thank you for your answer. I misunderstood the conversion webhook and thought it was the other way around. Thanks for the clarification!
We are seeing this same behavior. Upgrade from 0.37.0 to 1.0.3 (with a minor upgrade to 0.37.3 during the upgrade process). The error seems to be innocuous, but I wanted to see if there was any impact to the core functionality of Karpenter.
I have done the upgrade from 0.37.5 to 1.0.6 and still see this issue. I have enabled webhook in 0.37.5 and this error is from karpenter 1.0.6
{"level":"ERROR","time":"2024-10-09T14:27:06.147Z","logger":"webhook","message":"http: TLS handshake error from 10.214.2.206:34084: EOF\n","commit":"6174c75"}
{"level":"ERROR","time":"2024-10-09T14:27:06.319Z","logger":"webhook","message":"http: TLS handshake error from 10.214.60.56:40108: EOF\n","commit":"6174c75"}
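If it helps to see which clients are triggering these errors, here is a rough sketch that counts handshake errors per source IP. It assumes the controller log has one JSON object per line with the `logger` and `message` fields shown above. The function name `handshake_errors_by_source` is made up for this example.

```python
import json
from collections import Counter

MARKER = "TLS handshake error from "


def handshake_errors_by_source(lines):
    """Count webhook TLS handshake errors per source IP in JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("logger") != "webhook":
            continue
        message = entry.get("message", "")
        if MARKER not in message:
            continue
        # The message looks like "http: TLS handshake error from IP:PORT: EOF".
        address = message.split(MARKER, 1)[1].split(": ", 1)[0]
        ip = address.rsplit(":", 1)[0]  # drop the ephemeral source port
        counts[ip] += 1
    return counts
```

Feeding it the saved controller log gives a per-IP count, which can then be matched against node or pod IPs in the cluster.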
+1
Description
Observed Behavior:
Expected Behavior: No errors :)
Reproduction Steps (Please include YAML): Karpenter on Fargate in karpenter namespace. These messages started to appear after upgrading to 1.0.1
Versions:
Chart Version: 1.0.1
Kubernetes Version (kubectl version): 1.30
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment