alanlyne opened this issue 8 months ago
@alanlyne : Are you able to reproduce this issue on some other Kubernetes Cluster? Could this be related to some cluster misconfiguration? I tried this on kind with Kubernetes 1.29.0 but couldn't reproduce it.
Yes, I've tried on 3 clusters in total. Two were on 1.28 and both had the same issue. One of the two was a clean cluster set up in AWS and the other had some extra bloat on it: Argo CD, Linkerd, etc. The 3rd was a 1.27 cluster, and there it worked fine.
Looks like we found the issue. It seems the client was cancelling the request. Increasing every timeout setting resolved the issue. More work on our side to find the correct values, but that seems to be the cause.
Config config = new ConfigBuilder(Config.empty())
        .withConnectionTimeout(60 * 1000)
        .withRequestTimeout(60 * 1000)
        .withUploadRequestTimeout(60 * 1000)
        .withMasterUrl(eksEndpoint)
        .withOauthTokenProvider(authTokenProvider)
        .withTrustCerts()
        .build();
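For context, a minimal sketch of how such a config is fed into a client, assuming a standard KubernetesClientBuilder setup; the master URL here is a placeholder standing in for the issue's eksEndpoint, and the token-provider step from the original snippet is omitted:

```java
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class TimeoutConfigExample {
    public static void main(String[] args) {
        // All timeouts raised to 60s (values in milliseconds)
        Config config = new ConfigBuilder(Config.empty())
                .withConnectionTimeout(60 * 1000)       // TCP connect timeout
                .withRequestTimeout(60 * 1000)          // per-request timeout
                .withUploadRequestTimeout(60 * 1000)    // timeout for uploads
                .withMasterUrl("https://example.invalid") // placeholder API server URL
                .withTrustCerts()
                .build();

        try (KubernetesClient client = new KubernetesClientBuilder().withConfig(config).build()) {
            // use client.pods() ... here
        }
    }
}
```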
Seems the client was cancelling the request.
Why was the client cancelling the request? Are the default timeouts too low for your setup?
Seems like that was the issue: increasing the request timeout alone to 40 * 1000 (40s) resolved it for us, and we have not run into the same issue since this change.
This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!
Has anyone figured out any root cause that might have changed speed in some scenarios when moving from 1.27 to 1.28+? 10s is quite a long time, let alone 30s or 40s.
I believe this is still an issue/mystery.
@chadlwilson : Is it possible to provide more details on how to reproduce this issue?
Sadly I do not personally have an environment that has experienced this.
My gut feel is that this is actually an environment-specific EKS 1.28 or Kubernetes 1.28 issue and unrelated to this client - it's just that the default 10s timeout is what is triggered. So perhaps it's valid to close this as "cannot reproduce" and see if anyone can narrow it down.
I've tried to find changes in Kubernetes 1.28 or EKS 1.28 that might explain extremely slow pod creation but haven't found a smoking gun. My guess is something within Pod Admission Control, ValidatingAdmissionPolicy, or slow/problematic webhooks on the server side that is timing out some check but eventually allowing pods to be created (or something like that).
I have tried this on two different clusters with Kubernetes v1.28.0, and the code above works as expected.
I think this issue is not specific to any Kubernetes version but rather to the cluster configuration. In @alanlyne's case, increasing the connection timeout resolved the issue.
Describe the bug
When upgrading our K8s clusters to 1.28, we are no longer able to run or create pods. This works fine with 1.27. We were on an older release of Fabric8 but have since upgraded, and we have the exact same issue. The error occurs with any attempt to create or run a pod; for the purpose of this report, this is the method we are trying.
After 30 seconds or so of running this, we receive the error shown below. The error is effectively the same irrespective of the way we create the pod.
I have looked through the issue history but was unable to find anyone else with a similar issue.
Using kubectl apply -f pod.yaml works as expected.
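For comparison, a rough client-side equivalent of kubectl apply -f pod.yaml is sketched below. This is an assumption about the reporter's setup, not their actual code; createOrReplace is the Fabric8 6.x API, and the file path and namespace are placeholders:

```java
import java.io.FileInputStream;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class ApplyPodExample {
    public static void main(String[] args) throws Exception {
        // Load the manifest and create (or replace) it in the target namespace
        try (KubernetesClient client = new KubernetesClientBuilder().build();
             FileInputStream yaml = new FileInputStream("pod.yaml")) { // placeholder path
            client.load(yaml).inNamespace("default").createOrReplace();
        }
    }
}
```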
Fabric8 Kubernetes Client version
6.10.0
Steps to reproduce
Attempt to create/run a pod with the latest Fabric8 version. Will fail after a few seconds. Occurs on all of our 1.28 clusters.
Expected behavior
Create/run a pod without error
Runtime
Kubernetes (vanilla)
Kubernetes API Server version
other (please specify in additional context)
Environment
Windows
Fabric8 Kubernetes Client Logs
Additional context
1.28.4-eks-8cb36c9