Closed: mateusz-lubanski-sinch closed this issue 6 months ago.
Could you confirm if you have already seen this doc:
https://fluxcd.io/flux/cheatsheets/bootstrap/#increase-the-number-of-workers
Regarding how to increase the performance-tuning settings in Flux, you can follow this doc as a more general guide to customizing Flux:
https://fluxcd.io/flux/installation/#customize-flux-manifests
(but the first link includes a reference to your specific inquiry, how to set --kube-api-burst)
From your report, it sounds like you tried setting this value and it did not have the desired effect. Could you clarify that detail, please? The value suggested in our performance-tuning docs is 1000, while the Crossplane docs suggest 300 (maybe try a higher value?).
It is also possible that some change in the latest version of Flux has affected the behavior in an unexpected way. Is this a new behavior that you only noticed in the latest version of Kustomize Controller, v0.30.0? There is a change in this latest release (ref: https://github.com/fluxcd/kustomize-controller/pull/745), and I am trying to ascertain whether it is related or not.
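If it helps, here is a minimal sketch of the kind of patch the cheatsheet describes, added to the flux-system/kustomization.yaml that flux bootstrap generates. The qps/burst values below are only illustrative, not a recommendation for your cluster:

```yaml
# flux-system/kustomization.yaml (generated by flux bootstrap)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append rate-limit flags to the kustomize-controller container args
  # (illustrative values; pick numbers that fit your cluster).
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --kube-api-qps=500
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --kube-api-burst=1000
    target:
      kind: Deployment
      name: kustomize-controller
```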
Thanks @kingdonb for the quick answer.
Yes, that's correct, I tried using the --kube-api-burst setting and it did not have the desired effect.
Based on my calculations from https://github.com/crossplane/crossplane/blob/master/design/one-pager-crd-scaling.md#client-side-throttling, 250 should be sufficient.
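Roughly, the back-of-the-envelope model behind that number (assuming discovery issues about one GET per API GroupVersion, and that the client throttles with a token bucket of size --kube-api-burst refilled at --kube-api-qps per second):

$$
\text{extra wait} \approx \frac{\max(0,\; N_{\text{requests}} - \text{burst})}{\text{qps}},
\qquad \text{e.g.}\ \frac{173 - 100}{50} \approx 1.5\ \text{s}
$$

The 100/50 figures above are only illustrative, but with 173 GroupVersions a burst somewhere above ~200 should leave headroom, hence 250.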
To be sure, I just updated both the --kube-api-burst and --kube-api-qps settings to the recommended values from https://fluxcd.io/flux/cheatsheets/bootstrap/#increase-the-number-of-workers, but I can still see a lot of throttling errors in kustomize-controller:
I1116 08:28:56.890662 7 request.go:682] Waited for 3.047259872s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/vpcresources.k8s.aws/v1beta1?timeout=32s
I1116 08:29:06.915595 7 request.go:682] Waited for 2.196979557s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/external-secrets.io/v1alpha1?timeout=32s
I1116 08:29:16.934432 7 request.go:682] Waited for 4.946498296s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/secrets.crossplane.io/v1alpha1?timeout=32s
I1116 08:29:27.411427 7 request.go:682] Waited for 1.0474381s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/rds.aws.crossplane.sinch.com/v1alpha1?timeout=32s
I1116 08:29:37.415356 7 request.go:682] Waited for 3.597358097s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/crd.projectcalico.org/v1?timeout=32s
I1116 08:29:47.444735 7 request.go:682] Waited for 6.245726056s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/database.aws.crossplane.io/v1beta1?timeout=32s
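For reference, this is roughly how the flags show up on the kustomize-controller Deployment after the patch (a sketch; the qps/burst values are illustrative):

```yaml
# Excerpt of the kustomize-controller Deployment spec after the patch
# (check with: kubectl -n flux-system get deploy kustomize-controller -o yaml).
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            # ...existing controller args...
            - --kube-api-qps=500
            - --kube-api-burst=1000
```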
I also tried downgrading kustomize-controller (to v0.28.0) to make sure the latest features had no impact on the throttling issue, but I can see the same throttling errors in the logs.
Kubernetes version: v1.21.14-eks-fb459a0
@kingdonb, do you maybe have any other advice?
Bumping the rate limits has no effect on newer Kubernetes versions due to https://github.com/fluxcd/pkg/pull/270
I guess this will be solved by using the new AggregatedDiscoveryEndpoint: https://github.com/kubernetes/enhancements/issues/3352. We'll need to revisit this in six months' time, after that feature gate becomes GA.
Is there anything that can be done on older Kubernetes versions? As of today we are running on EKS 1.21, and soon we will upgrade to 1.22.
This is now solved upstream with Aggregated Discovery being made GA in Kubernetes 1.30. On Kubernetes 1.30 and newer, Flux will no longer spam calls to discover all available APIs; instead, it will make a single call.
Error message:
Throttling logs for kustomize-controller:
From all the deployed Flux controllers, throttling logs occur only on kustomize-controller.
Additional context
kustomize-controller version:
--kube-api-burst argument passed to container:
We faced the above issue after deploying Crossplane with provider-aws to our cluster, which added a bunch of new CRDs. Today we have 173 API GroupVersions on our cluster.
The client-side throttling issue is explained in detail here: https://github.com/crossplane/crossplane/blob/master/design/one-pager-crd-scaling.md#client-side-throttling
Expected behavior
After setting the --kube-api-burst container argument, the throttling logs should disappear, or the Waited for time should be close to 1s (e.g. Waited for 1.045801429s due to client-side throttling).