cert-manager / aws-privateca-issuer

Addon for cert-manager that issues certificates using AWS ACM PCA.
Apache License 2.0

[Feature Request]: Option to disable kubernetes client side rate limiting #334

Closed: hexalellogram closed this issue 1 month ago

hexalellogram commented 2 months ago

Describe why this change is needed

Hello, we're trying to use Istio + istio-csr + cert-manager + aws-privateca-issuer in one of our production clusters. When a new deployment rolls out, hundreds of pods spin up and request certificates at the same time. We suspect that Kubernetes client-side rate limiting is throttling how many pods can get a new certificate at once, which makes our deployments take much longer and eventually time out.

We've already ruled out similar bottlenecks in istio-csr (https://github.com/cert-manager/istio-csr/pull/352) and in cert-manager itself for our configuration. However, the rate at which pods receive new certificates has not improved, which leads us to believe there is still a bottleneck somewhere; we suspect it is aws-privateca-issuer. Looking through this repository, I cannot find any place that overrides the QPS and Burst settings, so I believe the defaults from client-go's rest package are being used.
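To make the suspected bottleneck concrete, here is a minimal Go sketch of how a token-bucket client-side limiter with QPS and Burst settings delays a burst of API calls. The defaults of 5 QPS and a burst of 10 are assumed to match client-go's rest.DefaultQPS and rest.DefaultBurst; the rule that a zero value falls back to the default and a negative QPS disables the limiter is modeled on client-go's behavior, but this is a standalone illustration, not client-go's code. The request count and the 20/30 comparison setting are hypothetical.

```go
package main

import "fmt"

// Assumed to match client-go's rest.DefaultQPS and rest.DefaultBurst.
const (
	defaultQPS   float32 = 5
	defaultBurst int     = 10
)

// effectiveLimits models (as an assumption, not client-go's actual code) how a
// REST client resolves its token-bucket settings: a zero value falls back to
// the default, and a negative QPS means no client-side limiter is created.
func effectiveLimits(qps float32, burst int) (float32, int, bool) {
	if qps == 0 {
		qps = defaultQPS
	}
	if burst == 0 {
		burst = defaultBurst
	}
	return qps, burst, qps > 0
}

// minDelaySeconds returns how long the last of n simultaneous requests waits
// for a token from a bucket that starts full with `burst` tokens and refills
// at `qps` tokens per second.
func minDelaySeconds(n int, qps float32, burst int) float64 {
	if n <= burst {
		return 0
	}
	return float64(n-burst) / float64(qps)
}

func main() {
	const requests = 500 // hypothetical number of API calls during a rollout
	cases := []struct {
		qps   float32
		burst int
	}{{0, 0}, {20, 30}, {-1, 0}}
	for _, in := range cases {
		qps, burst, limited := effectiveLimits(in.qps, in.burst)
		if !limited {
			fmt.Printf("QPS=%v: client-side limiting disabled\n", in.qps)
			continue
		}
		fmt.Printf("QPS=%v burst=%d: last request waits %.1fs\n",
			qps, burst, minDelaySeconds(requests, qps, burst))
	}
}
```

With the assumed defaults, the last of 500 simultaneous calls waits (500 - 10) / 5 = 98 seconds; since issuing one certificate may involve several API calls, the real delay per pod could be larger.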

We're using Kubernetes 1.29, in which API Priority and Fairness is GA; with server-side API Priority and Fairness in place, client-side rate limiting is no longer needed.

Describe solutions and alternatives considered (optional)

A way to disable client-side rate limiting entirely (similar to the istio-csr patch linked above), perhaps exposed as a Helm chart option, would be very helpful.
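One possible shape for such an option, sketched in Go under stated assumptions: the flags --kube-api-qps and --kube-api-burst are hypothetical and are not existing aws-privateca-issuer options, and clientConfig is a stand-in for client-go's rest.Config, of which only the QPS (float32) and Burst (int) fields are modeled. A Helm chart value could be rendered into these arguments; a negative QPS is passed through so it can switch the client-side limiter off.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// clientConfig stands in for client-go's rest.Config; only QPS and Burst are modeled.
type clientConfig struct {
	QPS   float32
	Burst int
}

// applyRateLimitFlags parses the hypothetical --kube-api-qps and
// --kube-api-burst flags and copies only the ones actually passed onto cfg,
// so unset flags leave the existing values untouched.
func applyRateLimitFlags(cfg *clientConfig, args []string) error {
	fs := flag.NewFlagSet("issuer", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr; they are returned instead
	qps := fs.Float64("kube-api-qps", 0, "client-side QPS; negative disables limiting")
	burst := fs.Int("kube-api-burst", 0, "client-side burst")
	if err := fs.Parse(args); err != nil {
		return err
	}
	// Visit iterates only over flags that were explicitly set.
	fs.Visit(func(f *flag.Flag) {
		switch f.Name {
		case "kube-api-qps":
			cfg.QPS = float32(*qps)
		case "kube-api-burst":
			cfg.Burst = *burst
		}
	})
	return nil
}

func main() {
	cfg := &clientConfig{}
	// A hypothetical Helm value could be rendered into this argument.
	if err := applyRateLimitFlags(cfg, []string{"--kube-api-qps=-1"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("QPS=%v Burst=%d\n", cfg.QPS, cfg.Burst)
}
```

In a real controller, the parsed values would be copied onto the *rest.Config used to build the manager rather than onto this stand-in struct.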

Is there anything else you would like to add?

This is blocking us from being able to implement mTLS in one of our critical production clusters.

ARichman555 commented 2 months ago

Thank you for raising this issue with the AWS Private CA Issuer plugin. We will review your submission and respond here as soon as possible.

ARichman555 commented 1 month ago

Hi, we believe the issue is not Kubernetes client-side rate limiting but rather that our Reconcile loop is taking too long. We currently issue certificates synchronously, so on its first run each Reconcile call waits for the certificate to be fully issued instead of returning and checking again later. We will look into issuing certificates asynchronously.
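The difference between synchronous and asynchronous issuance can be sketched in Go. This is a minimal illustration, not the plugin's code: caClient, certRequest, result, pollInterval, and fakeCA are all hypothetical. Issue loosely corresponds to ACM PCA's IssueCertificate (which returns a certificate ARN) and Get to GetCertificate (which reports RequestInProgressException while the certificate is pending); result mirrors the idea of controller-runtime's Result.RequeueAfter. Instead of blocking until the certificate is ready, reconcile submits the request once, records the ARN, and asks to be called again later, freeing the worker to handle other requests.

```go
package main

import (
	"fmt"
	"time"
)

// caClient is a hypothetical stand-in for the CA: Issue returns an ARN right
// away, and Get reports whether the certificate is ready yet.
type caClient interface {
	Issue(csr string) (arn string, err error)
	Get(arn string) (pem string, ready bool, err error)
}

// certRequest is a simplified, hypothetical request whose ARN is persisted
// between reconciles so later passes poll instead of re-issuing.
type certRequest struct {
	CSR  string
	ARN  string
	Cert string
}

// result mirrors the idea of a requeue-after delay; zero means "done".
type result struct {
	RequeueAfter time.Duration
}

const pollInterval = 5 * time.Second // hypothetical polling period

// reconcile never blocks waiting for issuance.
func reconcile(ca caClient, req *certRequest) (result, error) {
	if req.Cert != "" {
		return result{}, nil // already issued
	}
	if req.ARN == "" {
		arn, err := ca.Issue(req.CSR)
		if err != nil {
			return result{}, err
		}
		req.ARN = arn // record the ARN, then check back later
		return result{RequeueAfter: pollInterval}, nil
	}
	pem, ready, err := ca.Get(req.ARN)
	if err != nil {
		return result{}, err
	}
	if !ready {
		return result{RequeueAfter: pollInterval}, nil
	}
	req.Cert = pem
	return result{}, nil
}

// fakeCA becomes ready on the readyAt-th Get call.
type fakeCA struct {
	issued  int // number of Issue calls
	polls   int // number of Get calls
	readyAt int // Get call on which the certificate becomes ready
}

func (f *fakeCA) Issue(csr string) (string, error) {
	f.issued++
	return fmt.Sprintf("arn:example:%d", f.issued), nil
}

func (f *fakeCA) Get(arn string) (string, bool, error) {
	f.polls++
	if f.polls < f.readyAt {
		return "", false, nil
	}
	return "PEM for " + arn, true, nil
}

func main() {
	ca := &fakeCA{readyAt: 2}
	req := &certRequest{CSR: "csr"}
	for pass := 1; ; pass++ {
		res, err := reconcile(ca, req)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("pass %d: requeueAfter=%v cert=%q\n", pass, res.RequeueAfter, req.Cert)
		if res.RequeueAfter == 0 {
			break
		}
	}
}
```

The trade-off is a short polling delay per certificate in exchange for never tying up a worker while the CA processes a request.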