camunda-community-hub / spring-zeebe

DEPRECATED. Easily use the Zeebe Java Client in your Spring or Spring Boot projects
Apache License 2.0
205 stars 120 forks

Implement backoff strategy for getting tokens from Operate #565

Open markfarkas-camunda opened 10 months ago

markfarkas-camunda commented 10 months ago

Is your feature request related to a problem? Please describe. In the SaaS environment we use a rate-limiting mechanism, which can cause serious problems for us. Connectors try to get a token (to be able to poll Operate), but this can lead to 429 Too Many Requests responses because of the rate limiter. Once this happens, we can get into an infinite loop in which all the connector runtimes keep trying to fetch a token and keep getting 429 responses. This can occur because rate limiting is applied globally per region, not per cluster. See: https://github.com/camunda-cloud/team-sre/issues/545 We have observed this on DEV, but the issue can occur in any environment.

Describe the solution you'd like Add a backoff strategy for failed requests: increase the interval between token requests after each failed request, to avoid bombarding the /oauth/token endpoint.
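To make the request concrete, here is a minimal sketch of one possible backoff policy: exponential growth with an upper cap and optional random jitter. The class name `TokenBackoff`, its methods, and the example values (1 s initial delay, 30 s cap, multiplier 2) are all hypothetical and chosen for illustration; none of them are part of spring-zeebe or the Operate client, and the code that actually fetches the token is left out.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not an existing spring-zeebe class. */
final class TokenBackoff {
  private final long initialMillis;
  private final long maxMillis;
  private final double multiplier;
  private int failures = 0; // consecutive failed token requests

  TokenBackoff(Duration initial, Duration max, double multiplier) {
    if (initial.isNegative() || max.compareTo(initial) < 0 || multiplier < 1.0) {
      throw new IllegalArgumentException("invalid backoff settings");
    }
    this.initialMillis = initial.toMillis();
    this.maxMillis = max.toMillis();
    this.multiplier = multiplier;
  }

  /** Delay without jitter: initial * multiplier^failures, capped at max. */
  Duration currentDelay() {
    double raw = initialMillis * Math.pow(multiplier, failures);
    return Duration.ofMillis((long) Math.min(raw, (double) maxMillis));
  }

  /** Random delay in [0, currentDelay], so many clients do not retry in lockstep. */
  Duration currentDelayWithJitter() {
    long cap = currentDelay().toMillis();
    return Duration.ofMillis(ThreadLocalRandom.current().nextLong(cap + 1));
  }

  /** Call after a failed token request, e.g. a 429 response. */
  void recordFailure() {
    if (failures < Integer.MAX_VALUE) {
      failures++;
    }
  }

  /** Call after a successful token request to reset the interval. */
  void recordSuccess() {
    failures = 0;
  }
}
```

A caller would invoke `recordFailure()` after each failed request to /oauth/token, wait for `currentDelayWithJitter()` before trying again, and call `recordSuccess()` once a token is obtained. The jitter is an addition beyond what the issue asks for; it is included because the rate limit is shared across clusters, so spreading retries out may help, but that is an assumption rather than something verified here.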

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Without this solution we can easily get into an infinite loop trying to get new tokens and always hitting the rate limit in SaaS.

spalberg commented 9 months ago

We have also observed this multiple times, even without using connectors. To mitigate it, we had to scale down all job worker deployments in all our clusters, which resulted in production downtime.