This seems to be solved in the Java worker by adding a backoff on a failed poll.
Without getting too deep into the 429 weeds at a first pass, adding backoff logic in general would be a good next step before over-indexing on just the 429 use case.
It looks like the library is trying to handle this with this change/issue, but I am seeing error code 13 in my logs.
From the gRPC docs, I can see the library is trying to do the right thing, but I'm thinking the error code is just not mapped correctly.
@jwulf does this analysis make sense? This looks like an old feature, so I am surprised that I am seeing this. I'm currently on 8.1.2, but nothing in the changelog tells me that 8.1.5 changes this behavior.
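For reference on the code numbers, here is an illustrative check, assuming the errors carry `@grpc/grpc-js` status codes (this is not the library's actual retry logic): RESOURCE_EXHAUSTED is 8, which is what a correctly mapped 429 should look like, while 13 is INTERNAL.

```typescript
import { status } from '@grpc/grpc-js'

// If the 429 were mapped correctly we would expect RESOURCE_EXHAUSTED (8),
// which backoff logic can treat as retryable; INTERNAL (13) usually cannot be.
function isRateLimited(code: number): boolean {
  return code === status.RESOURCE_EXHAUSTED // 8
}

console.log(status.RESOURCE_EXHAUSTED, status.INTERNAL) // 8 13
```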
I've published 8.1.7-debug. It does a few things, including support for `DEBUG='oauth'` to get debugging output from the OAuthProvider.

The last two items may solve the issue you are seeing. If not, please run your application with `DEBUG='oauth'` and we'll examine the output.
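Presumably this follows the usual debug-library convention of an environment variable, so running the worker might look like the following (the entrypoint name is just an example):

```sh
DEBUG='oauth' node worker.js
```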
Expected Behavior
An API client should back off appropriately in response to HTTP 429s. Especially in cases where clients are calling Camunda Cloud services like Cloud Identity, API clients should throttle themselves better.
Current Behavior
Workers keep polling and logging 429s, which increases the duration of an outage.
Possible Solution
In the case of a 429, check for `Retry-After` (or default to the current strategy) and sleep for that amount plus jitter. Additionally, this static wait could be enhanced to an exponential backoff.
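A minimal sketch of that strategy, assuming a hypothetical `poll` callback and a `RateLimitedError` stand-in for however the client surfaces a 429 (neither is the library's actual API):

```typescript
// Hedged sketch, not the client's real retry code: honor Retry-After on 429,
// otherwise use an exponential backoff, and add jitter in both cases.

class RateLimitedError extends Error {
  constructor(public retryAfterSeconds?: number) {
    super('HTTP 429: rate limited')
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function pollWithBackoff(
  poll: () => Promise<void>,
  maxRetries = 8,
): Promise<void> {
  let backoffMs = 1_000 // fallback wait when no Retry-After header is present

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      await poll()
      return
    } catch (err) {
      if (!(err instanceof RateLimitedError) || attempt === maxRetries) {
        throw err
      }
      // Prefer the server-provided Retry-After value; otherwise use our own schedule.
      const baseMs =
        err.retryAfterSeconds !== undefined ? err.retryAfterSeconds * 1_000 : backoffMs
      // Jitter spreads retries out so a fleet of workers doesn't stampede in lockstep.
      const waitMs = baseMs + Math.random() * baseMs
      await sleep(waitMs)
      backoffMs = Math.min(backoffMs * 2, 60_000) // exponential growth, capped at 60s
    }
  }
}
```

The exact jitter formula, cap, and retry count are tuning decisions; the point is only that the wait grows instead of staying static.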
Steps to Reproduce
Context (Environment)
New workers with valid credentials are also throttled by Camunda Identity.
This causes an outage, since no worker can make progress.
In our case, we see the 429s with Identity.
Detailed Description
Possible Implementation