Closed: dostalradim closed this 11 months ago
Let me look into this some more. Thanks for reporting it, and for the comprehensive detail.
I forgot to mention that we have 10 workers/handlers, so the probability of hitting this error is higher. The error is thrown by only one, or at most two, of them at a time. Thank you for the quick reaction. Let me know if you need more information.
A quick update on this:
I've just completed multi-tenancy support for 8.3.0. I will address this issue at the start of next week and ship a fix for it in the 8.3.0 release (which will drop as soon as I patch this, so Tuesday or Wednesday).
The good news:
I have reproduced the problem on my machine using Self-Managed, by setting token expiry to 10 seconds, then starting the example/worker.js with longPoll: false and creating ten copies of the worker.
After a minute or two, all ten workers threw this error at the same time.
Looks like the issue is this:
It seems to be an edge condition: the in-memory token is evicted from the cache while the new token is still in flight. The disk cache is then hit, and the token stored there is just about to expire. The disk-cached token is used and rejected; then the new token arrives and populates the in-memory and disk caches.
So, I've updated the in-memory cache eviction to also remove the disk-cached token, and set the expiry timer to fire 1 second before the token expires.
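To make the fix described above concrete, here is a minimal sketch of the idea, not the actual zeebe-node code. Every name in it (`CachedToken`, `TokenCache`, `evictionDelayMs`, `EVICTION_LEAD_MS`) is hypothetical, and a `Map` stands in for the on-disk cache so the example stays self-contained. It shows the two changes: the eviction timer fires 1 second before expiry, and eviction clears the disk copy together with the in-memory copy.

```typescript
// Illustrative sketch only; all names are hypothetical, not zeebe-node internals.
interface CachedToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry time, epoch milliseconds
}

// Evict this long before the token actually expires.
const EVICTION_LEAD_MS = 1000;

// Milliseconds to wait before evicting; clamped so it is never negative.
function evictionDelayMs(token: CachedToken, nowMs: number): number {
  return Math.max(0, token.expiresAtMs - nowMs - EVICTION_LEAD_MS);
}

class TokenCache {
  private memory: CachedToken | undefined = undefined;
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  // Stand-in for the on-disk cache (assumption: keyed by some client id).
  private disk: Map<string, CachedToken>;
  private key: string;

  constructor(disk: Map<string, CachedToken>, key: string) {
    this.disk = disk;
    this.key = key;
  }

  // Populate both caches and schedule eviction ahead of expiry.
  store(token: CachedToken, nowMs: number = Date.now()): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.memory = token;
    this.disk.set(this.key, token);
    this.timer = setTimeout(() => this.evict(), evictionDelayMs(token, nowMs));
  }

  // Remove the token from memory AND disk, so a nearly expired
  // disk copy cannot be served while a new token is in flight.
  evict(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    this.memory = undefined;
    this.disk.delete(this.key);
  }

  // Memory first, then disk; undefined means a new token must be requested.
  get(): CachedToken | undefined {
    return this.memory ?? this.disk.get(this.key);
  }
}
```

With this shape, a caller that gets `undefined` from `get()` has to wait for the in-flight token instead of falling back to a stale disk entry.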
I have published 8.3.0-alpha9 with this fix rolled in.
@dostalradim Could you please test it out and let me know if it fixes the issue for you?
I installed the application with the old version, changed the token lifespan to 10s, and waited for the error to appear, which it did after some time. Then I upgraded the package to 8.3.0-alpha9 and waited ~24 hours, and the error did not appear.
So I think the error is solved. Thank you for the fast fix.
There is probably a race condition in your token deletion mechanism. Our ZBWorker randomly throws an exception with Grpc Stream Error: 16 UNAUTHENTICATED: Failed to parse bearer token, see cause for details. It always happens at the end of the token's expiration time, but not on every expiry; it is really non-deterministic. Our tokens are valid for 5 minutes and the application throws this error only ~10 times per day.
What I checked
Expected Behavior
Obtain a new token a little before the current one expires.
Current Behavior
The expiry timer removes the access token while it is still in use in other parts of the code.
Possible Solution
Change validityPeriod to a value lower than token.expiry - current. In other words, remove the token before it expires. Or implement a better validity check?
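One way the "better validity check" could look is a check made at the moment a token is used, rather than relying on a timer alone. The sketch below is only an illustration under assumptions: `AccessToken`, `isTokenUsable`, `getUsableToken`, and the 5-second `SAFETY_MARGIN_MS` are hypothetical names and values, not part of the library.

```typescript
// Illustrative sketch only; names and the margin value are assumptions.
interface AccessToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry time, epoch milliseconds
}

// Treat a token as expired this long before its real expiry (assumed value).
const SAFETY_MARGIN_MS = 5000;

// True only if the token exists and has more than `marginMs` left to live.
function isTokenUsable(
  token: AccessToken | undefined,
  nowMs: number,
  marginMs: number = SAFETY_MARGIN_MS,
): boolean {
  return token !== undefined && token.expiresAtMs - nowMs > marginMs;
}

// Return the cached token if it is still safely valid; otherwise fetch a new one.
// `fetchToken` is a caller-supplied function (for example, an OAuth request).
async function getUsableToken(
  cached: AccessToken | undefined,
  fetchToken: () => Promise<AccessToken>,
  nowMs: number = Date.now(),
): Promise<AccessToken> {
  if (cached !== undefined && isTokenUsable(cached, nowMs)) {
    return cached;
  }
  return fetchToken();
}
```

Because the check runs at the point of use, a token that is about to expire is never handed to a request, regardless of when the eviction timer fires.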
Steps to Reproduce
Install the latest camunda-platform Helm chart and create an application with a ZBWorker like this.
Client settings:
Worker settings:
Errors from logs