Closed ovidiubuligan closed 7 months ago
@ovidiubuligan thank you for making this issue.
It is unclear to me if you are reporting 999 goroutines as a bug in the jwkset
project or a circumstance encountered in a project you are working on. If it is supposed to be a bug report for the jwkset
project, please do include some example code that could reproduce it. Using the net/http/httptest
may help.
My current reading of this issue has identified two potential issues:
context.Context
being used isn't timeout out fast enough for your use case.For 1
, this may be a potential issue because you mentioned 999 affected goroutines trying to parse JWTs. Again, I don't have a lot of context, so this might be the desired effect. My assumption is that the HTTP client is sending too many requests or these 999 goroutines have been accumulated over hours, days, or something, while the client has been retrying requests. I don't have visibility into your project, except for what's in this issue, so it's unlikely that I can help with this.
For 2
, I can help and see a use case to make changes to the github.com/MicahParks/keyfunc
project and this github.com/MicahParks/jwkset
project. This is due to the RefreshUnknownKID
being on. It's on by default. You could turn it off, but I could do work to make it more friendly.
For keyfunc
, the relevant line is here:
https://github.com/MicahParks/keyfunc/blob/accaf1a836745a85ec72247022d6c14bc3b30093/keyfunc.go#L135
The ctx
on this line should optionally be a user provided context.Context
. This would allow you to provide your own timeout for JWT parsing. It should have been including in the /v3
release.
For jwkset
the relevant line is here:
https://github.com/MicahParks/jwkset/blob/302ec62fe5afea1e93bb5b23738bfeabfe4cc8b9/http.go#L172
The ctx
on this line would benefit from a default timeout value being applied, so requests do not hang if a context without a timeout is passed in.
Like I mentioned, I see you are using keyfunc
. I believe the best thing for you to do would be to wait for this new PR to merge, then look at the release notes and use the latest version. I'll update this issue when it's released :slightly_smiling_face:
https://github.com/MicahParks/keyfunc/pull/118
@ovidiubuligan please use github.com/MicahParks/keyfunc/v3
with the version v3.3.2
with the new .KeyfuncCtx
method. Here are the release notes.
If parsing many JWTs signed with kid
not in the JWK Set header will be common for your project, I would recommend not using the default jwkset
HTTP client and instead ensure the RefreshUnknownKID
option is set to false
.
Please let me know if the updates have helped solve your issue.
@MicahParks For the usage context , is a relatively simple http server (single main.go file ) used by nginx to validate tokens based on some claim rules. The link to jwkset/keyfunc usage does not yet have it's libraries updated ( it is a private fork of it with latest libs):
https://github.com/ovidiubuligan/nginx-subrequest-auth-jwt/blob/master/main.go#L217
I am not a native go developer but I have debugged with your pushed latest changes and would only like to cache the unkownKID result for 1 second so that Parse
requests don't go through the global rate limiter and block (https://github.com/MicahParks/jwkset/blob/master/http.go#L182).
This way the client request is not blocked and can error out faster since that KID does not exist in JWKS.
Can you give me some hints on how to add this behavior ? (go doesn't have inheritance ... ) I was thinking on overriding somehow the implementation of ReadKey
from http.go
func (c httpClient) KeyRead(ctx context.Context, keyID string) (jwk JWK, err error) {
Augmenting this part of the logic code with a global cache "expiry set " by KID :
if c.refreshUnknownKID != nil {
var cancel context.CancelFunc = func() {}
if c.rateLimitWaitMax > 0 {
ctx, cancel = context.WithTimeout(ctx, c.rateLimitWaitMax)
}
defer cancel()
err = c.refreshUnknownKID.Wait(ctx)
if err != nil {
return JWK{}, fmt.Errorf("failed to wait for JWK Set refresh rate limiter due to error: %w", err)
}
for _, store := range c.httpURLs {
s, ok := store.(httpStorage)
if !ok {
continue
}
err = s.refresh(ctx)
Thank you for linking the code. It looks like the linked project isn't using this jwkset
project. If it was, it would be in the go.sum
file.
I noticed that the project is actually using a pre-release version v0.3.3
of keyfunc
. The jwkset
project didn't become a keyfunc
dependency until github.com/MicahParks/keyfunc/v3
.
I assume the stack trace from the original post isn't from the linked project. I'm curious where the original stack trace came from.
I recommend using either keyfunc/v1
, keyfunc/v2
, or keyfunc/v3
. All version should still work fine today, but any pre-release should upgrade to a major version.
Instead of trying to adding extra caching to the RefreshUnknownKID
feature, I would recommend turning the feature off. Perhaps you can shorten the RefreshInterval
if you would prefer the cache to stay more fresh. To do this, I would recommend you switch to the latest version of keyfunc/v3
. Use keyfunc.New
, not keyfunc.NewDefault
or keyfunc.NewDefaultCtx
, as it seems the default behavior isn't what's best for your project. It appears your project has an unusual amount of requests using JWTs signed by old keys no longer in the JWK Set. While the new updates should help your project reduce the number of requests waiting for an JWK Set refresh, it is likely simpler to turn the feature off and shorten the RefreshInterval
.
As an example for keyfunc.New
, you should follow the advanced example in the keyfunc
project so you can turn off the RefreshUnknownKID
option.
I don't have a lot of context for your project. It's likely I've misinterpreted something in the two messages already present, but I hope this has helped.
I think I've addressed the issue in the jwkset
project about properly timing out JWK Set reads in the RefreshUnknownKID
case, so I am going to close this issue now. If there is some other unexpected behavior in the jwkset
project, or you feel your hasn't been addressed, please do reopen this issue or create a new one.
If you want to hire me as a freelance software engineer to assist in your project, you can reach out to me at: https://micahparks.com/contact
Having an expired old JWT that even has the KID removed from the JWKS (rotated cert) . The first request goes through but the rest are rate limited . Stack trace with 999 goroutines waiting on rate limiting:
All these concurrent requests wait for at least several minutes . (even 30 mins)
If this
parse
is called at at every request so we should return from our main logic in a few seconds .If it hangs at jwt parsing due to rate limiting , what can be done ? (we also don't want to spam our IDP jwks but we can allow more requests towards JWKS than 1 request every few minutes)