MicahParks / jwkset

A JWK and JWK Set implementation. An auto-caching JWK Set HTTP client is provided. Generate, validate, and inspect JWKs. Self-host this project's website: https://jwkset.com
https://jwkset.com
Apache License 2.0
35 stars 12 forks source link

Using expired token from long ago blocks at parsing the token (hits JWKS rate limit since KID is unknown) #26

Closed ovidiubuligan closed 7 months ago

ovidiubuligan commented 7 months ago

Having an expired old JWT that even has the KID removed from the JWKS (rotated cert) . The first request goes through but the rest are rate limited . Stack trace with 999 goroutines waiting on rate limiting:

997 @ 0x44b57c 0x45d927 0x967318 0x966bdc 0x966afc 0x970005 0x99fa74 0x992f23 0x992930 0x994ac5 0xe8f74b 0xe8ec94 0x8eda53 0xc721d3 0x8eda53 0xc70348 0xc6e3d1 0x8eda53 0x8f0e1e 0x8f2f77 0x8ec735 0x4839e1
#   0x967317    golang.org/x/time/rate.(*Limiter).wait+0x6d7                            /home/ovilinux/go/pkg/mod/golang.org/x/time@v0.5.0/rate/rate.go:285
#   0x966bdb    golang.org/x/time/rate.(*Limiter).WaitN+0x9b                            /home/ovilinux/go/pkg/mod/golang.org/x/time@v0.5.0/rate/rate.go:248
#   0x966afb    golang.org/x/time/rate.(*Limiter).Wait+0x3b                         /home/ovilinux/go/pkg/mod/golang.org/x/time@v0.5.0/rate/rate.go:233
#   0x970004    github.com/MicahParks/jwkset.httpClient.KeyRead+0xbe4                       /home/ovilinux/go/pkg/mod/github.com/!micah!parks/jwkset@v0.5.16/http.go:172
#   0x99fa73    github.com/MicahParks/keyfunc/v3.keyfunc.Keyfunc+0x3d3                      /home/ovilinux/go/pkg/mod/github.com/!micah!parks/keyfunc/v3@v3.2.9/keyfunc.go:135
#   0x992f22    github.com/golang-jwt/jwt/v5.(*Parser).ParseWithClaims+0x562                    /home/ovilinux/go/pkg/mod/github.com/golang-jwt/jwt/v5@v5.2.1/parser.go:90
#   0x99292f    github.com/golang-jwt/jwt/v5.(*Parser).Parse+0x8f                       /home/ovilinux/go/pkg/mod/github.com/golang-jwt/jwt/v5@v5.2.1/parser.go:45
#   0x994ac4    github.com/golang-jwt/jwt/v5.Parse+0xa4                             /home/ovilinux/go/pkg/mod/github.com/golang-jwt/jwt/v5@v5.2.1/parser.go:226

All these concurrent requests wait for at least several minutes . (even 30 mins)

If this parse is called at at every request so we should return from our main logic in a few seconds .If it hangs at jwt parsing due to rate limiting , what can be done ? (we also don't want to spam our IDP jwks but we can allow more requests towards JWKS than 1 request every few minutes)

MicahParks commented 7 months ago

@ovidiubuligan thank you for making this issue.

It is unclear to me if you are reporting 999 goroutines as a bug in the jwkset project or a circumstance encountered in a project you are working on. If it is supposed to be a bug report for the jwkset project, please do include some example code that could reproduce it. Using the net/http/httptest may help.

My current reading of this issue has identified two potential issues:

  1. The service is receiving the same request too many times.
  2. The context.Context being used isn't timeout out fast enough for your use case.

For 1, this may be a potential issue because you mentioned 999 affected goroutines trying to parse JWTs. Again, I don't have a lot of context, so this might be the desired effect. My assumption is that the HTTP client is sending too many requests or these 999 goroutines have been accumulated over hours, days, or something, while the client has been retrying requests. I don't have visibility into your project, except for what's in this issue, so it's unlikely that I can help with this.

For 2, I can help and see a use case to make changes to the github.com/MicahParks/keyfunc project and this github.com/MicahParks/jwkset project. This is due to the RefreshUnknownKID being on. It's on by default. You could turn it off, but I could do work to make it more friendly.

For keyfunc, the relevant line is here: https://github.com/MicahParks/keyfunc/blob/accaf1a836745a85ec72247022d6c14bc3b30093/keyfunc.go#L135

The ctx on this line should optionally be a user provided context.Context. This would allow you to provide your own timeout for JWT parsing. It should have been including in the /v3 release.

For jwkset the relevant line is here: https://github.com/MicahParks/jwkset/blob/302ec62fe5afea1e93bb5b23738bfeabfe4cc8b9/http.go#L172

The ctx on this line would benefit from a default timeout value being applied, so requests do not hang if a context without a timeout is passed in.

Like I mentioned, I see you are using keyfunc. I believe the best thing for you to do would be to wait for this new PR to merge, then look at the release notes and use the latest version. I'll update this issue when it's released :slightly_smiling_face: https://github.com/MicahParks/keyfunc/pull/118

MicahParks commented 7 months ago

@ovidiubuligan please use github.com/MicahParks/keyfunc/v3 with the version v3.3.2 with the new .KeyfuncCtx method. Here are the release notes.

If parsing many JWTs signed with kid not in the JWK Set header will be common for your project, I would recommend not using the default jwkset HTTP client and instead ensure the RefreshUnknownKID option is set to false.

Please let me know if the updates have helped solve your issue.

ovidiubuligan commented 7 months ago

@MicahParks For the usage context , is a relatively simple http server (single main.go file ) used by nginx to validate tokens based on some claim rules. The link to jwkset/keyfunc usage does not yet have it's libraries updated ( it is a private fork of it with latest libs): https://github.com/ovidiubuligan/nginx-subrequest-auth-jwt/blob/master/main.go#L217 I am not a native go developer but I have debugged with your pushed latest changes and would only like to cache the unkownKID result for 1 second so that Parse requests don't go through the global rate limiter and block (https://github.com/MicahParks/jwkset/blob/master/http.go#L182).

This way the client request is not blocked and can error out faster since that KID does not exist in JWKS.

Can you give me some hints on how to add this behavior ? (go doesn't have inheritance ... ) I was thinking on overriding somehow the implementation of ReadKey from http.go

func (c httpClient) KeyRead(ctx context.Context, keyID string) (jwk JWK, err error) {

Augmenting this part of the logic code with a global cache "expiry set " by KID :

    if c.refreshUnknownKID != nil {
        var cancel context.CancelFunc = func() {}
        if c.rateLimitWaitMax > 0 {
            ctx, cancel = context.WithTimeout(ctx, c.rateLimitWaitMax)
        }
        defer cancel()
        err = c.refreshUnknownKID.Wait(ctx)
        if err != nil {
            return JWK{}, fmt.Errorf("failed to wait for JWK Set refresh rate limiter due to error: %w", err)
        }
        for _, store := range c.httpURLs {
            s, ok := store.(httpStorage)
            if !ok {
                continue
            }
            err = s.refresh(ctx)
MicahParks commented 7 months ago

Thank you for linking the code. It looks like the linked project isn't using this jwkset project. If it was, it would be in the go.sum file.

I noticed that the project is actually using a pre-release version v0.3.3 of keyfunc. The jwkset project didn't become a keyfunc dependency until github.com/MicahParks/keyfunc/v3.

I assume the stack trace from the original post isn't from the linked project. I'm curious where the original stack trace came from.

I recommend using either keyfunc/v1, keyfunc/v2, or keyfunc/v3. All version should still work fine today, but any pre-release should upgrade to a major version.

Instead of trying to adding extra caching to the RefreshUnknownKID feature, I would recommend turning the feature off. Perhaps you can shorten the RefreshInterval if you would prefer the cache to stay more fresh. To do this, I would recommend you switch to the latest version of keyfunc/v3. Use keyfunc.New, not keyfunc.NewDefault or keyfunc.NewDefaultCtx, as it seems the default behavior isn't what's best for your project. It appears your project has an unusual amount of requests using JWTs signed by old keys no longer in the JWK Set. While the new updates should help your project reduce the number of requests waiting for an JWK Set refresh, it is likely simpler to turn the feature off and shorten the RefreshInterval.

As an example for keyfunc.New, you should follow the advanced example in the keyfunc project so you can turn off the RefreshUnknownKID option.

I don't have a lot of context for your project. It's likely I've misinterpreted something in the two messages already present, but I hope this has helped.

I think I've addressed the issue in the jwkset project about properly timing out JWK Set reads in the RefreshUnknownKID case, so I am going to close this issue now. If there is some other unexpected behavior in the jwkset project, or you feel your hasn't been addressed, please do reopen this issue or create a new one.

If you want to hire me as a freelance software engineer to assist in your project, you can reach out to me at: https://micahparks.com/contact