envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

jwt_authn: Improve JWKs caching #14557

Open skjolber opened 3 years ago

skjolber commented 3 years ago

Title: Improve JWKs caching

Description:

Caching remote JWKs involves trade-offs. We'd like to cache the keys for performance reasons but, for security reasons, also react quickly to updated JWKs. In addition we'd like some robustness: refreshing the cache without affecting incoming traffic, handling transient network errors and authorization server downtime, and protecting authorization servers from being overwhelmed by JWKs requests.

This can be implemented by

  • refreshing the cache on JWTs signed with unknown keys (rate-limited, queuing similar requests)
  • retrying once on (transient) network errors
  • refreshing the cache in a background thread (before the cache expires)
  • adding a new "fallback cache", with a longer time to live, for use when refreshing the cache fails (due to downtime)
  • adding a cache health indicator
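As a rough illustration of three of the items above (retry once, fallback cache, health indicator), here is a minimal Python sketch. It is not Envoy code: `FallbackJwksCache`, its parameters, and the `fetch` callable are hypothetical names chosen for this example, keys are simplified to a `kid -> key` dictionary, and network failures are assumed to surface as `OSError`. Background refresh is left out to keep the sketch short.

```python
import time
from typing import Callable, Dict, Optional

Jwks = Dict[str, str]  # simplified: key id -> key material


class FallbackJwksCache:
    """Hypothetical cache: short TTL for normal use, longer TTL as a fallback."""

    def __init__(self, fetch: Callable[[], Jwks], ttl: float, fallback_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._fallback_ttl = fallback_ttl
        self._clock = clock
        self._keys: Optional[Jwks] = None
        self._fetched_at = 0.0
        self.healthy = False  # health indicator: did the last refresh succeed?

    def _fetch_with_retry(self) -> Jwks:
        try:
            return self._fetch()
        except OSError:
            # Retry exactly once on a (possibly transient) network error.
            return self._fetch()

    def get(self) -> Jwks:
        now = self._clock()
        if self._keys is not None and now - self._fetched_at < self._ttl:
            return self._keys  # still fresh, no fetch needed
        try:
            keys = self._fetch_with_retry()
        except OSError:
            self.healthy = False
            # Refresh failed: serve stale keys while within the fallback TTL.
            if self._keys is not None and now - self._fetched_at < self._fallback_ttl:
                return self._keys
            raise
        self._keys = keys
        self._fetched_at = now
        self.healthy = True
        return keys
```

With `ttl=300` and `fallback_ttl=3600`, for example, an authorization server outage shorter than an hour would not cause validation to fail for tokens signed with already-cached keys, while `healthy` would report the failed refresh.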

Note: I've not studied the source code in detail, you might have one or two of these done already.

Relevant Links:

Example implementation.

qiwzhang commented 3 years ago

I did not quite understand the "refreshing cache on JWTs signed with unknown keys (rate-limited, queuing similar requests)" requirement. Please clarify.

Some of the requirements can be achieved by the two action items in https://github.com/envoyproxy/envoy/issues/14556

skjolber commented 3 years ago

Sorry, it was a bit short to say the least. Here is a better explanation:

If the authorization server rotates its JWKs, because they are considered outdated, compromised, or for some other reason, the resource servers will start seeing JWTs signed with a new key that is unknown to them, while still holding a seemingly valid cache.

At this point, we'd like the resource server to

So while the resource server cannot detect this situation before it sees a JWT with a new key (or before the JWKs cache is refreshed), a decent best-effort approach is to refresh the cache when it sees an unknown key.

However, JWTs with unknown key ids can be generated by a malicious party, and the resource server should take care not to "DDOS" the authorization server when that happens. This can be solved by rate-limiting the JWKs refresh and by making sure that only one refresh is in progress at any time.
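The refresh-on-unknown-key idea, with a single in-flight refresh and a simple minimum interval between refreshes, could look roughly like the Python sketch below. This is an illustration only, not how jwt_authn works: `KidRefreshingCache`, `min_interval`, and the `fetch` callable are hypothetical, and keys are simplified to a `kid -> key` dictionary.

```python
import threading
import time
from typing import Callable, Dict, Optional


class KidRefreshingCache:
    """Hypothetical cache that refreshes when it sees an unknown key id."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], min_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._min_interval = min_interval  # minimum seconds between refresh attempts
        self._clock = clock
        self._keys: Dict[str, str] = {}
        self._last_refresh: Optional[float] = None
        self._refresh_lock = threading.Lock()

    def lookup(self, kid: str) -> Optional[str]:
        key = self._keys.get(kid)
        if key is not None:
            return key  # known key: no lock taken, never waits on a refresh
        # Unknown key: only one refresh runs at a time; similar requests queue here.
        with self._refresh_lock:
            key = self._keys.get(kid)
            if key is not None:
                return key  # a refresh we queued behind already fetched it
            now = self._clock()
            if (self._last_refresh is not None
                    and now - self._last_refresh < self._min_interval):
                return None  # rate limited: do not contact the authorization server
            # Record the attempt first so failed fetches also count against the limit.
            self._last_refresh = now
            # Old keys stay in place until the fetch returns, then get replaced.
            self._keys = dict(self._fetch())
            return self._keys.get(kid)
```

A caller would reject the JWT when `lookup` returns `None`. A flood of tokens with made-up key ids then costs the authorization server at most one request per `min_interval`.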

skjolber commented 3 years ago

See also some of the finer details in the example implementation java source code.

qiwzhang commented 3 years ago

I did not get the picture of what your "resource server" does. Could you describe your issue in terms of the Envoy jwt_authn filter?

skjolber commented 3 years ago

Resource server: the server hosting the protected resources. This is the API you want to access. So here, envoy/jwt_authn.

qiwzhang commented 3 years ago

I see. Is this understanding correct? The JWKS cache should be purged and a new JWKS fetch should be triggered when a JWT with a new key is seen. To prevent DDOS on the JWKS server, JWKS fetches should be rate limited, e.g. only one request per server at any given time.

skjolber commented 3 years ago

Yes. Purging the cache at once is perhaps a bit much; better to keep the cache until the fetch is done, then replace its content. While fetching is happening, other requests (with a known key) should not be kept waiting. Refreshes should happen no more often than back-to-back requests, and the rate limit should preferably be configurable as n requests per time interval plus a bucket size. Like 10 per minute or so.

Edit: More precise language / details
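A limit of "n requests per time interval plus a bucket size" is commonly implemented as a token bucket. The Python sketch below shows that generic technique under assumed names (`TokenBucket`, `rate_per_sec`, `capacity`); it is not an existing Envoy option.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills at a steady rate, allows bursts up to capacity."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._updated = clock()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Add the tokens earned since the last call, capped at the bucket size.
        earned = (now - self._updated) * self._rate
        self._tokens = min(float(self._capacity), self._tokens + earned)
        self._updated = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# "10 per minute" with a burst of up to 2 back-to-back refreshes:
jwks_refresh_limiter = TokenBucket(rate_per_sec=10 / 60, capacity=2)
```

A JWKS refresh would be attempted only when `jwks_refresh_limiter.try_acquire()` returns `True`; otherwise the token with the unknown key id is rejected without contacting the authorization server.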

martin2176 commented 7 months ago

I see. is this understanding correct? Jwks cache should be purged and a new Jwks fetch should be triggered when a JWT with a new key is seen. To prevent DDOS on the JWKS server, JWKS fetch should be rate limited, e.g. only one request per server at any given time.

Wondering if this new-key issue is still unresolved? Does Envoy fetch a new JWKS (public key) when it sees a JWT whose KID misses the cache?