DuendeSoftware / Support

Support for Duende Software products
20 stars 0 forks source link

New signing keys created and used in jwks without activation delay #1241

Closed scott-driscoll-edgenuity closed 3 months ago

scott-driscoll-edgenuity commented 3 months ago

Which version of Duende IdentityServer are you using? 6.1.0 Which version of .NET are you using? 6 Describe the bug

We have 6 pods running Identity service and it appears that 3 of them created 3 new signing keys and each started to present them in their JWKS. The previous key had not expired and I don't understand why the three were created and why they would be immediately used in each of those pods's jwks.

Because all of the servers had not retrieved the new keys from the data store, when a request was sent with a kid from one of these servers, the other servers failed to process the request.

This is the time stamps for each of the keys that are now in the jwks

2024-03-06T01:50:36.082Z - old existing key 2024-04-25T01:50:35.109Z - new key 2024-04-25T01:50:35.125Z - new key 2024-04-25T01:50:35.138Z - new key

A clear and concise description of what the bug is.

New Signing keys were generated for an unknown reason and they were immediately displayed in the jwks

These are the options for key management: options.KeyManagement.RotationInterval = TimeSpan.FromDays(60); options.KeyManagement.PropagationTime = TimeSpan.FromDays(10); options.KeyManagement.RetentionDuration = TimeSpan.FromDays(10); options.KeyManagement.KeyCacheDuration = TimeSpan.FromHours(12);

To Reproduce

Steps to reproduce the behavior.

Expected behavior

A clear and concise description of what you expected to happen.

Log output/exception with stacktrace

data

Additional context

Add any other context about the problem here.

AndersAbel commented 3 months ago

Is this a very high load environment? It looks like the three pods decided to create new keys within a few milliseconds and that this is caused by a race condition.

Even if there are multiple keys created accidentally all the pods should load them and get them into their cache within the propagation period, i.e. before they are used.

Can you please share some information about how you store your keys?

Also just to be exclude a common source of issues, could you please verify your data protection setup? https://docs.duendesoftware.com/dataprotection

scott-driscoll-edgenuity commented 3 months ago

Hi Anders, we store our keys in dynamodb. The system was not under high load at the time. We do have data protection on. "options.KeyManagement.DataProtectKeys = true;" We currently do get an error "unprotecting key with kid" when processing an old key (from 2022 that is still in our db) but all the new keys do not produce that error.

AndersAbel commented 3 months ago

Thanks for confirming that Data Protection is not the issue here. For the old key from 2022 that cannot be unprotected you might as well remove the record from the database to get rid of the error message.

I do not have experience with DynamoDb but one thing to check is if it is configured for eventual consistency. That would increase the chance that one node would not see that another node already created a key.

Another thing you could do is to implement a real distributed lock for the IConcurrencyLock interface. That would prevent multiple nodes.

Finally, there's something that still looks odd:

Because all of the servers had not retrieved the new keys from the data store, when a request was sent with a kid from one of these servers, the other servers failed to process the request.

With the times you have in your configuration this should really not be possible. Even if there are multiple keys created by mistake, the new keys should not be used until after 10 days. During that time all nodes should have synced their reading from the data store and all have access to the same key material.

scott-driscoll-edgenuity commented 3 months ago

Hi Anders,

I was able to recreate the problem on a local instance of Identity Server. I have two keys in my database

"kid":"F26B25112B26B41477C2B1A1B8C7C3AF" creationdate "2024-02-08T17:16:03.741Z" and "kid:" D18115AF337A9D16EF2B140EDA2B3978" creationDate"2024-05-02T15:17:14.907Z"

I'm running with options.KeyManagement.RotationInterval = TimeSpan.FromDays(100);
options.KeyManagement.PropagationTime = TimeSpan.FromDays(20); options.KeyManagement.RetentionDuration = TimeSpan.FromDays(10); options.KeyManagement.KeyCacheDuration = TimeSpan.FromHours(12);

My understanding is that the second key (created on "2024-05-02T15:17:14.907Z) should not be displayed in the jwks response, but it is: { "keys": [ { "kty": "RSA", "use": "sig", "kid": "F26B25112B26B41477C2B1A1B8C7C3AF", "e": "AQAB", "n": "rqd-Gx_V8Z-cf_E43Jk1Dhhx2eW3LbnIk50s6GBajmJtgyFeiIjlQ9zeV9sA9QG7ce6nmLQWfGXdqoHRwYXhct57acdnZiHHTv-RTX7tAfdQqXiITf8vXL_bJjflkCpW0-hhowTduNQ7a3VIQqqK6V4Auw1qlQOblk8li_Bwo2eE-FvC7wrdEQAbW4nIm9jI4XuO9kUP7Q_Os-tYR9ZwOTL6OF-LLWsqF-vsFAAfoqknNtRDBhbvM5aDnoA2n-P8MDqzRa2YVyNQIvXmdBlyd4YpCyOGg_2ADIsl7Gd2TUiMucUksJ_R-9JF11fDDn-EM0bH6Be61m2ZYMQ1341Q", "alg": "RS256" }, { "kty": "RSA", "use": "sig", "kid": "D18115AF337A9D16EF2B140EDA2B3978", "e": "AQAB", "n": "qdWInCkk1SHy425bS6Z64cuL7tDLI2iqSRfE851Gj09miBhcOalgLD2I9vubH5_s7YqEM7Y92SV-I8CEzfxqwTxR9tMeEX_IQmyIoDLbd4oRLV7M6p6MIFq8FfA5NblRoXUQ_pQ32xku79jGKAxgYTcI5mR5GHor0cXJXce5vJWDDUqYJ3wXPAau4nYRNufEa5e8BCt-mchKzdaz2zvoft7cyxD5kbZOASalnC5FLD0i51ffyAWoZGgIQQ_Hxn-I39vCFSl8cDp-lF4ctK7aceb8dmXFdw-z_AASIz0k1KtrUIyQ8UMBcOufF34U2h7Q0thk0nNuBlYy7fmFhAKgqQ", "alg": "RS256" } ] }

Note: these are not the same configuration options I have in production. These were set to show that a key created on 5/2/2024 is being displayed in the jwks prior to the 20 days I have set for propagation time.

scott-driscoll-edgenuity commented 3 months ago

I'm running locally with https://github.com/DuendeSoftware/IdentityServer/releases/tag/6.1.7

This is the trace from calling the jwks endpoint. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Getting all the keys. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Cache hit when loading all keys. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Looking for active signing keys. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Looking for an active signing key for alg RS256. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Checking if key with kid F26B25112B26B41477C2B1A1B8C7C3AF is active (respecting activation delay). Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Key with kid F26B25112B26B41477C2B1A1B8C7C3AF is active. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Checking if key with kid D18115AF337A9D16EF2B140EDA2B3978 is active (respecting activation delay). Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Key with kid D18115AF337A9D16EF2B140EDA2B3978 is inactive: the current time is prior to its activation delay. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Active signing key found (respecting the activation delay) with kid: F26B25112B26B41477C2B1A1B8C7C3AF. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Found active signing key for alg RS256 with kid F26B25112B26B41477C2B1A1B8C7C3AF. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Checking if key with kid F26B25112B26B41477C2B1A1B8C7C3AF is active (respecting activation delay). Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Key with kid F26B25112B26B41477C2B1A1B8C7C3AF is active. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Checking if key with kid D18115AF337A9D16EF2B140EDA2B3978 is active (respecting activation delay). Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Key with kid D18115AF337A9D16EF2B140EDA2B3978 is inactive: the current time is prior to its activation delay. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Active signing key found (respecting the activation delay) with kid: F26B25112B26B41477C2B1A1B8C7C3AF. Duende.IdentityServer.Services.KeyManagement.KeyManager: Trace: Key rotation not required for alg RS256; New key expected to be created in 73.01:19:12 Duende.IdentityServer.Hosting.IdentityServerMiddleware: Trace: Invoking result: Duende.IdentityServer.Endpoints.Results.JsonWebKeysResult

scott-driscoll-edgenuity commented 3 months ago

Even though key D18115AF337A9D16EF2B140EDA2B3978 is inactive due to the activation delay, it's being included in the JWKS response.

josephdecock commented 3 months ago

The PropagationTime creates a delay between when the key is announced and when it is used to sign tokens. During this delay, the key is intentionally included in the jwks endpoint. The reason to have this delay is that applications and APIs typically cache the output of the jwks endpoint very aggressively, on the order of hours to days. We need all the APIs and applications to have our new key in their caches before we start signing with it. The way we ensure that is by announcing the key we will use in the future in the jwks.

scott-driscoll-edgenuity commented 3 months ago

Thank you for the explanation of Propagation Time.