AzureAD / azure-activedirectory-identitymodel-extensions-for-dotnet

IdentityModel extensions for .Net
MIT License
1.05k stars 396 forks source link

Thread starvation due to sync over async code in IdentityModel.Tokens.Validators #2556

Open mirjana1211 opened 5 months ago

mirjana1211 commented 5 months ago

Microsoft.Identity.Web Library

Microsoft.Identity.Web

Microsoft.Identity.Web version

2.17.2

Web app

Not Applicable

Web API

Protected web APIs (validating tokens)

Token cache serialization

Not Applicable

Description

We have been experiencing sporadic issues with our application deployed as an Azure App Service running on Linux (.net8). We are using Microsoft.Identity.Web library and Microsoft Entra ID as our identity provider.

From time to time, we get an alert for failed health checks for one of the instances (minimum of 6 instances running). The instance seems to be unresponsive and for a period of time and the issue always resolves itself within a few minutes. There is no relevant info in the logs or in the Diagnose and solve problems section for the app service. We can not identify any changes in request rate or resource utilization before the outage.

We have created a support case with Microsoft and have provided multiple memory dumps for analysis. The information we got from the support engineer is that the app service failure is caused by thread starvation due to sync over async code in IdentityModel.

Reproduction steps

The outage occurs sporadically and it is not possible to reproduce it on demand.

The problem seems to be related to load. We have several instances of the same application deployed in different Azure regions and the only instance where we have observed the problem is running under a relatively high load (~500-600 req/sec during peak time).

Error message

No response

Id Web logs

No response

Relevant code snippets

The code in question, according to the dumps analysis, is in the ValidateIssuer method: 

https://github.com/AzureAD/azure-activedirectory-identitymodel-extensions-for-dotnet/blob/dev/src/Microsoft.IdentityModel.Tokens/Validators.cs.

Regression

No response

Expected behavior

No application outages due to blocking calls :)

pmaytak commented 4 months ago

Based on the discussions, looks like there are 3 options:

  1. Make the currently internal ValidateIssuerAsync API in IdentityModel public and use it in Identity Web instead of the sync overload. Pro: Quick fix. Con: Async delegates API design is not fully complete. This change may make changes to this API more difficult later. Adding only one async delegate for issuer right now makes API less consistent.
  2. Make IdentityModel internals visible to Identity Web and use the internal ValidateIssuerAsync in Identity Web instead of the sync overload. In IdentityModel 8, the async delegates will be finalized and made public; IdentityModel internals can be hidden for Identity Web again. Pro: Quick fix and doesn't expose a new public API. Con: InternalsVisibleTo makes library usage and versioning more complex. May cause issues if the API needs to change later.
  3. First finish the complete feature #2568 (make all validation delegates async, obsolete sync ones, don't use exceptions for control flow). Update Identity Web to use async validators. Pro: Allows for the best thought-out solution, consistent public API. Con: Longest time frame. The change will be in IdentityModel 8 and Identity Web 3.

Based on all the discussions, option 3 seems to be the better one. It's better to deliver a more well-planned out solution.

jennyf19 commented 1 month ago

Moving to be handled part of: https://github.com/AzureAD/azure-activedirectory-identitymodel-extensions-for-dotnet/issues/2711