Open mirjana1211 opened 5 months ago
Based on the discussions, looks like there are 3 options:
ValidateIssuerAsync
API in IdentityModel public and use it in Identity Web instead of the sync overload.
Pro: Quick fix.
Con: Async delegates API design is not fully complete. This change may make changes to this API more difficult later. Adding only one async delegate for issuer right now makes API less consistent.ValidateIssuerAsync
in Identity Web instead of the sync overload. In IdentityModel 8, the async delegates will be finalized and made public; IdentityModel internals can be hidden for Identity Web again.
Pro: Quick fix and doesn't expose a new public API.
Con: InternalsVisibleTo makes library usage and versioning more complex. May cause issues if the API needs to change later.Based on all the discussions, option 3 seems to be the better one. It's better to deliver a more well-planned out solution.
Moving to be handled part of: https://github.com/AzureAD/azure-activedirectory-identitymodel-extensions-for-dotnet/issues/2711
Microsoft.Identity.Web Library
Microsoft.Identity.Web
Microsoft.Identity.Web version
2.17.2
Web app
Not Applicable
Web API
Protected web APIs (validating tokens)
Token cache serialization
Not Applicable
Description
We have been experiencing sporadic issues with our application deployed as an Azure App Service running on Linux (.net8). We are using Microsoft.Identity.Web library and Microsoft Entra ID as our identity provider.
From time to time, we get an alert for failed health checks for one of the instances (minimum of 6 instances running). The instance seems to be unresponsive and for a period of time and the issue always resolves itself within a few minutes. There is no relevant info in the logs or in the Diagnose and solve problems section for the app service. We can not identify any changes in request rate or resource utilization before the outage.
We have created a support case with Microsoft and have provided multiple memory dumps for analysis. The information we got from the support engineer is that the app service failure is caused by thread starvation due to sync over async code in IdentityModel.
Reproduction steps
The outage occurs sporadically and it is not possible to reproduce it on demand.
The problem seems to be related to load. We have several instances of the same application deployed in different Azure regions and the only instance where we have observed the problem is running under a relatively high load (~500-600 req/sec during peak time).
Error message
No response
Id Web logs
No response
Relevant code snippets
Regression
No response
Expected behavior
No application outages due to blocking calls :)