dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
853 stars 286 forks source link

Possible threading issues in version 3.0 when using Azure Managed Identity auth #1209

Closed ccarpediem closed 3 years ago

ccarpediem commented 3 years ago

After upgrading our app from Microsoft.Data.SqlClient 2.1.3 to version 3.0, our application simply seems to "freeze" upon opening any sql connection using Azure Managed Identity auth. This did work in 2.1.3 and the connection does work if we change it to use SQL auth. When it "freezes" there are no errors/exceptions thrown; it just hangs indefinitely. The only diagnostic information directly available seems to be from the SQL Server (Azure SQL Managed Instance). That error is Error: 33155, Severity: 20, State: 1 (A disconnect event was raised when server is waiting for Federated Authentication token. This could be due to client close or server timeout expired).

I've created a very simple sample app to further diagnose the issue and my findings thus far are: • If you set the project to be a console app all works as expected with Managed Identity • If you set the project to be a WinForm app running the exact same code, it will experience the "freeze" • If you update the WinForm code to use "con.OpenAsync().Wait();" instead of "con.Open();" again all works without the freeze

Is this a known/expected issue? While changing the connection to use OpenAsync() in a small sample app isn't a big deal, it is a much larger problem for huge legacy application with many modules so hoping this isn't expected behavior going forward.

To reproduce

WindowsFormsApp4.zip

Further technical details

Microsoft.Data.SqlClient version: 3.0 Framework 4.7.2 SQL Server version: Azure SQL Managed Instance SQL Authentication: Azure Managed Identity Operating system: Windows 2019

<?xml version="1.0" encoding="utf-8"?>

cheenamalhotra commented 3 years ago

Hi @ccarpediem

This is a strange behavior and not expected. OpenAsync() and Open() should ideally not have any change in behavior, but we'll test out your repro to confirm that.

Could you also confirm:

ccarpediem commented 3 years ago
ccarpediem commented 3 years ago

I've also added a SQLClient EventListener to my sample code (attached as Form1.txt). The results from running Open() vs OpenAsync() are also attached. I'm not sure it really help, but figure it cannot hurt.

OpenAsync_Log.txt Open_Log.txt Form1.txt

cheenamalhotra commented 3 years ago

I think this problem is coming from MSAL library we're using here and seems to impact all AAD modes for Windows Forms. You could run a quick test by updating your repro to:

using Microsoft.Identity.Client;
...
       private void button1_Click(object sender, EventArgs e)
        {
            var appId = "2fd908ad-0664-4344-b9be-cd3e8b574c38"; // use anything here.
            var authority = "https://login.windows.net/<tenantId>/";
            var redirectUri = "https://login.microsoftonline.com/common/oauth2/nativeclient";
            CancellationTokenSource ctsInteractive = new CancellationTokenSource();
            ctsInteractive.CancelAfter(180000);
            IPublicClientApplication app = PublicClientApplicationBuilder.Create(appId)
                .WithAuthority(authority)
                .WithRedirectUri(redirectUri)
                .Build();
            run().Wait();
            async Task run() => await app.AcquireTokenInteractive(new string[] { "https://database.windows.net//.default" })
                .WithLoginHint("")
                .WithCorrelationId(Guid.NewGuid())
                .ExecuteAsync(ctsInteractive.Token);
            textBox1.Text = "Token acquired!";
        }

Alternatively, also if you update to using Azure Identity's API, it continues to freeze.

using Azure.Identity;
using Azure.Core;
...
       private void button1_Click(object sender, EventArgs e)
        {
            CancellationTokenSource ctsInteractive = new CancellationTokenSource();
            InteractiveBrowserCredential credential = new InteractiveBrowserCredential();
            getToken(credential).Wait();
            async Task getToken(TokenCredential cred) => await cred.GetTokenAsync(
                new TokenRequestContext(new string[] { "https://database.windows.net//.default" }), ctsInteractive.Token);
        }

Or use ManagedIdentityCredentialhere instead of InteractiveBrowserCredential, still cannot get token. I updated all related packages for Azure.Identity and MSAL libraries to latest versions but issue continues to persist.

The same code however works when called via OpenAsync() which is a strange mystery to me I'll look more into. But I would highly recommend raising this issue with MSAL team with these repro cases as it can be reproduced directly with MSAL.

ghost commented 3 years ago

Judging by the trace that stops on "Generating federated authentication token", I believe this line in SqlInternalConnectionTds.GetFedAuthToken is very likely to cause a deadlock because of the blocking call to .Result:

_fedAuthToken = authProvider.AcquireTokenAsync(authParamsBuilder).Result.ToSqlFedAuthToken();

By the way, this seems to be the same issue: https://github.com/dotnet/SqlClient/issues/1138

ccarpediem commented 3 years ago

Thanks @AnKor1985! @cheenamalhotra yes I think it may have been a change committed as part to https://github.com/dotnet/SqlClient/pull/1010 which changed how those these are called within SqlInternalConnectionTds as shown below.

image

cheenamalhotra commented 3 years ago

Hi @ccarpediem

Yes I believe that exact change caused issues on Windows Forms. I was investigating it too and just opened PR #1213 with necessary changes, which we'll include in next release and plan to hotfix 3.0.

janecosham commented 3 years ago

Hi @cheenamalhotra

I'm currently working on a release which is switching from System.Data.SqlClient to Microsoft.Data.SqlClient using 3.0. Everything is working fine except I just found this issue with our WinForms tools which will block the release. You said this is planned to be in a hotfix - can you share expected timing on this? Is it quite likely to be in the next 2 weeks for example? If it is then we can wait, but if it is not likely to be in the next 2 weeks then that's ok too and we will downgrade to 2.1.3.

Thanks!

cheenamalhotra commented 3 years ago

I will be likely in next 2-3 weeks. Backport PRs are under preparation.

janecosham commented 3 years ago

Thanks!