Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.

[BUG] "Azure CLI authentication timed out" from AzureCliCredential when AZ CLI is configured for automatic upgrades #32993

Open NeilMacMullen opened 1 year ago

NeilMacMullen commented 1 year ago

Library name and version

Azure.Identity 1.7.0

Describe the bug

When using this code

new ChainedTokenCredential(new AzureCliCredential())

if the Azure CLI detects that a new version is available (which happens quite regularly, e.g. an upgrade from 2.42.0 to 2.43.0),

then an exception is thrown:


Azure.Identity.AuthenticationFailedException: The ChainedTokenCredential failed due to an unhandled exception: Azure CLI authentication timed out.
 ---> Azure.Identity.AuthenticationFailedException: Azure CLI authentication timed out.
   at Azure.Identity.AzureCliCredential.RequestCliAccessTokenAsync(Boolean async, TokenRequestContext context, CancellationToken cancellationToken)
   at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow(Exception ex, String additionalMessage)
   at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.AzureCliCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.ChainedTokenCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AccessTokenCache.GetHeaderValueFromCredentialAsync(TokenRequestContext context, Boolean async, CancellationToken cancellationToken)
   at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AccessTokenCache.GetHeaderValueAsync(HttpMessage message, TokenRequestContext context, Boolean async)
   at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AccessTokenCache.GetHeaderValueAsync(HttpMessage message, TokenRequestContext context, Boolean async)
   at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AuthenticateAndAuthorizeRequestAsync(HttpMessage message, TokenRequestContext context)
   at Azure.Security.KeyVault.ChallengeBasedAuthenticationPolicy.AuthorizeRequestOnChallengeAsyncInternal(HttpMessage message, Boolean async)
   at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.HttpPipeline.SendRequestAsync(Request request, CancellationToken cancellationToken)
   at Azure.Security.KeyVault.KeyVaultPipeline.SendRequestAsync(Request request, CancellationToken cancellationToken)
   at Azure.Security.KeyVault.KeyVaultPipeline.SendRequestAsync[TResult](RequestMethod method, Func`1 resultFactory, CancellationToken cancellationToken, String[] path)
   at Azure.Security.KeyVault.Secrets.SecretClient.GetSecretAsync(String name, String version, CancellationToken cancellationToken)
   at TableAccess_std.KeyVaultAccess.FetchConnectionString(String secretName)

In this particular case the credentials are being used to fetch a Key Vault secret, but I don't think the problem is specific to that exact use case.

Expected behavior

An exception should not be thrown; the client code should not be responsible for managing upgrades to the package!

Actual behavior

As above, an exception is thrown.

Interestingly, if the code is run from a console application in a PowerShell session, console input stops working after this error, which leads me to speculate that the library is trying to take over user input to prompt for an upgrade.

Reproduction Steps

Install an old version of the Azure CLI, then execute:

var chain = new ChainedTokenCredential(new AzureCliCredential());
var client = new SecretClient(new Uri("...key vault URI..."), chain);
await client.GetSecretAsync("...secret name...");

Environment

.NET SDK:
  Version:   7.0.100
  Commit:    e12b7af219

Runtime Environment:
  OS Name:     Windows
  OS Version:  10.0.22621
  OS Platform: Windows
  RID:         win10-x64
  Base Path:   C:\Program Files\dotnet\sdk\7.0.100\

Host:
  Version:      7.0.0
  Architecture: x64
  Commit:       d099f075e4

.NET SDKs installed:
  5.0.404 [C:\Program Files\dotnet\sdk]
  6.0.201 [C:\Program Files\dotnet\sdk]
  7.0.100 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 3.1.31 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 5.0.13 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.3 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 3.1.22 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.31 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 5.0.13 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 5.0.15 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.3 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.1.22 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 3.1.31 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 5.0.13 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 5.0.15 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.3 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.11 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other architectures found:
  x86 [C:\Program Files (x86)\dotnet]
    registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation]

Environment variables: Not set

global.json file: Not found

Learn more: https://aka.ms/dotnet/info

Download .NET: https://aka.ms/dotnet/download

jsquire commented 1 year ago

//cc: @schaabs

jsquire commented 1 year ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

christothes commented 1 year ago

Hi @NeilMacMullen - I'm currently running a version of the Azure CLI that is not the newest and this does not reproduce for me. Could you try reproducing with CLI debug logging enabled to see what it's doing when this reproduces for you?

To configure logging:

az config set core.log_level=debug
az config set logging.enable_log_file=yes
az config set logging.enable_log_file=~/az-logs

Then reproduce the issue. This should log detailed timestamped output to the user's profile directory under the az-logs dir.

NeilMacMullen commented 1 year ago

I think that last line az config set logging.enable_log_file=~/az-logs is erroneous? Regardless, I found the logs in the "logs" folder and have attached them. Interestingly, this error does not occur on one of my other machines, so this might be setup-dependent.

az.log

NeilMacMullen commented 1 year ago

For comparison, here's a log of the same operation succeeding when an upgrade is not pending... az.log

One thing I also noticed is that when the operation fails, it leaves behind a large number of zombie Python processes, which seems consistent with the idea that it is trying to perform or request an upgrade behind the scenes.

christothes commented 1 year ago

Do you have automatic upgrades configured? https://learn.microsoft.com/en-us/cli/azure/update-azure-cli#automatic-update

NeilMacMullen commented 1 year ago

I'm pretty sure I've never explicitly configured it, but it is present in the config file (screenshot attached).

So is that the problem? It seems a bit non-intuitive that this would have an effect when using AzureCliCredential... I (probably naively!) assumed I was calling a .NET library that happened to have access to the az credentials, not that I was invoking the PowerShell scripts and firing up Python ;-)

christothes commented 1 year ago

I (probably naively!) assumed I was calling a .NET library that happened to have access to the az credentials, not that I was invoking the PowerShell scripts and firing up Python ;-)

Yes - it just invokes the process.
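
For context on what that means: AzureCliCredential works by launching the az command-line tool and parsing the access token out of its JSON output, so anything that makes az slow to respond (such as an auto-upgrade check or prompt) can surface as "Azure CLI authentication timed out". The snippet below is only a rough sketch of that flow, not the Azure.Identity source; the real credential's shell invocation, timeout, and error handling differ.

// Rough sketch only - NOT the actual Azure.Identity implementation.
// AzureCliCredential effectively runs `az account get-access-token` and parses the JSON it prints.
using System.Diagnostics;
using System.Text.Json;

public static class AzCliSketch
{
    public static string GetAccessToken(string resource)
    {
        bool isWindows = OperatingSystem.IsWindows();
        var psi = new ProcessStartInfo
        {
            // az is a script/.cmd file, so it is launched through a shell.
            FileName = isWindows ? "cmd.exe" : "/bin/sh",
            Arguments = isWindows
                ? $"/d /c az account get-access-token --output json --resource {resource}"
                : $"-c \"az account get-access-token --output json --resource {resource}\"",
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };

        using var process = Process.Start(psi)!;
        // Read stdout before waiting so the child cannot block on a full pipe buffer.
        string json = process.StandardOutput.ReadToEnd();
        process.WaitForExit();

        // The real credential wraps this wait in a timeout; when az stalls
        // (for example on an upgrade prompt), that timeout is what surfaces
        // as "Azure CLI authentication timed out".
        return JsonDocument.Parse(json).RootElement.GetProperty("accessToken").GetString()!;
    }
}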

ghost commented 1 year ago

Hi @NeilMacMullen. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

NeilMacMullen commented 1 year ago

@christothes In that case, is there any way to catch this from the client code? It would be nice to be able to warn the user that this is the reason for the failure rather than just getting a generic "timeout" exception.

christothes commented 1 year ago

Yes, good point - we could add a check for that.
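
Until then, a client-side workaround (a rough sketch, not an official API: it matches on the exception message, which is brittle, and the secret name is illustrative) is to catch the failure and hint at the likely cause. Here client is the SecretClient from the repro above:

using Azure.Identity;

try
{
    var secret = await client.GetSecretAsync("my-secret");
}
catch (AuthenticationFailedException ex) when (ex.Message.Contains("Azure CLI authentication timed out"))
{
    // Hypothetical handling: translate the generic timeout into an actionable hint.
    Console.Error.WriteLine(
        "Azure CLI authentication timed out. If az is configured for automatic upgrades, " +
        "it may be stuck on an upgrade check or prompt; run `az upgrade` or " +
        "`az config set auto-upgrade.enable=no`, then retry.");
    throw;
}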

martinjosephogorman commented 1 year ago

I'm getting the above error, "Azure CLI authentication timed out", when using Azure cloud agents. If I set the Azure PowerShell version to one that is already installed, I do not get the error.

Azure PowerShell version that gives the error (the task has to install it): 9.7.1. Versions that do not give the error: "LatestVersion" or 7.2.11, which is the latest already-installed version.

fvilches17 commented 1 year ago

In my case, I noticed this error when I started doing concurrent tasks which involved calling TokenCredential.GetTokenAsync

I never saw this error when the code was synchronous.

Adding a lock mechanism fixed the issue. See the use of SemaphoreSlim below

// SomeClass.cs
using System.Net.Http.Headers;
using Azure.Core;

public class SomeClass
{
    private readonly TokenCredential _azureCredential; // an AzureCliCredential in my case
    private readonly TokenRequestContext _tokenRequestContext;

    private static readonly SemaphoreSlim _semaphore = new(initialCount: 1, maxCount: 1);

    public SomeClass(TokenCredential azureCredential, TokenRequestContext tokenRequestContext)
    {
        _azureCredential = azureCredential;
        _tokenRequestContext = tokenRequestContext;
    }

    public async Task<AuthenticationHeaderValue> GetAuthenticationHeaderAsync(CancellationToken cancellationToken)
    {
        // Acquire outside the try block so a cancelled wait doesn't release a semaphore we never entered.
        await _semaphore.WaitAsync(cancellationToken);
        try
        {
            AccessToken accessToken = await _azureCredential.GetTokenAsync(_tokenRequestContext, cancellationToken);
            return new AuthenticationHeaderValue("Bearer", accessToken.Token);
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

However, I am still not 100% sure the error was due to concurrency issues.

Can anyone shed some light here?

christothes commented 1 year ago

@fvilches17 Would you mind opening a separate issue for this that includes repro steps?

flexwie commented 1 year ago

I'm also using the CLI credential to authenticate to a key vault and add its secrets to the configuration of an ASP.NET Core app:

var azureServiceTokenProvider = new AzureServiceTokenProvider();
var keyVaultClient = new KeyVaultClient(
    new KeyVaultClient.AuthenticationCallback(
        azureServiceTokenProvider.KeyVaultTokenCallback));

builder.Configuration.AddAzureKeyVault(
    $"https://{builder.Configuration["ApplicationSettings:KeyVaultName"]}.vault.azure.net/",
    keyVaultClient,
    new DefaultKeyVaultSecretManager());

When using the CLI with auto-upgrade enabled, it just blocked the whole application indefinitely without throwing an exception. After I disabled auto-upgrade it worked flawlessly again.
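
For comparison, a sketch of the same wiring on the newer Azure.Identity stack (assuming the Azure.Extensions.AspNetCore.Configuration.Secrets package and the same builder as above). Locally, DefaultAzureCredential can fall back to AzureCliCredential, i.e. the same az process, so the auto-upgrade setting matters here too:

using Azure.Identity;

// Same key vault, configured through the Azure.Identity-based provider.
builder.Configuration.AddAzureKeyVault(
    new Uri($"https://{builder.Configuration["ApplicationSettings:KeyVaultName"]}.vault.azure.net/"),
    new DefaultAzureCredential()); // locally this typically resolves to the Azure CLI credential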

StephenWeatherford commented 1 year ago

To configure logging:

az config set core.log_level=debug
az config set logging.enable_log_file=yes
az config set logging.enable_log_file=~/az-logs

The last line should be this: az config set logging.log_dir=~/az-logs

SierraNL commented 5 months ago

I ended up at this issue while looking into these timeout messages, but the real cause was a lack of caching when using Microsoft.Extensions.Logging.ApplicationInsights in combination with an AzureCliCredential for AAD authentication.
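
In case it helps anyone hitting the same pattern: below is a minimal sketch of a caching decorator around any TokenCredential (my own illustration, not an Azure.Identity type; it assumes a single scope and ignores per-request differences) so that frequent callers such as a telemetry exporter don't shell out to az for every token.

using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Core;

public class CachingTokenCredential : TokenCredential
{
    private readonly TokenCredential _inner;
    private readonly object _gate = new();
    private AccessToken _cached;

    public CachingTokenCredential(TokenCredential inner) => _inner = inner;

    public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken)
    {
        lock (_gate)
        {
            // Refresh only when there is no cached token or it expires within 5 minutes.
            // Note: this naive cache ignores differences in scopes/tenants between requests.
            if (_cached.Token is null || _cached.ExpiresOn <= DateTimeOffset.UtcNow.AddMinutes(5))
            {
                _cached = _inner.GetToken(requestContext, cancellationToken);
            }
            return _cached;
        }
    }

    public override ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
        => new(GetToken(requestContext, cancellationToken));
}

Usage would be, for example, new CachingTokenCredential(new AzureCliCredential()) wherever a TokenCredential is expected.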

cvietor commented 5 months ago

Same issue here; my application won't start locally because it cannot authenticate to Azure services, which are configured with DefaultAzureCredential (so, the Azure CLI locally):

Unhandled exception. Azure.Identity.CredentialUnavailableException: DefaultAzureCredential failed to retrieve a token from the included credentials. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/defaultazurecredential/troubleshoot

Luckily I found this GitHub issue, and this is exactly the problem. Upgrading the Azure CLI or disabling auto-upgrade solves it.

bhartman353 commented 4 months ago

I always wondered why the CLI broke every time a new version was released. The best solution is to just disable auto-upgrade, since it doesn't really auto-upgrade; it just prompts you every single time.

az config set auto-upgrade.enable=no

robertreemsrivm commented 2 months ago

Hi, just to bump this with our details:

Using PowerShell 7 on our self-hosted agents has the same issue. A first attempt throws the error:

Error BCP192: Unable to restore the artifact with reference "br:rivmbicepregistrycr.azurecr.io/bicep/modules/res/subnets:1.0.9656": Unhandled exception: Azure.Identity.AuthenticationFailedException: The ChainedTokenCredential failed due to an unhandled exception: Azure CLI authentication timed out.
 ---> Azure.Identity.AuthenticationFailedException: Azure CLI authentication timed out.
   at Azure.Identity.AzureCliCredential.RequestCliAccessTokenAsync(Boolean async, TokenRequestContext context, CancellationToken cancellationToken)
   at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow(Exception ex, String additionalMessage, Boolean isCredentialUnavailable)
   at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.AzureCliCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.ChainedTokenCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetRefreshTokenFromCredentialAsync(TokenRequestContext context, String service, Boolean async, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetAcrRefreshTokenAsync(HttpMessage message, TokenRequestContext context, String service, Boolean async)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetAcrRefreshTokenAsync(HttpMessage message, TokenRequestContext context, String service, Boolean async)
   at Azure.Containers.ContainerRegistry.ContainerRegistryChallengeAuthenticationPolicy.AuthorizeRequestOnChallengeAsyncInternal(HttpMessage message, Boolean async)
   at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRestClient.GetManifestAsync(String name, String reference, String accept, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.GetManifestInternalAsync(String reference, Boolean async, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.GetManifestAsync(String tagOrDigest, CancellationToken cancellationToken)
   at Bicep.Core.Registry.AzureContainerRegistryManager.DownloadManifestAndLayersAsync(IOciArtifactReference artifactReference, ContainerRegistryContentClient client)
   at Bicep.Core.Registry.AzureContainerRegistryManager.<>c__DisplayClass4_0.<g__DownloadManifestInternalAsync|0>d.MoveNext()
   --- End of stack trace from previous location ---
   at Bicep.Core.Registry.AzureContainerRegistryManager.PullArtifactAsync(RootConfiguration configuration, IOciArtifactReference artifactReference)
   at Bicep.Core.Registry.OciArtifactRegistry.TryRestoreArtifactAsync(RootConfiguration configuration, OciArtifactReference reference)

[error]PowerShell exited with code '1'

A retry works. But after setting az config set auto-upgrade.enable=no, as suggested by @bhartman353, it works without a retry.

We're using an Ubuntu 20.04 self-hosted agent.