Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

[BUG] Azure Function with UserAssigned ManagedIdentity has a 16% chance to result in Azure.Identity.CredentialUnavailableException #10238

Open jsquire opened 3 months ago

jsquire commented 3 months ago

Issue Transfer

This issue has been transferred from the Azure SDK for .NET repository, #44693.

Please be aware that @nols-neulsen is the author of the original issue and include them for any questions or replies.

Azure SDK triage

The error indicates that the local managed identity endpoint on the host is unavailable or inaccessible to HTTP traffic when the application starts running and the Identity library attempts to acquire a token. Neither the credential nor the application has insight into or influence over this, so it requires investigation of the host environment.
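
To help gather evidence for that host-side investigation, the sketch below probes the local endpoint with a plain TCP connect and reports the resulting socket error. It assumes the endpoint address is exposed through the `IDENTITY_ENDPOINT` environment variable, as App Service and Functions do for managed identity; `IdentityEndpointProbe` and its methods are hypothetical names introduced for illustration and are not part of Azure.Identity or the Functions host.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical diagnostic helper; not part of any Azure library.
public static class IdentityEndpointProbe
{
    // Splits an endpoint URL into host and port.
    // Returns false when the value is missing or not an absolute URI.
    public static bool TryParseEndpoint(string? endpoint, out string host, out int port)
    {
        host = string.Empty;
        port = 0;
        if (string.IsNullOrWhiteSpace(endpoint) ||
            !Uri.TryCreate(endpoint, UriKind.Absolute, out var uri))
        {
            return false;
        }

        host = uri.Host;
        port = uri.Port;
        return true;
    }

    // Attempts a TCP connect and returns the socket error code
    // (SocketError.Success when the connection is established).
    public static async Task<SocketError> ProbeAsync(string host, int port)
    {
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port);
            return SocketError.Success;
        }
        catch (SocketException ex)
        {
            return ex.SocketErrorCode;
        }
    }
}
```

A function could call `TryParseEndpoint(Environment.GetEnvironmentVariable("IDENTITY_ENDPOINT"), out var host, out var port)` at startup and log the result of `ProbeAsync(host, port)`. A result of `SocketError.AccessDenied` corresponds to error code 10013, the code reported in the stack trace in this issue.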

Details

Describe the bug

I have a Windows-hosted Function App (Consumption plan) with a single HTTP trigger function. This function initializes an ArmClient, using ManagedIdentityCredential, to spawn Container App Jobs. In a test of 902 invocations, the function succeeded only 84% of the time; the other 16% failed with Azure.Identity.CredentialUnavailableException. Running locally, everything works 100% of the time if I provide an AzureCliCredential; VisualStudioCredential (with sync active) also seems to fail some of the time.

Function App:

Packages:

<PackageReference Include="Azure.Identity" Version="1.12.0" />
<PackageReference Include="Azure.ResourceManager.AppContainers" Version="1.1.1" />
<PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.22.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http" Version="3.2.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http.AspNetCore" Version="1.3.2" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.17.2" />
<PackageReference Include="Microsoft.ApplicationInsights.WorkerService" Version="2.22.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.ApplicationInsights" Version="1.2.0" />

User Assigned Managed Identity role assignments:

"assignableScopes": [
    "/subscriptions/<obfuscated>",
    "/subscriptions/<obfuscated>/resourceGroups/<obfuscated>"
],
"permissions": [
    {
        "actions": [
            "Microsoft.Resources/subscriptions/read",
            "Microsoft.Resources/subscriptions/resourceGroups/read",
            "microsoft.app/jobs/read",
            "microsoft.app/jobs/stop/action",
            "microsoft.app/jobs/start/action"
        ],
        "notActions": [],
        "dataActions": [],
        "notDataActions": []
    }
]

Code:

    try
    {
        var userManagedIdentityId = Environment.GetEnvironmentVariable("AZURE_CLIENT_ID");
        ArgumentException.ThrowIfNullOrEmpty(userManagedIdentityId);
        var resourceIdString = Environment.GetEnvironmentVariable(...);
        ArgumentException.ThrowIfNullOrEmpty(resourceIdString);
        var environment = Environment.GetEnvironmentVariable("Environment");

        ...
        var resourceId = new ResourceIdentifier(resourceIdString);
        var subscriptionId = resourceId.SubscriptionId;

        ArmClient armClient;
        switch (environment)
        {
            case "NPRD":
                ...
                armClient = new ArmClient(new ManagedIdentityCredential(userManagedIdentityId), subscriptionId);
                break;
            case "CN":
                ...
                armClient = new ArmClient(
                    new ManagedIdentityCredential(userManagedIdentityId, new TokenCredentialOptions { AuthorityHost = AzureAuthorityHosts.AzureChina }), 
                    subscriptionId, 
                    new ArmClientOptions { Environment = ArmEnvironment.AzureChina });
                break;
            default:
                ...
                // Pick any available credential; see https://learn.microsoft.com/en-us/dotnet/api/azure.identity.defaultazurecredential?view=azure-dotnet
                armClient = new ArmClient(new DefaultAzureCredential(), subscriptionId);
                break;
        }

        var caj = armClient.GetContainerAppJobResource(resourceId);
        ...
        var template = ...;
        await caj.StartAsync(Azure.WaitUntil.Started, template);

        return ...;
    }
    catch (Exception ex)
    {
        _logger.LogInformation("{Message}", ex.Message);
        _logger.LogInformation("{StackTrace}", ex.StackTrace);
        throw;
    }
}

Error:

Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. 
Multiple attempts failed to obtain a token from the managed identity endpoint.

System.Net.Sockets.SocketException (10013): An attempt was made to access a socket in a way forbidden by its access permissions.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)

Expected behavior

Retrieving the credential succeeds 100% of the time.

Actual behavior

In 16% of cases, execution fails with Azure.Identity.CredentialUnavailableException.

Reproduction Steps

Hosting info and code are provided in the bug description above.

1oglop1 commented 2 months ago

@jsquire is this related to https://github.com/Azure/azure-functions-host/issues/8037? I just wasted over 8 hours debugging why my user-assigned identity does not have permissions, until I "randomly" stumbled upon this:

https://github.com/Azure/azure-sdk-for-js/blob/65faad76f8091d2e1ce7deca3b79e030347f93ea/sdk/identity/identity/samples/AzureIdentityExamples.md?plain=1#L146-L164

As mentioned there, I need to set either the managedIdentityClientId property or the AZURE_CLIENT_ID environment variable to use the managed identity.
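
For readers using .NET, as in the original report, the same choice can be sketched as follows. This is a minimal illustration rather than the linked JavaScript sample: `CredentialFactory` is a hypothetical helper name, while `DefaultAzureCredentialOptions.ManagedIdentityClientId` is the Azure.Identity property that plays the role of `managedIdentityClientId`. When it is not set explicitly, it takes its value from the `AZURE_CLIENT_ID` environment variable. The block assumes the Azure.Identity package is referenced.

```csharp
using Azure.Core;
using Azure.Identity;

// Hypothetical helper; only the Azure.Identity types are real library APIs.
public static class CredentialFactory
{
    // Builds a DefaultAzureCredential that targets one user-assigned identity.
    // With a null or empty clientId, the options keep their default, which
    // Azure.Identity reads from the AZURE_CLIENT_ID environment variable.
    public static TokenCredential Create(string? clientId = null)
    {
        var options = new DefaultAzureCredentialOptions();
        if (!string.IsNullOrEmpty(clientId))
        {
            options.ManagedIdentityClientId = clientId;
        }

        return new DefaultAzureCredential(options);
    }
}
```

Constructing the credential does not contact the identity endpoint; a token is requested only when a client first uses it.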

IMO I'd have a much better dev experience if I did not need to fiddle with environment variables or anything else when calling new identity.DefaultAzureCredential. Having written over a thousand AWS Lambdas and GCP functions, I expected the environment to contain all the necessary data so that the SDK could "just work". I'd expect to have to set the variable only when more than one identity is assigned, and I'd expect to be able to retrieve the necessary IDs from the runtime, similarly to https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=windows.

PS. In case this is not the right place, please redirect me.

jsquire commented 2 months ago

@1oglop1: You'll need to address that question to the Functions team, who own the Functions host environment. This is the correct repository for those conversations, which is why I transferred this issue here.