crossplane-contrib / provider-upjet-azure

Official Azure Provider for Crossplane by Upbound.
Apache License 2.0

[Bug]: Too many Federated Identity Credential requests #689

Open KarlisAG opened 3 months ago

KarlisAG commented 3 months ago

Is there an existing issue for this?

Affected Resource(s)

resources.azure.upbound.io/v1beta1 - ResourceGroupTemplateDeployment

Resource MRs required to reproduce the bug

No response

Steps to Reproduce

  1. Create a ResourceGroupTemplateDeployment that creates multiple Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials (we have approx. 300) and deploy it from within an AKS cluster.
  2. Do the same in multiple other AKS clusters in the same tenant, subscription and resource group.

What happened?

In some cases it works as expected and all federated credentials are created. But because we have multiple clusters issuing many requests, the creation of one federated credential sometimes fails. That fails the whole deployment, so the remaining federated credentials are not created, which in turn breaks our other dependencies.

Relevant Error Output Snippet

In failed cases the following error appears in Azure:
{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"ConcurrentFederatedIdentityCredentialsWritesForSingleManagedIdentity","message":"Too many Federated Identity Credentials are written concurrently for the managed identity '/subscriptions/xxxx/resourcegroups/xxxx/providers/microsoft.managedidentity/userassignedidentities/xxxx'. Concurrent Federated Identity Credentials writes under the same managed identity are not supported."}]}

When we looked up "Related events" in Azure for the failed deployments, we also noticed the following error:
"Error code: 429,
Message: Request is temporarily throttled because subscription xxxx has issued too many requests. Retry after 2 seconds."

Crossplane Version

v1.14.5

Provider Version

v0.42.0

Kubernetes Version

1.27.7

Kubernetes Distribution

AKS

Additional Info

I have a question: is there a way to cleanly avoid or spread out these requests? Our current workaround, to avoid being blocked, was to create another such ResourceGroupTemplateDeployment with a different name to force it to deploy again. That usually worked; if not, we retried a couple more times until it did. We also thought of adding a timestamp to the ResourceGroupTemplateDeployment, but then realized it probably wouldn't work, as the Helm chart that manages it won't recreate it by itself. Are there any suggestions on what we could do about this?
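
For reference, one direction that might spread out the writes inside a single deployment is an ARM template copy loop in serial mode ("mode": "serial" with "batchSize": 1), which asks Resource Manager to create the copies one at a time instead of in parallel. Below is a minimal sketch of generating such a template, assuming the credentials come from a list. The helper name build_serial_fic_template, the parameter names, the example issuer URL and the API version are illustrative assumptions and are not taken from the actual chart. Note that a serial loop only orders the writes within one deployment; deployments from other clusters that target the same managed identity could still collide.

```python
import json

FIC_TYPE = (
    "Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials"
)


def build_serial_fic_template(identity_name, credentials, api_version="2023-01-31"):
    """Build an ARM template dict that creates federated credentials serially.

    identity_name: name of the existing user-assigned managed identity.
    credentials:   list of dicts with "name", "issuer" and "subject" keys.
    api_version:   assumed API version; adjust to whatever the template uses.
    """
    resource = {
        "type": FIC_TYPE,
        "apiVersion": api_version,
        # Child resource name is "<identity>/<credential>".
        "name": (
            "[format('{0}/{1}', parameters('identityName'), "
            "parameters('credentials')[copyIndex()].name)]"
        ),
        "copy": {
            "name": "federatedCredentials",
            "count": "[length(parameters('credentials'))]",
            # Serial mode with batchSize 1: one credential write at a time.
            "mode": "serial",
            "batchSize": 1,
        },
        "properties": {
            "issuer": "[parameters('credentials')[copyIndex()].issuer]",
            "subject": "[parameters('credentials')[copyIndex()].subject]",
            "audiences": ["api://AzureADTokenExchange"],
        },
    }
    return {
        "$schema": (
            "https://schema.management.azure.com/schemas/"
            "2019-04-01/deploymentTemplate.json#"
        ),
        "contentVersion": "1.0.0.0",
        "parameters": {
            "identityName": {"type": "string", "defaultValue": identity_name},
            "credentials": {"type": "array", "defaultValue": credentials},
        },
        "resources": [resource],
    }


if __name__ == "__main__":
    # Hypothetical input; real issuer URLs and subjects would come from the clusters.
    example = [
        {
            "name": "app-a",
            "issuer": "https://example.invalid/oidc",
            "subject": "system:serviceaccount:default:app-a",
        },
    ]
    print(json.dumps(build_serial_fic_template("xxxx", example), indent=2))
```

The printed JSON is what would be embedded as the template body of the ResourceGroupTemplateDeployment. With around 300 credentials a serial loop will make the deployment noticeably slower, so this trades speed for avoiding the concurrent-write error.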

turkenf commented 2 months ago

Hi @KarlisAG, thank you for raising this issue. Could you please try it with the latest provider version and provide us with example MRs if the problem occurs again?