Azure / azure-cli

Azure Command-Line Interface
MIT License

az aks create|update --attach-acr makes undue Graph API calls that result in `Could not create a role assignment for ACR. Are you an Owner on this subscription?` #18528

Open duckie opened 3 years ago

duckie commented 3 years ago

Describe the bug

Running with a managed identity which is Contributor on the resource group, Owner on the ACR repo, and a cluster with managed identities enabled:

az aks update -g $RESOURCE_GROUP -n $CLUSTER_NAME --attach-acr $ACRNAME

This results in the error "Could not create a role assignment for ACR. Are you an Owner on this subscription?". Although this is documented as expected behavior of ACR (https://github.com/MicrosoftDocs/azure-docs/issues/64083), it is not. It is a bug in Azure CLI, which makes an unnecessary call to the Graph API to fetch information it already has.

Can be reproduced at creation too.

To Reproduce

Run the command shown above under the conditions described.

Expected behavior

The call should succeed in assigning the role to the cluster. That this is possible is shown by the following workaround, which achieves the expected result:

role_id=$(az role definition list \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ContainerRegistry/registries/$ACRNAME" \
  --name AcrPull \
  --query "[0].id" -o tsv)

object_id=$(az aks show \
  -g $RESOURCE_GROUP \
  -n $CLUSTER_NAME \
  --query "identityProfile.kubeletidentity.objectId" -o tsv)

az role assignment create \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ContainerRegistry/registries/$ACRNAME" \
  --role "$role_id" \
  --assignee-object-id "$object_id" \
  --assignee-principal-type ServicePrincipal

This directly shows that the limitation is in Azure CLI and not in the ACR product.

Environment summary

azure-cli version 2.25.0-1 installed with official rpm. Running in a machine with image OpenLogic:CentOS:7_8-gen2.

Additional context

This happens because the code path that makes the role assignment does not care whether the identity is a Service Principal or a Managed Identity.

We can see it here in command_modules/acs/custom.py:

def _ensure_aks_acr_role_assignment(cli_ctx,
                                    client_id,
                                    registry_id,
                                    detach=False):
    if detach:
        if not _delete_role_assignments(cli_ctx,
                                        'acrpull',
                                        client_id,
                                        scope=registry_id):
            raise CLIError('Could not delete role assignments for ACR. '
                           'Are you an Owner on this subscription?')
        return
    # For reference, the signature of the helper called below:
    # _add_role_assignment(cli_ctx, role, service_principal_msi_id, is_service_principal=True, delay=2, scope=None)
    if not _add_role_assignment(cli_ctx,
                                'acrpull',
                                client_id,
                                scope=registry_id):
        raise CLIError('Could not create a role assignment for ACR. '
                       'Are you an Owner on this subscription?')
    return

The call to _add_role_assignment leaves the parameter is_service_principal at its default of True, which leads to a call to _resolve_object_id, which in turn calls the Graph API. That call fails, since the Managed Identity has not been granted any Graph API permissions.

But this call is not required, since the AKS API already provides this object_id. The Graph API call should therefore be skipped so that the role assignment can succeed.
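
To illustrate the point, here is a minimal sketch of the behavior described above. It is a simplified model, not the actual azure-cli code: the function names, the GraphAccessDenied exception and the returned dictionary are all hypothetical, and only the is_service_principal flag mirrors the signature quoted earlier. The Graph lookup is replaced by a stub that always fails, as it would for an identity without Graph permissions.

```python
class GraphAccessDenied(Exception):
    """Raised by the stand-in Graph lookup (hypothetical, for illustration)."""


def resolve_object_id_via_graph(client_id):
    # Stand-in for the Graph API lookup. An identity without Graph
    # permissions cannot perform it, so this stub always fails.
    raise GraphAccessDenied(f"cannot look up {client_id!r} in Graph")


def add_role_assignment(role, assignee, scope, is_service_principal=True,
                        resolve=resolve_object_id_via_graph):
    """Return the assignment that would be created, or None on failure."""
    try:
        # Only translate a client ID into an object ID when asked to;
        # otherwise trust that the caller already passed the object ID.
        object_id = resolve(assignee) if is_service_principal else assignee
    except GraphAccessDenied:
        return None
    return {"role": role, "principal_id": object_id, "scope": scope}


if __name__ == "__main__":
    # Placeholder scope; all values here are made up.
    scope = ("/subscriptions/sub-id/resourceGroups/my-rg/providers/"
             "Microsoft.ContainerRegistry/registries/myacr")
    # Default path: goes through the (failing) Graph lookup.
    print(add_role_assignment("acrpull", "kubelet-client-id", scope))
    # Proposed path: pass the object ID reported by AKS and skip Graph.
    print(add_role_assignment("acrpull", "kubelet-object-id", scope,
                              is_service_principal=False))
```

In this model the second call never touches the lookup, which is the change suggested here: use the object ID that AKS already reports instead of resolving it again.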

duckie commented 3 years ago

Those affected by https://github.com/Azure/AKS/issues/1517 might be interested in the workaround.

ghost commented 3 years ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/aks-pm.

Issue Details
Author: duckie
Assignees: -
Labels: `AKS`, `Service Attention`, `needs-triage`, `question`
Milestone: -

yonzhan commented 3 years ago

route to service team

tdihp commented 3 years ago

Indeed, this error message is very misleading on several levels.

  1. Users fail on a Graph API call rather than on the actual role assignment call. This causes the most confusion, as users repeatedly try resetting ARM permissions.
  2. The client doesn't even need Owner permission on the subscription. The question in the message assumes an easy but overly broad role. All it actually needs is a) the User Access Administrator role, b) only on the specific ACR resource.
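
As a rough sketch of point 2, the snippet below builds the resource ID of a single registry and the az role assignment create command that would grant User Access Administrator on that registry alone. The helper names and all IDs are hypothetical placeholders. The flags are the ones used in the workaround earlier in this issue, and ServicePrincipal as the principal type assumes the caller is a managed identity or service principal.

```python
import shlex


def acr_scope(subscription_id, resource_group, acr_name):
    """Build the ARM resource ID of one container registry."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.ContainerRegistry"
        f"/registries/{acr_name}"
    )


def grant_uaa_on_acr_cmd(caller_object_id, scope):
    """Return the argv granting User Access Administrator on that scope only."""
    return [
        "az", "role", "assignment", "create",
        "--scope", scope,
        "--role", "User Access Administrator",
        "--assignee-object-id", caller_object_id,
        # Assumption: the caller is a managed identity or service principal.
        "--assignee-principal-type", "ServicePrincipal",
    ]


if __name__ == "__main__":
    # Placeholder IDs; substitute real values before running the command.
    scope = acr_scope("00000000-0000-0000-0000-000000000000", "my-rg", "myacr")
    cmd = grant_uaa_on_acr_cmd("11111111-1111-1111-1111-111111111111", scope)
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Because the scope is the registry's own resource ID, the grant does not extend to the resource group or the subscription.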

miwithro commented 3 years ago

@weinong

larryclaman commented 3 years ago

There's a workaround documented in this issue https://github.com/MicrosoftDocs/azure-docs/issues/77097

tej-rana commented 2 years ago

That workaround uses the cluster's object_id to create a role assignment. The problem at hand is that we are unable to create a cluster.

tej-rana commented 2 years ago

Still an issue. Is there a workaround to create a cluster?

miwithro commented 2 years ago

We are working on a solution to address this, but it will most likely be Q1/Q2 2022 before it is available. We have a documented workaround that we are posting soon.

https://github.com/MicrosoftDocs/azure-docs/issues/77097

norshtein commented 2 years ago

Fixed in #20477 and available in Azure CLI 2.31.0.