Azure / AKS

Azure Kubernetes Service
[BUG] Cross tenant role permissions with the latest AKS rollout seem to be broken for managed applications #4215

Closed jmighion closed 3 weeks ago

jmighion commented 1 month ago

Describe the bug

Deploying a managed application requires cross-tenant role assignments. When AKS is part of the managed application, we're seeing authorization failures when trying to apply those role assignments.

{"code": "AuthorizationFailed", "message": "The client '66f5b7c0-7300-461b-80db-c3fa9f823a2e' with object id '66f5b7c0-7300-461b-80db-c3fa9f823a2e' does not have authorization to perform action 'Microsoft.Resources/deployments/validate/action' over scope '/subscriptions/16cdd291-e762-40ca-a870-ad6041ca2547/resourcegroups/rg-aaphjrl5kjahl7si-nodepools-eastus/providers/Microsoft.Resources/deployments/managedIdentityRoleAssignment0' or the scope is invalid. If access was recently granted, please refresh your credentials."}
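When triaging an error like the one above, it can help to pull out the denied action and the resource group it was denied on. The following is a minimal sketch, not part of the original report: the function names `denied_action` and `denied_resource_group` are hypothetical, and the patterns assume the message wording shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustrative only): extract fields from the
# "message" text of an AuthorizationFailed error.

# Prints the text between "perform action '" and the next single quote.
denied_action() {
    printf '%s\n' "$1" | sed -n "s|.*perform action '\([^']*\)'.*|\1|p"
}

# Prints the resource group segment of the "over scope '...'" path.
denied_resource_group() {
    printf '%s\n' "$1" | sed -n "s|.*over scope '[^']*/resource[Gg]roups/\([^/']*\).*|\1|p"
}
```

Applied to the message above, the second helper prints the resource group from the scope path, which in this report is the node pool resource group rather than the managed resource group.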

To Reproduce

Steps to reproduce the behavior:

  1. Run the bash script provided below. For us, this is typically run as a deploymentScript from the Bicep template, but running it manually in an environment with access also works.
  2. See error

Expected behavior

The role assignments should succeed; this process has worked for a couple of years now. The script instantiates a new deployment for each role assignment to force a fresh cache, which avoids the ARM cache issue during a deployment.

Environment (please complete the following information):

az version
{
  "azure-cli": "2.59.0",
  "azure-cli-core": "2.59.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {
    "aks-preview": "0.5.161",
    "interactive": "0.5.3"
  }
}

Additional context

Script for above:

#!/usr/bin/env bash
set -e

AKS_NAME=${AKS_NAME:-''}
NODE_POOL_RESOURCE_GROUP=${NODE_POOL_RESOURCE_GROUP:-''}
RESOURCE_GROUP=${RESOURCE_GROUP:-''}
SERVICE_PRINCIPAL_ID=${SERVICE_PRINCIPAL_ID:-''}
SERVICE_PRINCIPAL_SECRET=${SERVICE_PRINCIPAL_SECRET:-''}
SUBSCRIPTION_ID=${SUBSCRIPTION_ID:-''}
TENANT_ID=${TENANT_ID:-''}
USER_ID=${USER_ID:-''}
USER_PRINCIPAL_ID=${USER_PRINCIPAL_ID:-''}

if [[ -z "${AKS_NAME}" ]]; then
    echo "$0: Environment variable AKS_NAME must be specified"
    exit 1
fi

if [[ -z "${NODE_POOL_RESOURCE_GROUP}" ]]; then
    echo "$0: Environment variable NODE_POOL_RESOURCE_GROUP must be specified"
    exit 1
fi

if [[ -z "${RESOURCE_GROUP}" ]]; then
    echo "$0: Environment variable RESOURCE_GROUP must be specified"
    exit 1
fi

if [[ -z "${SERVICE_PRINCIPAL_ID}" ]]; then
    echo "$0: Environment variable SERVICE_PRINCIPAL_ID must be specified"
    exit 1
fi

if [[ -z "${SERVICE_PRINCIPAL_SECRET}" ]]; then
    echo "$0: Environment variable SERVICE_PRINCIPAL_SECRET must be specified"
    exit 1
fi

if [[ -z "${SUBSCRIPTION_ID}" ]]; then
    echo "$0: Environment variable SUBSCRIPTION_ID must be specified"
    exit 1
fi

if [[ -z "${TENANT_ID}" ]]; then
    echo "$0: Environment variable TENANT_ID must be specified"
    exit 1
fi

# Not checking the USER_ID since it can be empty when not deploying in a cross-tenant environment.

if [[ -z "${USER_PRINCIPAL_ID}" ]]; then
    echo "$0: Environment variable USER_PRINCIPAL_ID must be specified"
    exit 1
fi

# https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles
#   "f1a07417-d97a-45cb-824c-7a7467783830" Managed Identity Operator
#   "9980e02c-c2be-4d73-94e8-173b1dc7cf3c" Virtual Machine Contributor
# The full id is needed, not just the name.
ROLE_ASSIGNMENTS=(
    "/subscriptions/${SUBSCRIPTION_ID}/providers/Microsoft.Authorization/roleDefinitions/f1a07417-d97a-45cb-824c-7a7467783830"
    "/subscriptions/${SUBSCRIPTION_ID}/providers/Microsoft.Authorization/roleDefinitions/9980e02c-c2be-4d73-94e8-173b1dc7cf3c"
)

# Note the escaped \$ to avoid the current bash script from trying to expand a variable that doesn't exist.
cat << EOF > managedIdentityRoleAssignment.json
{
    "\$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "principalId": {
            "type": "string"
        },
        "principalType": {
            "type": "string"
        },
        "roleDefinitionId": {
            "type": "string"
        },
        "scope": {
            "type": "string"
        },
        "delegatedManagedIdentityResourceId": {
            "type": "string"
        }
    },
    "resources": [
        {
            "type": "Microsoft.Authorization/roleAssignments",
            "apiVersion": "2021-04-01-preview",
            "name": "[guid(concat(parameters('principalId'),parameters('roleDefinitionId'),parameters('scope')))]",
            "properties": {
                "principalId": "[parameters('principalId')]",
                "principalType": "[parameters('principalType')]",
                "roleDefinitionId": "[parameters('roleDefinitionId')]",
                "scope": "[parameters('scope')]",
                "delegatedManagedIdentityResourceId": "[if(empty(parameters('delegatedManagedIdentityResourceId')),json('null'),parameters('delegatedManagedIdentityResourceId'))]"
            }
        }
    ]
}
EOF

az login --service-principal --username "${SERVICE_PRINCIPAL_ID}" --password "${SERVICE_PRINCIPAL_SECRET}" --tenant "${TENANT_ID}"
az account set --subscription "${SUBSCRIPTION_ID}"

for i in "${!ROLE_ASSIGNMENTS[@]}"; do
    az deployment group create \
        --resource-group "${RESOURCE_GROUP}" \
        --name "managedIdentityRoleAssignment${i}" \
        --template-file managedIdentityRoleAssignment.json \
        --parameters \
            principalId="${USER_PRINCIPAL_ID}" \
            principalType="ServicePrincipal" \
            roleDefinitionId="${ROLE_ASSIGNMENTS[$i]}" \
            scope="${NODE_POOL_RESOURCE_GROUP}" \
            delegatedManagedIdentityResourceId="${USER_ID}"
done

AlftioH commented 1 month ago

No permission-related changes were made in the AKS version upgrade.

The affected permission, "Microsoft.Resources/deployments/validate/action", belongs to ARM and not to any other resource provider.

This permission is required for deploying ARM/Bicep templates and has to be granted explicitly within the subscription. The customer's subscription administrator determines the permissions and scope for each user and resource.

The links below describe how to grant the permissions needed for template deployments.

To troubleshoot the error, please confirm that the client/app you're using has the correct RBAC role, or the Microsoft.Resources/deployments/validate/action permission, over the /subscriptions//resourcegroups/ scope. For more information, see:

https://learn.microsoft.com/en-us/answers/questions/250370/user-doesnt-have-permission-to-create-deployment-a

https://learn.microsoft.com/en-us/azure/templates/microsoft.resources/deployments?pivots=deployment-language-arm-template

https://github.com/Azure/Enterprise-Scale/wiki/ALZ-Setup-azure

The list linked below shows the granular permissions that AKS requires. Deploying from the CLI or the portal should succeed; Microsoft.Resources/deployments/ is not part of that list.

https://learn.microsoft.com/en-us/azure/aks/concepts-identity#aks-service-permissions
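One way to run the RBAC check suggested above is to list the roles a principal holds over the resource group in question. This is a rough sketch under stated assumptions: the function names `resource_group_scope` and `list_roles_over_group` are hypothetical, an Azure CLI session is assumed to be logged in already, and resolving `--assignee` may behave differently for principals that live in another tenant.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not from the issue.

# Builds the ARM ID of a resource group scope from a subscription ID ($1)
# and a resource group name ($2).
resource_group_scope() {
    printf '/subscriptions/%s/resourceGroups/%s\n' "$1" "$2"
}

# Lists the role names a principal ($3) holds over a resource group ($2) in
# a subscription ($1), including roles inherited from parent scopes.
# Requires a logged-in Azure CLI session.
list_roles_over_group() {
    az role assignment list \
        --assignee "$3" \
        --scope "$(resource_group_scope "$1" "$2")" \
        --include-inherited \
        --query '[].roleDefinitionName' \
        --output tsv
}
```

With the variables from the script in the original report, a call such as `list_roles_over_group "${SUBSCRIPTION_ID}" "${RESOURCE_GROUP}" "${SERVICE_PRINCIPAL_ID}"` would print one role name per line. An empty result would suggest that no role assignment reaches that resource group for that principal.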

aldato commented 1 month ago

Hello @AlftioH,

The main issue we're seeing is that the Node Resource Group created when the AKS cluster is deployed is not getting the proper permissions propagated (Owner, assigned from the Marketplace Managed App Authorization settings), even though they are assigned correctly in the Managed Resource Group where the AKS cluster is deployed.

This is why we thought there was a regression in the latest release: it started happening late last week, out of nowhere.

jmighion commented 3 weeks ago

We got word from Microsoft that there was a misconfiguration in the Azure Managed Application (AMA) service for 7 days. They have fixed the issue, which prevented role assignments for AMA publishers over the nodeResourceGroup.