Closed jroskens-mgm closed 1 year ago
@jroskens-mgm I tried the steps you gave and attached the cluster successfully. Can you check the user assigned identity for the workspace? Also, judging by the last block, I think your extension did not install successfully; you could try reinstalling it.
@siyuZL I deleted both resource groups and recreated everything following the steps exactly as I have them here. The result was almost the same as before. The only difference was that it took several minutes before I received the error instead of failing immediately.
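Before reinstalling, the extension's state can be read back from the cluster. The following is a minimal sketch, assuming the `k8s-extension` CLI extension is installed; `extension_state` is a hypothetical helper name, and the default extension name `aml-extension` is taken from this thread. A result of `Succeeded` only shows that the install finished, not that attaching will work.

```shell
# Hypothetical helper: print the provisioning state of the Azure ML
# extension on an AKS cluster (for example "Succeeded" or "Failed").
extension_state() {
  local cluster="$1"
  local rg="$2"
  local name="${3:-aml-extension}"   # extension name used in this thread
  az k8s-extension show \
    --name "$name" \
    --cluster-type managedClusters \
    --cluster-name "$cluster" \
    --resource-group "$rg" \
    --query provisioningState -o tsv
}
```

With the names used in this thread, the call would be `extension_state aks-ml-cluster rg-aks`.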
(BadRequest) AKS role check failed for user assigned identity. Please check the role assignment.
Code: BadRequest
Message: AKS role check failed for user assigned identity. Please check the role assignment.
When you tried to recreate this issue, did you select the identity that is created in step 3?
If I skip this step and create an ML workspace with an MSI, then I can attach the AKS cluster without any issue. This is also why I believe the extension is installed successfully (in addition to it showing as succeeded).
"extensionType": "microsoft.azureml.kubernetes",
"id": "/subscriptions/<subscriptionid>/resourceGroups/rg-aks/providers/Microsoft.ContainerService/managedClusters/aks-ml-cluster/providers/Microsoft.KubernetesConfiguration/extensions/aml-extension",
"identity": null,
"isSystemExtension": false,
"name": "aml-extension",
"packageUri": null,
"plan": null,
"provisioningState": "Succeeded",
But, in the interest of science I created another ML workspace without specifying a user identity and attempted to attach the same AKS cluster that failed to attach above.
az ml workspace create -n ml-workspace-msi -g rg-ml-workspace \
--set storage_account="$STORAGE_ID" \
key_vault="$KEYVAULT_ID" \
application_insights="$APP_INSIGHTS_ID" \
container_registry="$ACR_ID"
AKS_ID=$(az aks show -g rg-aks -n aks-ml-cluster --query id -otsv)
az ml compute attach --resource-group rg-ml-workspace --workspace-name ml-workspace-msi --type Kubernetes \
--name ml-inference \
--resource-id "$AKS_ID"
The cluster was attached without any issues.
{
  "id": "/subscriptions/<subscription id>/resourceGroups/rg-ml-workspace/providers/Microsoft.MachineLearningServices/workspaces/ml-workspace-msi/computes/ml-inference",
  "location": "westus",
  "name": "ml-inference",
  "namespace": "default",
  "properties": {
    "default_instance_type": "defaultinstancetype",
    "extension_instance_release_train": "stable",
    "extension_principal_id": "031dcc21-9ee1-4004-b660-25c211f3ca34",
    "instance_types": {
      "defaultinstancetype": {
        "resources": {
          "limits": {
            "cpu": "2",
            "memory": "2Gi",
            "nvidia.com/gpu": null
          },
          "requests": {
            "cpu": "0.1",
            "memory": "500Mi",
            "nvidia.com/gpu": null
          }
        }
      }
    },
    "namespace": "default"
  },
  "provisioning_state": "Succeeded",
  "resourceGroup": "rg-ml-workspace",
  "resource_id": "/subscriptions/<subscription id>/resourcegroups/rg-aks/providers/Microsoft.ContainerService/managedClusters/aks-ml-cluster",
  "type": "kubernetes"
}
In the interest of clarity, I want to state that this does not resolve the issue. Attaching an AKS cluster does work correctly when the ML workspace is configured to use a managed system identity. However, I am unable to attach an AKS cluster to an ML workspace configured with a user assigned identity.
Hi @jroskens-mgm, I checked this scenario twice more and found a solution. Can you try giving your workspace's user assigned identity these roles?
Grant the "Kubernetes Extension Contributor" role on the AKS cluster or its resource group.
Grant the "Azure Kubernetes Service Cluster Admin Role" role on the AKS cluster.
@siyuZL - Still seeing the same error. I recreated everything from scratch, including the AKS cluster, and applied those two roles.
az ml compute attach --resource-group rg-ml-workspace --workspace-name ml-workspace --type Kubernetes \
--name ml-inference \
--resource-id "$AKS_ID"
(BadRequest) AKS role check failed for user assigned identity. Please check the role assignment.
Code: BadRequest
Message: AKS role check failed for user assigned identity. Please check the role assignment.
Here are the roles currently assigned to the identity of the ML workspace.
# Get the principal ID of the workspace's user assigned identity
WORKSPACE_PRINCIPAL_ID=$(az ml workspace show --resource-group rg-ml-workspace --name ml-workspace --query "(identity.user_assigned_identities.*.principal_id)[0]" -otsv)
# Display the roles assigned to the workspace's user assigned identity
az role assignment list --all --assignee "$WORKSPACE_PRINCIPAL_ID" --query "[].{roleDefinitionName:roleDefinitionName, scope:scope}" -o table
RoleDefinitionName                           Scope
-------------------------------------------  -------------------------------------------------------------------------------------------------------------------------------------------------
Contributor                                  /subscriptions/<subscription id>/resourceGroups/rg-ml-workspace
Contributor                                  /subscriptions/<subscription id>/resourceGroups/rg-ml-workspace/providers/Microsoft.Storage/storageAccounts/samlworkspace20092
Storage Blob Data Contributor                /subscriptions/<subscription id>/resourceGroups/rg-ml-workspace/providers/Microsoft.Storage/storageAccounts/samlworkspace20092
Key Vault Administrator                      /subscriptions/<subscription id>/resourceGroups/rg-ml-workspace/providers/Microsoft.KeyVault/vaults/keyvaultml37041
Contributor                                  /subscriptions/<subscription id>/resourceGroups/rg-ml-workspace/providers/Microsoft.ContainerRegistry/registries/acrml37396
Reader                                       /subscriptions/<subscription id>/resourcegroups/rg-aks/providers/Microsoft.ContainerService/managedClusters/aks-ml-cluster
Kubernetes Extension Contributor             /subscriptions/<subscription id>/resourcegroups/rg-aks/providers/Microsoft.ContainerService/managedClusters/aks-ml-cluster
Azure Kubernetes Service Cluster Admin Role  /subscriptions/<subscription id>/resourcegroups/rg-aks/providers/Microsoft.ContainerService/managedClusters/aks-ml-cluster
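A listing like the one above can also be compared against an expected set of role and scope pairs. The sketch below assumes the same `az role assignment list` query shape used earlier in this comment; `missing_roles` is a hypothetical helper, and it only sees direct assignments at the exact scope given, not inherited ones. Scopes are compared case-insensitively because the CLI output mixes `resourceGroups` and `resourcegroups`.

```shell
# Hypothetical helper: print every expected "Role|Scope" pair that is not
# directly assigned to the given principal.
missing_roles() {
  local principal_id="$1"
  shift
  local assigned pair role scope needle
  # One "role<TAB>scope" line per assignment, lowercased for comparison.
  assigned=$(az role assignment list --all --assignee "$principal_id" \
    --query "[].[roleDefinitionName, scope]" -o tsv | tr '[:upper:]' '[:lower:]')
  for pair in "$@"; do
    role=${pair%%|*}    # text before the first "|"
    scope=${pair#*|}    # text after the first "|"
    needle=$(printf '%s\t%s' "$role" "$scope" | tr '[:upper:]' '[:lower:]')
    if ! printf '%s\n' "$assigned" | grep -Fxq -- "$needle"; then
      printf 'MISSING: %s on %s\n' "$role" "$scope"
    fi
  done
}
```

For example, `missing_roles "$WORKSPACE_PRINCIPAL_ID" "Reader|$AKS_ID"` prints nothing when the Reader assignment on the cluster is present.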
I got it to work. The workspace identity must be granted the "Kubernetes Extension Contributor" role on the AKS cluster's resource group. Granting it on the cluster alone isn't enough.
It seems these are the minimum roles and scopes that must be added to the ML Workspace's User Assigned Identity in order to attach a cluster successfully.
RoleDefinitionName                           Scope
-------------------------------------------  -----------------------------
Reader                                       AKS Cluster
Azure Kubernetes Service Cluster Admin Role  AKS Cluster
Kubernetes Extension Contributor             Resource Group of AKS Cluster
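The table above translates into three role assignments. This is a minimal sketch of them, not an official procedure: `grant_attach_roles` is a hypothetical helper name, and the role names and scopes are exactly the ones listed in the table.

```shell
# Hypothetical helper: grant the three roles from the table above to the
# workspace's user assigned identity. Stops at the first failing assignment.
grant_attach_roles() {
  local principal_id="$1"   # principal ID of the user assigned identity
  local aks_id="$2"         # resource ID of the AKS cluster
  local aks_rg_id="$3"      # resource ID of the AKS cluster's resource group
  az role assignment create --assignee "$principal_id" \
    --role "Reader" --scope "$aks_id" &&
  az role assignment create --assignee "$principal_id" \
    --role "Azure Kubernetes Service Cluster Admin Role" --scope "$aks_id" &&
  az role assignment create --assignee "$principal_id" \
    --role "Kubernetes Extension Contributor" --scope "$aks_rg_id"
}
```

With the variables used earlier in the thread, the resource group ID could be read with `az group show -n rg-aks --query id -otsv` and the call would be `grant_attach_roles "$WORKSPACE_PRINCIPAL_ID" "$AKS_ID" "$AKS_RG_ID"`, where `AKS_RG_ID` is an assumed variable name.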
Glad I have it working now. I have to ask though, is this an undocumented requirement or a bug? The Prerequisite documentation lists only the Reader role as required. Needing Cluster Admin is much more than that.
This is an undocumented requirement. We will update the document to fix it. Thank you for verifying, @jroskens-mgm!
Thanks siyu for the support. The document has been updated; please refer to attach-to-workspace-with-user-assigned-managed-identity.
I am unable to find a working method of attaching an AKS cluster when an ML workspace was provisioned with a user assigned identity.
I followed the documentation under User-assigned managed identity to assign the appropriate roles over the Key Vault, Storage Account, ACR and App Insights resources I created ahead of time. I then assigned the “Reader” role to the identity over the AKS cluster scope as mentioned under the Prerequisite section. Following this documentation does not result in a successful attach, however.
I originally attempted this by provisioning all resources with Terraform, but switched over to the CLI after opening a support case so that the repro steps were easier to share.
Create VNET
VNET_ID=$(az network vnet create --name vnet-aks --resource-group rg-aks --location westus --address-prefix 10.0.0.0/20 --subnet-name subnet-aks --subnet-prefixes 10.0.0.0/24 --query newVNet.id -otsv)
SUBNET_ID=$(az network vnet subnet show -g rg-aks -n subnet-aks --vnet-name vnet-aks --query id -otsv)
Create AKS Control Plane Identity
AKS_PRINCIPAL_ID=$(az identity create -g rg-aks -n identity-aks --query principalId -otsv)
AKS_IDENTITY_ID=$(az identity show -g rg-aks -n identity-aks --query id -otsv)
Create Kubelet Identity
KUBELET_ID=$(az identity create -g rg-aks -n identity-kubelet --query id -otsv)
Hack to avoid "Cannot find user or service principal in graph database" which can happen if you try to assign roles immediately after creating the identity
sleep 30
Assign the Managed Identity Operator role to the AKS Control Plane Identity over the Kubelet identity
az role assignment create --assignee $AKS_PRINCIPAL_ID --role "Managed Identity Operator" --scope "$KUBELET_ID"
Assign Network Contributor to the AKS Control Plane / Cluster Identity over the VNET containing the subnet that AKS will be assigned
az role assignment create --assignee $AKS_PRINCIPAL_ID --role "Network Contributor" --scope "$VNET_ID"
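The fixed `sleep 30` used before these role assignments could be replaced by retrying each assignment until the new identity is visible. The following is only a sketch of that idea: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between failed attempts. Returns 0 on the first success
# and 1 if every attempt fails.
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (commented out so the block only defines the helper):
# retry 6 10 az role assignment create --assignee "$AKS_PRINCIPAL_ID" \
#   --role "Managed Identity Operator" --scope "$KUBELET_ID"
```

Retrying the command itself means the script waits only as long as the directory actually needs to catch up, rather than a fixed interval.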
Create the AKS cluster
az aks create \
--resource-group rg-aks \
--name aks-ml-cluster \
--network-plugin kubenet \
--vnet-subnet-id $SUBNET_ID \
--docker-bridge-address 172.17.0.1/16 \
--dns-service-ip 10.2.0.10 \
--service-cidr 10.2.0.0/24 \
--enable-managed-identity \
--assign-identity $AKS_IDENTITY_ID \
--assign-kubelet-identity $KUBELET_ID \
--node-count 1 \
--generate-ssh-keys
Install the k8s-extension
az k8s-extension create --name aml-extension \
--extension-type Microsoft.AzureML.Kubernetes \
--scope cluster \
--cluster-name aks-ml-cluster \
--resource-group rg-aks \
--config enableTraining=True \
enableInference=True \
enableTraining=False \
allowInsecureConnections=True \
inferenceRouterServiceType=loadBalancer \
inferenceRouterHA=false \
internalLoadBalancerProvider=azure \
--cluster-type managedClusters
The above immediately fails with the error:
I eventually got past the "AKS role check failed" error by assigning both the "Reader" role and the "Azure Kubernetes Service Cluster Admin Role". I added the admin role because that's what I observed Azure doing automatically for its MSI when attaching. However, this still results in a failure, although a different one.
These issues only seem to occur when bringing your own identity to an ML workspace. If you simply create an ML workspace and allow it to create its own Managed System Identity, then the cluster can be attached without any issues.