Azure / azure-workload-identity

Azure AD Workload Identity uses Kubernetes primitives to associate managed identities for Azure resources and identities in Azure Active Directory (AAD) with pods.
https://azure.github.io/azure-workload-identity
MIT License

Documentation steps just don't work #771

Open iamandymcinnes opened 1 year ago

iamandymcinnes commented 1 year ago

I've tried this a few different ways. Either the documentation is wrong or the project has changed since it was written. Even when I went right back to basics and used only the az commands to create a cluster and configure workload identity from scratch, it either didn't work at all or didn't work in the way it's described.

Has anyone actually followed the steps described here and got a working cluster with workload identity? https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster

My observations following the steps documented in the above link...

All was fine until the key vault section, which just assumes a key vault already exists. https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster#create-a-managed-identity-and-grant-permissions-to-access-azure-key-vault

It's only really one command, so why not detail how to create a key vault if you don't already have one? As this is a new stack, let's assume you don't. Secondly, the commands all assume a key vault without RBAC enabled, and there is no detail about RBAC at all, but surely the whole point of using managed identities is that you use roles to grant permissions. The commands given are all about set-policy, which obviously doesn't work with an RBAC-enabled key vault. I configured the appropriate role assignments instead at this point.

Next you get on to creating a Kubernetes service account. Not a massive problem, but you always reference kubectl commands in bash with no PowerShell alternatives. That's fine, but do Windows users not get considered? (Much like the azwi CLI installation, which only details brew install on macOS.)

So I created the service account and federated identity credential, and I deployed an application using the latest Azure.Identity package. It would be nice if you at least gave an example here, like a hello world app or the Azure CLI or something.

First note, which differed from my original Terraform attempt at configuring this: there is no instruction to install the Helm 3 chart for the workload identity webhook. I assume this is done for us by the --enable-workload-identity flag on the cluster?

Secondly, the pods for the webhook in the kube-system namespace were called azure-wi-webhook-controller when I used the helm command in my original attempt. Following this document, however, they are called wi-webhook-controller, yet all the other references I see in documentation refer to azure-wi-webhook-controller. On an intermediate attempt (I've trashed my cluster a fair few times to check things), I ended up with both azure-wi-webhook-controller pods and wi-webhook-controller pods.

The other documentation you have starting here: https://azure.github.io/azure-workload-identity/docs/installation/managed-clusters.html#azure-kubernetes-service-aks

sends you to this link to create a cluster with OIDC enabled: https://learn.microsoft.com/en-us/azure/aks/cluster-configuration#oidc-issuer In that link there is absolutely no mention of OIDC at all.

So you obviously go to the first document I read from, and then move on to the mutating admission webhook, which there is no mention of in the first set of instructions I referred to. This tells you to run the helm install steps: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html#helm-3-recommended

If you do that after creating a cluster with --enable-oidc-issuer and --enable-workload-identity, you end up with both the azure-wi-webhook-controller pods and the wi-webhook-controller pods.
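A quick way to confirm the duplicate installation is to check the pod names in kube-system against the two prefixes reported above. The following is only a sketch: the pod names in `main` are hypothetical, and the mapping of each prefix to the Helm chart or the AKS flag is taken from this issue, not from official documentation. Note that a substring match would be wrong here, because `azure-wi-webhook-controller` contains `wi-webhook-controller`.

```go
package main

import (
	"fmt"
	"strings"
)

// webhookInstalls reports which webhook installations appear in a list of pod
// names. The two prefixes are the pod names reported in this issue.
func webhookInstalls(podNames []string) (helm bool, addon bool) {
	for _, name := range podNames {
		switch {
		case strings.HasPrefix(name, "azure-wi-webhook-controller"):
			helm = true
		case strings.HasPrefix(name, "wi-webhook-controller"):
			addon = true
		}
	}
	return helm, addon
}

func main() {
	// Hypothetical pod names, e.g. copied from `kubectl get pods -n kube-system`.
	pods := []string{
		"azure-wi-webhook-controller-manager-abc12",
		"wi-webhook-controller-manager-def34",
		"coredns-xyz56",
	}
	helm, addon := webhookInstalls(pods)
	if helm && addon {
		fmt.Println("both webhook installations are present")
	}
}
```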

So I'll jump back to the end of the first document now, where we are just told to deploy our application. As mentioned, I deployed a simple web API that uses a DefaultAzureCredential. I tried it first without specifying a ManagedIdentityClientId, and actually, for the first time in the 3 days I've been trying to get this to work, I got a token back....

Previous attempts without a ManagedIdentityClientId defined in the DefaultAzureCredential complained about multiple identities being found.

So I inspected the token, and despite creating a service account and federated identity as per the steps in the original document, this is in fact using the nodepool identity.
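For anyone who wants to repeat that inspection, a token's payload can be decoded without verifying the signature. This is a minimal sketch using only the Go standard library; the `tokenClaims` helper is hypothetical, and the claim names printed in `main` (`appid`, `oid`) are an assumption about which claims the token carries, since that depends on the token version.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// tokenClaims decodes the payload of a JWT without verifying its signature.
// Use it for debugging only, never for authorization decisions.
func tokenClaims(token string) (map[string]interface{}, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("token does not have three dot-separated parts")
	}
	// JWT segments are base64url-encoded without padding.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, fmt.Errorf("decoding payload: %w", err)
	}
	var claims map[string]interface{}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, fmt.Errorf("parsing payload: %w", err)
	}
	return claims, nil
}

func main() {
	// Build a fake, unsigned token so the sketch runs without credentials.
	header := base64.RawURLEncoding.EncodeToString([]byte(`{"alg":"none"}`))
	body := base64.RawURLEncoding.EncodeToString([]byte(`{"appid":"<client id>","oid":"<object id>"}`))
	claims, err := tokenClaims(header + "." + body + ".sig")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("appid:", claims["appid"], "oid:", claims["oid"])
}
```

Comparing the decoded identifiers with the client ID of the user-assigned identity and with the nodepool identity shows which one actually issued the token.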

I then specified the id of my service account (which is also referenced in the manifest for my pod as per the documentation) and got back to an error I'd seen in a previous attempt: Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. The requested identity has not been assigned to this resource.

As mentioned my last attempt followed the documentation here: https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster with the only change being that I created an RBAC enabled cluster and did a role assignment as opposed to an access policy.

Any direction on how I could get this working would be fantastic. I'd also encourage whoever is documenting this to either follow the steps to the letter themselves or get someone else to follow them, and see whether they end up with a cluster where workload identity works. I'm more than happy to give feedback on a set of instructions that should give the intended result if someone can provide them.

Finally, the concepts page: https://azure.github.io/azure-workload-identity/docs/concepts.html This is how I envisioned it working. However, as detailed above, when I follow the steps it uses the nodepool identity, not the service account we created.

Bafff commented 1 year ago

Yeah, faced some of this during the first try to implement Azure Workload Identity

I was able to get it working, at least on the dev environment, but I know your pain about some parts not being clear.

So +1 that your points should be rechecked by Azure Team.

Also, I'll add your concerns to my todo list to recheck before going live with AWI-enabled pods.

aramase commented 1 year ago

@iamandymcinnes @Bafff Thank you for the feedback! Have you tried following the steps in the Quick Start here? Do you have feedback on that?

For the docs you're trying, could you open an issue here and I can tag the docs writer to take a look.

meyuviofficial commented 1 year ago

I've been trying to implement the same thing for a week, but I haven't been able to get it working.

mandatory.

This is how my deployment looks ...

Name:             tst-sa-f4644dcc-hc45b
Namespace:        default
Priority:         0
Service Account:  workload-identity-sa
Node:             <NODE URI>
Start Time:       Wed, 24 May 2023 13:13:54 +0530
Labels:           app=tst-sa
                  azure.workload.identity/use=true
                  pod-template-hash=f4644dcc
Annotations:      <none>
Status:           Running
IP:               10.244.1.11
IPs:
  IP:           10.244.1.11
Controlled By:  ReplicaSet/tst-sa-f4644dcc
Containers:
  tst-sac:
    Container ID:   containerd://f1e23197695e6cd16cb0e5fe5145de66e614608739d257c8ceeb2604e10e30b6
    Image:          <Image URI>
    Image ID:       <Docker Image ID>
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 24 May 2023 13:30:14 +0530
      Finished:     Wed, 24 May 2023 13:30:15 +0530
    Ready:          False
    Restart Count:  8
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     500m
      memory:  128Mi
    Environment:
      AZURE_CLIENT_ID:             <Client ID of the Managed Identity>
      AZURE_TENANT_ID:             <Tenant ID>
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sn6gq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-sn6gq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
  azure-identity-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  20m                 default-scheduler  Successfully assigned default/tst-sa-f4644dcc-hc45b to aks-agentpool-32279840-vmss000003
  Normal   Pulled     20m                 kubelet            Successfully pulled image "yuvarajselva/azidgo:v1-amd64" in 646.103122ms (646.108722ms including waiting)
  Normal   Pulled     20m                 kubelet            Successfully pulled image "yuvarajselva/azidgo:v1-amd64" in 619.91359ms (619.932591ms including waiting)
  Normal   Pulled     20m                 kubelet            Successfully pulled image "yuvarajselva/azidgo:v1-amd64" in 663.409966ms (663.415766ms including waiting)
  Normal   Started    19m (x4 over 20m)   kubelet            Started container tst-sac
  Normal   Pulled     19m                 kubelet            Successfully pulled image "yuvarajselva/azidgo:v1-amd64" in 611.666647ms (611.672447ms including waiting)
  Normal   Pulling    19m (x5 over 20m)   kubelet            Pulling image "yuvarajselva/azidgo:v1-amd64"
  Normal   Created    19m (x5 over 20m)   kubelet            Created container tst-sac
  Normal   Pulled     19m                 kubelet            Successfully pulled image "yuvarajselva/azidgo:v1-amd64" in 615.435516ms (615.466316ms including waiting)
  Warning  BackOff    43s (x93 over 20m)  kubelet            Back-off restarting failed container tst-sac in pod tst-sa-f4644dcc-hc45b_default(0c6246dc-7986-49e7-8a54-a498b99bc24e)
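The pod description above shows the four environment variables and the projected token volume in place. A small startup check like the one below can confirm from inside the container that they are really set and that the token file is readable. This is a sketch, not part of the original program; `missingEnv` is a hypothetical helper, and only the variable names are taken from the output above.

```go
package main

import (
	"fmt"
	"os"
)

// Environment variables shown in the pod description.
var workloadIdentityEnv = []string{
	"AZURE_CLIENT_ID",
	"AZURE_TENANT_ID",
	"AZURE_FEDERATED_TOKEN_FILE",
	"AZURE_AUTHORITY_HOST",
}

// missingEnv returns the names in workloadIdentityEnv for which lookup
// yields an empty value. lookup is injected so the check is easy to test.
func missingEnv(lookup func(string) string) []string {
	var missing []string
	for _, name := range workloadIdentityEnv {
		if lookup(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(os.Getenv); len(missing) > 0 {
		fmt.Println("not injected:", missing)
		return
	}
	// The token file must also be readable from inside the container.
	if _, err := os.Stat(os.Getenv("AZURE_FEDERATED_TOKEN_FILE")); err != nil {
		fmt.Println("token file problem:", err)
		return
	}
	fmt.Println("workload identity environment looks complete")
}
```

If this check passes and the container still exits, the cause is more likely on the Azure side (federated credential or role assignment) than in the webhook injection.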

All I'm trying to do is list the resource groups using the azidentity Go package.

package main

// Import key modules.
import (
    "context"

    "github.com/Azure/azure-sdk-for-go/sdk/azidentity"
    "github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources"
    "k8s.io/klog/v2"
)

// Define key global variables.
var (
    subscriptionId = "<SUB ID>"
    ctx = context.Background()
)

// main authenticates with DefaultAzureCredential and fetches one resource group.
func main() {
    cred, err := azidentity.NewDefaultAzureCredential(nil)
    if err != nil {
        klog.Fatalf("Authentication failure: %+v", err)
    }

    // Azure SDK resource management clients accept the credential as a parameter.
    rgClient, err := armresources.NewResourceGroupsClient(subscriptionId, cred, nil)
    if err != nil {
        klog.Fatalf("Failed to create the resource groups client: %+v", err)
    }

    resourceGroup, err := rgClient.Get(ctx, "TST-RG", nil)
    if err != nil {
        // Log the underlying error so the pod logs show why the call failed.
        klog.Fatalf("Error occurred while fetching the resource group: %+v", err)
    }

    // ID and Name are *string, so dereference them to log the values.
    klog.InfoS("RG Authenticated", "ID", *resourceGroup.ID, "Name", *resourceGroup.Name)
    klog.InfoS("Authenticated !!")
}

Please let me know how to achieve this.

Bafff commented 1 year ago

@yuvarajselva did you configure the needed federated credentials here?
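When checking the federated credential, the subject has to match the Kubernetes service account exactly, in the form `system:serviceaccount:<namespace>:<name>`, and the audience is normally `api://AzureADTokenExchange`. The sketch below just builds the expected values so they can be compared with what is configured on the managed identity. `federatedSubject` is a hypothetical helper; the namespace and service account name are the ones from the pod description earlier in the thread. The issuer must also match the cluster's OIDC issuer URL, which is not shown here.

```go
package main

import "fmt"

// Audience that Azure AD expects for workload identity token exchange.
const tokenExchangeAudience = "api://AzureADTokenExchange"

// federatedSubject builds the subject a federated identity credential must
// carry for a given Kubernetes service account.
func federatedSubject(namespace, serviceAccount string) string {
	return fmt.Sprintf("system:serviceaccount:%s:%s", namespace, serviceAccount)
}

func main() {
	// Namespace and service account taken from the pod description in this thread.
	fmt.Println("subject: ", federatedSubject("default", "workload-identity-sa"))
	fmt.Println("audience:", tokenExchangeAudience)
}
```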

wisercoder commented 2 weeks ago

It is baffling how such an important feature of Azure Kubernetes Service can be left with non-working documentation for over a year and a half. I wasted 3 days trying to get the documented steps to work. I deleted the cluster and re-did the steps 3 or 4 times and I consistently ended up with "Identity not found" error when trying to access my storage account from AKS. How about we start a GoFundMe to raise funds to help Microsoft document this properly?

For what it is worth, I found that instead of creating a new managed identity, if I use the agentpool's identity then it works.