Azure / kubernetes-keyvault-flexvol

Azure keyvault integration with Kubernetes via a Flex Volume
MIT License

Inconsistent Errors and Propagation Across Deployments #109

Closed JasonKAls closed 5 years ago

JasonKAls commented 5 years ago

Hello,

I've been working with Microsoft support engineers for a while now, trying to get FlexVolumes mounted in several pods to integrate Key Vault with AKS (my Kubernetes cluster). I've run into several problems that have made it hard to understand the underlying causes and their solutions. One of the biggest issues is that after several hours of waiting for the FlexVolumes to mount, they sometimes seem to start working spontaneously. However, this only happens for some pods and not others, even when the configuration is essentially the same for all of them. Below is a description of how I'm currently using this setup and the errors I've received and tried to solve:

Steps To Reproduce

Flexvolume Setup:

kind: DaemonSet
metadata:
  labels:
    app: keyvault-flexvolume
  name: keyvault-flexvolume
  namespace: *********
spec:
  template:
    metadata:
      labels:
        app: keyvault-flexvolume
    spec:
      tolerations:
      containers:
      - name: flexvol-driver-installer
        image: "mcr.microsoft.com/k8s/flexvolume/keyvault-flexvolume:v0.0.10"
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi
        env:
        - name: TARGET_DIR
          value: "/etc/kubernetes/volumeplugins"
        volumeMounts:
        - mountPath: "/etc/kubernetes/volumeplugins"
          name: volplugins
      volumes:
      - hostPath:
          path: "/etc/kubernetes/volumeplugins"
        name: volplugins
      nodeSelector:
        beta.kubernetes.io/os: linux

Pod Identity Setup:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: aad-pod-id-nmi-service-account
  namespace: ********
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: azureassignedidentities.aadpodidentity.k8s.io
spec:
  group: aadpodidentity.k8s.io
  version: v1
  names:
    kind: AzureAssignedIdentity
    plural: azureassignedidentities
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: azureidentitybindings.aadpodidentity.k8s.io
spec:
  group: aadpodidentity.k8s.io
  version: v1
  names:
    kind: AzureIdentityBinding
    plural: azureidentitybindings
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: azureidentities.aadpodidentity.k8s.io
spec:
  group: aadpodidentity.k8s.io
  version: v1
  names:
    kind: AzureIdentity
    singular: azureidentity
    plural: azureidentities
  scope: Namespaced
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aad-pod-id-nmi-role
rules:
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
- apiGroups: ["aadpodidentity.k8s.io"]
  resources: ["azureidentitybindings", "azureidentities"]
  verbs: ["get", "list"]
- apiGroups: ["aadpodidentity.k8s.io"]
  resources: ["azureassignedidentities"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: aad-pod-id-nmi-binding
  labels:
    k8s-app: aad-pod-id-nmi-binding
subjects:
- kind: ServiceAccount
  name: aad-pod-id-nmi-service-account
  namespace: **********
roleRef:
  kind: ClusterRole
  name: aad-pod-id-nmi-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
    component: nmi
    tier: node
    k8s-app: aad-pod-id
  name: nmi
  namespace: ***********
spec:
  template:
    metadata:
      labels:
        component: nmi
        tier: node
    spec:
      serviceAccountName: aad-pod-id-nmi-service-account
      hostNetwork: true
      volumes:
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: iptableslock
      containers:
      - name: nmi
        image: "mcr.microsoft.com/k8s/aad-pod-identity/nmi"
        imagePullPolicy: Always
        args:
          - nmi
          - "--host-ip=$(HOST_IP)"
          - "--node=$(NODE_NAME)"
        env:
          - name: HOST_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
        securityContext:
          privileged: true
          capabilities:
            add:
            - NET_ADMIN
        volumeMounts:
        - mountPath: /run/xtables.lock
          name: iptableslock
      nodeSelector:
        beta.kubernetes.io/os: linux
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aad-pod-id-mic-service-account
  namespace: **********
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aad-pod-id-mic-role
rules:
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["pods", "nodes"]
  verbs: [ "list", "watch" ]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]
- apiGroups: ["aadpodidentity.k8s.io"]
  resources: ["azureidentitybindings", "azureidentities"]
  verbs: ["get", "list", "watch", "post"]
- apiGroups: ["aadpodidentity.k8s.io"]
  resources: ["azureassignedidentities"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: aad-pod-id-mic-binding
  labels:
    k8s-app: aad-pod-id-mic-binding
subjects:
- kind: ServiceAccount
  name: aad-pod-id-mic-service-account
  namespace: *************
roleRef:
  kind: ClusterRole
  name: aad-pod-id-mic-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    component: mic
    k8s-app: aad-pod-id
  name: mic
  namespace: ************
spec:
  template:
    metadata:
      labels:
        component: mic
    spec:
      serviceAccountName: aad-pod-id-mic-service-account
      containers:
      - name: mic
        image: "mcr.microsoft.com/k8s/aad-pod-identity/mic:1.3"
        imagePullPolicy: Always
        args:
          - mic
          - "--cloudconfig=/etc/kubernetes/azure.json"
          - "--logtostderr"
        volumeMounts:
        - name: k8s-azure-file
          mountPath: /etc/kubernetes/azure.json
          readOnly: true
      volumes:
      - name: k8s-azure-file
        hostPath:
          path: /etc/kubernetes/azure.json
      nodeSelector:
        beta.kubernetes.io/os: linux
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
 name: **************
spec:
 type: 0
 ResourceID: /subscriptions/****************/resourcegroups/************/providers/Microsoft.ManagedIdentity/userAssignedIdentities/**************
 ClientID: ********************
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
 name: **************
spec:
 AzureIdentity: **************
 Selector: k8s-secrets
subjects:
- kind: ServiceAccount
  name: aad-pod-id-nmi-service-account
  namespace: ******************
roleRef:
  kind: ClusterRole
  name: aad-pod-id-nmi-role
  apiGroup: rbac.authorization.k8s.io

Pod Volumes Setup:

- name: kvsecrets
  flexVolume:
    driver: "azure/kv"
    options:
      usepodidentity: "true"
      keyvaultname: *****************
      keyvaultobjectnames: jason
      keyvaultobjecttypes: secret
      resourcegroup: *****************
      subscriptionid: ********************
      tenantid: **************

Errors Received:

Expected behavior: Pods should successfully mount the FlexVolume and have the needed Key Vault secrets mounted.

Key Vault FlexVolume version: image: "mcr.microsoft.com/k8s/flexvolume/keyvault-flexvolume:v0.0.10"

Access mode (service principal or pod identity): I'm using pod identity.

Kubernetes version:

Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.8", GitCommit:"a89f8c11a5f4f132503edbc4918c98518fd504e3", GitTreeState:"clean", BuildDate:"2019-04-23T04:41:47Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Additional context: Please let me know if I can provide more information.

ritazh commented 5 years ago

Thanks for reporting this issue @JasonKAls!

From the logs you have shared, it looks like the FlexVolume driver was not able to mount the volume because pod identity could not get an auth token for Key Vault: "failed to get service principal token: nmi response failed with status code: 403". The error "The request content has the following duplicate identity ids" suggests that MIC is failing when it tries to assign an identity that is already assigned to the same VM (see https://github.com/Azure/aad-pod-identity/issues/167).

Can you please share the entire pod/deployment yaml with the pod identity label aadpodidbinding: k8s-secrets and the flexvolume definition?
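For context on why the aadpodidbinding label matters: pod identity matches a pod to an AzureIdentityBinding when the pod's aadpodidbinding label value equals the binding's Selector (k8s-secrets in the configuration above). The Go sketch below illustrates only that matching rule under simplifying assumptions. The AzureIdentityBinding and Pod types and the matchingBindings function are hypothetical stand-ins, not the aad-pod-identity API, and namespace handling is ignored.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins; the real aad-pod-identity CRD types
// carry more fields (and namespaces), which are ignored here.
type AzureIdentityBinding struct {
	Name          string
	AzureIdentity string // name of the AzureIdentity this binding assigns
	Selector      string // compared with the pod's aadpodidbinding label
}

type Pod struct {
	Name   string
	Labels map[string]string
}

// bindingLabel is the pod label that is matched against Selector.
const bindingLabel = "aadpodidbinding"

// matchingBindings returns the bindings whose Selector equals the pod's
// aadpodidbinding label value. A pod without the label matches nothing.
func matchingBindings(pod Pod, bindings []AzureIdentityBinding) []AzureIdentityBinding {
	value, ok := pod.Labels[bindingLabel]
	if !ok {
		return nil
	}
	var matched []AzureIdentityBinding
	for _, b := range bindings {
		if b.Selector == value {
			matched = append(matched, b)
		}
	}
	return matched
}

func main() {
	bindings := []AzureIdentityBinding{
		{Name: "example-binding", AzureIdentity: "example-identity", Selector: "k8s-secrets"},
	}
	labelled := Pod{Name: "web", Labels: map[string]string{bindingLabel: "k8s-secrets"}}
	unlabelled := Pod{Name: "api", Labels: map[string]string{"app": "api"}}

	fmt.Println(len(matchingBindings(labelled, bindings)))   // 1
	fmt.Println(len(matchingBindings(unlabelled, bindings))) // 0
}
```

A pod that lacks the label, or whose label value differs from the Selector, never gets an identity, so the FlexVolume's token request cannot succeed for it.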

And please share outputs for the following commands so that we can see all the resources using pod identity:

kubectl get azureidentity

kubectl get azureidentitybinding

kubectl get azureassignedidentity

kubectl get pod -o wide

Also, please make sure you are using mcr.microsoft.com/k8s/aad-pod-identity/nmi:1.4, since the nmi image in your configuration above has no tag.

cc @kkmsft @aramase

JasonKAls commented 5 years ago

Thanks for responding, @ritazh!

I'll provide the other details, but the kubectl get commands for the Azure identity resources have never returned anything for me, even in the environments and for the pods where FlexVolumes are working. Any suggestions?

kkmsft commented 5 years ago

Hi @JasonKAls, the issue https://github.com/Azure/aad-pod-identity/issues/167 was reported in a context where identities had already been assigned to the node. Is that the case here? Do you have user-assigned identities already assigned to the node for some other operation?

JasonKAls commented 5 years ago

Hello @kkmsft!

The nodes involved are from an AKS agent pool and have only 1 managed identity assigned to them.

ritazh commented 5 years ago

@JasonKAls Can you please verify which mechanism assigned that one managed identity to that agent node? Was it pod identity, or did you manually assign the identity to that node? Thanks!

JasonKAls commented 5 years ago

By using az identity create.

aramase commented 5 years ago

@JasonKAls Thank you for the response. I have a few more questions:

  1. Can you please confirm that the identity was never assigned manually using az vm identity assign?
  2. Did you make any changes to the ResourceID in the AzureIdentity during your retry attempts?
  3. Can you please post the output of az vm identity show --resource-group <rg> --name <vm name>? You can remove the subscription ID from the output before posting it here.

JasonKAls commented 5 years ago

Hello @aramase!

Before I answer your questions, you should know that @ritazh's suggestion to create a managed identity per deployment seems to be working for all deployments except one.

  1. The way we handle identity creation is that I run az identity create and then ask another department to assign the correct permissions to the identity. az vm identity assign is denied for me.
  2. I do believe I tried recreating the managed identity, but not for all of the pods.
  3. I have 2 agent pools attached to my AKS cluster in each environment. I'd be happy to provide the JSON output if you have a more secure way for me to submit it. For the record, I only received output for 1 of the 2 agent pool VMs. Let me know the best way to send the output; I'm not comfortable posting any IDs or reference information here.

Thanks again!

aramase commented 5 years ago

@JasonKAls Can you send it to me by email at anramase@microsoft.com? Please redact the client ID, principal ID, and any other sensitive information from the output. I just need to verify the format of the resource ID as it appears in the output.

JasonKAls commented 5 years ago

Done! For future reference on this ticket, the JSON output looks like this:

{
  "principalId": null,
  "tenantId": null,
  "type": "UserAssigned",
  "userAssignedIdentities": {
    "/subscriptions/************************/resourceGroups/*****************/providers/Microsoft.ManagedIdentity/userAssignedIdentities/prod********-nginx": {
      "clientId": "*******************",
      "principalId": "****************************"
    },
    "/subscriptions/**************************/resourceGroups/********************/providers/Microsoft.ManagedIdentity/userAssignedIdentities/prod***********-api": {
      "clientId": "*************************",
      "principalId": "**********************************"
    },
    "/subscriptions/************************/resourceGroups/*************************/providers/Microsoft.ManagedIdentity/userAssignedIdentities/prod-*****************-web": {
      "clientId": "*********************************",
      "principalId": "***************************************"
    }
  }
}

It shows each identity assigned to a VM; this VM has 3.
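When checking several VMs, the keys under userAssignedIdentities in this output can be listed programmatically. Below is a minimal Go sketch, assuming the JSON from az vm identity show has been saved or piped in as bytes. The vmIdentity type and assignedIDs function are hypothetical, only the fields shown above are decoded, and the sample IDs are placeholders rather than real values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// vmIdentity decodes only the parts of the az vm identity show output
// used here; the type name is hypothetical.
type vmIdentity struct {
	Type                   string `json:"type"`
	UserAssignedIdentities map[string]struct {
		ClientID    string `json:"clientId"`
		PrincipalID string `json:"principalId"`
	} `json:"userAssignedIdentities"`
}

// assignedIDs returns the user-assigned identity resource IDs, sorted
// because Go map iteration order is not deterministic.
func assignedIDs(raw []byte) ([]string, error) {
	var v vmIdentity
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(v.UserAssignedIdentities))
	for id := range v.UserAssignedIdentities {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids, nil
}

func main() {
	// Placeholder data in the same shape as the output above.
	sample := []byte(`{
  "principalId": null,
  "tenantId": null,
  "type": "UserAssigned",
  "userAssignedIdentities": {
    "/subscriptions/sub/resourceGroups/rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-nginx": {"clientId": "c1", "principalId": "p1"},
    "/subscriptions/sub/resourceGroups/rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-api": {"clientId": "c2", "principalId": "p2"}
  }
}`)
	ids, err := assignedIDs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d user-assigned identities:\n", len(ids))
	for _, id := range ids {
		fmt.Println(id)
	}
}
```

Fields that are not declared in vmIdentity, such as the top-level principalId and tenantId, are simply ignored by json.Unmarshal.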

aramase commented 5 years ago

@JasonKAls Thank you for posting the output and also for sending it to me through email. I was able to recreate this issue on my cluster.

The issue was caused by the check for whether an identity already exists on the node being case-sensitive. The ResourceID defined in the AzureIdentity you provided is /subscriptions/****************/resourcegroups/************/providers/Microsoft.ManagedIdentity/userAssignedIdentities/**************, and the identity that already existed on the node had the format /subscriptions/************************/resourceGroups/*****************/providers/Microsoft.ManagedIdentity/userAssignedIdentities/prod********. The difference is the case of the resource group segment: resourceGroups on the node versus resourcegroups in the AzureIdentity definition. The check should have been case-insensitive.
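The comparison described here can be illustrated with a small Go sketch. It is not the code from the linked fix: identityAssigned is a hypothetical helper, and the resource IDs are placeholders that differ only in resourcegroups versus resourceGroups. strings.EqualFold from the standard library performs the case-insensitive comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// identityAssigned reports whether resourceID already appears among the
// identity IDs assigned to a node. It compares case-insensitively with
// strings.EqualFold, as the fix described above requires. Hypothetical helper.
func identityAssigned(resourceID string, nodeIDs []string) bool {
	for _, id := range nodeIDs {
		if strings.EqualFold(id, resourceID) {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder IDs that differ only in "resourcegroups" vs "resourceGroups".
	fromAzureIdentity := "/subscriptions/sub/resourcegroups/rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-web"
	onNode := []string{
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-web",
	}

	// A case-sensitive comparison misses the existing assignment...
	fmt.Println(fromAzureIdentity == onNode[0]) // false
	// ...while the case-insensitive check finds it.
	fmt.Println(identityAssigned(fromAzureIdentity, onNode)) // true
}
```

With the case-sensitive check, the identity looks unassigned and is requested again, which is consistent with the "duplicate identity ids" error reported earlier in this thread.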

The fix has been merged (https://github.com/Azure/aad-pod-identity/pull/271) and will be included in the next release.

JasonKAls commented 5 years ago

Hello @aramase!

That's fantastic news! Thanks for your team's diligent attention to this issue and for providing a temporary solution for me and my team. I look forward to the next release!

aramase commented 5 years ago

@JasonKAls 1.5-rc2 is now available. Please try it out and give us feedback ahead of the 1.5 release: https://github.com/Azure/aad-pod-identity/releases/tag/1.5-rc2

Closing this issue now. Please reopen if you have any further issues.

cc @ritazh