Azure / kubernetes-keyvault-flexvol

Azure keyvault integration with Kubernetes via a Flex Volume
MIT License
253 stars 84 forks source link

Volume mount failing but secret being retrieved #68

Closed stevenicholl closed 5 years ago

stevenicholl commented 5 years ago

Currently implementing a test of Azure KV FlexVolumes with AAD PodId in an AKS cluster but experiencing some strange behaviour.

AAD PodId K8's components deployed using helm https://github.com/Azure/aad-pod-identity/tree/master/charts/aad-pod-identity & https://raw.githubusercontent.com/Azure/kubernetes-keyvault-flexvol/master/deployment/kv-flexvol-installer.yaml . Azure components (User Managed ID & access controls) deployed with Terraform. All of the Azure components seem to be fine.

A test pod with a volume mount and a test secret have been deployed using:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx-flex-kv-podid
    aadpodidbinding: aadpodid
  name: nginx-flex-kv-podid
spec:
  containers:
  - name: nginx-flex-kv-podid
    image: nginx
    volumeMounts:
    - name: test
      mountPath: /kv
      readOnly: true
  volumes:
  - name: test
    flexVolume:
      driver: "azure/kv"
      options:
        usepodidentity: "true"
        keyvaultname: "sandbox-kv"
        keyvaultobjectnames: "password"
        keyvaultobjecttypes: "secret"
        keyvaultobjectversions: "bda5b53cd0184fa59cbe683bf6e9e2e9"
        resourcegroup: "sandbox"
        subscriptionid: "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX"
        tenantid: "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXX"

Things get interesting now.

mic log records the following:

I0206 16:40:00.119882 1 event.go:218] Event(v1.ObjectReference{Kind:"AzureIdentityBinding", Namespace:"default", Name:"azure-identity-binding", UID:"9327f0b1-2a2c-11e9-a16b-e619d6e0ff11", APIVersion:"aadpodidentity.k8s.io/v1", ResourceVersion:"12473313", FieldPath:""}): type: 'Normal' reason: 'binding applied' Binding azure-identity-binding applied on node aks-agentpool-11854310-0 for pod nginx-flex-kv-podid-default-azure-identity

nmi pod logging a 400, Identity not found:

time="2019-02-06T16:37:45Z" level=error msg="failed to get service principal token for pod:default/nginx-flex-kv-podid, adal: Refresh request failed. Status Code = '400'. Response body: {\"error\":\"invalid_request\",\"error_description\":\"Identity not found\"}" req.method=GET req.path=/host/token/ req.remote=127.0.0.1

Expecting to see logs to the effect of 10 & 12 detailed here regarding PodId https://github.com/Azure/aad-pod-identity/tree/master/charts/aad-pod-identity . Save this one for a bit later because it would appear from these logs that any lookup using PodId will fail. It doesn't.

The log related to the failed volume mount is:

Warning FailedMount 59s (x4 over 1m) kubelet, aks-agentpool-11854310-3 MountVolume.SetUp failed for volume "test" : mount command failed, status: Failure, reason: /etc/kubernetes/volumeplugins/azure~kv/azurekeyvault-flexvolume failed, failed to get key vault

Yet kubectl exec -it nginx-flex-kv-podid cat /kv/password successfully returns the correct value of the secret referenced. This suggests the AAD PodId is working fine and that all the permissions set to allow the AKS Service Principle to use the User Managed ID to authenticate to and read from KeyVault are also ok.

Interesting conundrum. Grateful for any input that may help explain this behaviour.

ritazh commented 5 years ago

Yet kubectl exec -it nginx-flex-kv-podid cat /kv/password successfully returns the correct value of the secret referenced.

@stevenicholl If you see this working, then definitely the volume has mounted and the AzureAssignedIdentity has been created.

The entire AzureAssignedIdentity creation process gets triggered when we create the volume at pod creation time. All these components (find identity, assign identity, retrieve token for key vault) will take time to complete. If volume mount fails, the kubelet will retry. Hence you are seeing errors in the log but looks like eventually all the dependent components got created successfully.

stevenicholl commented 5 years ago

Thank you @ritazh

On one hand it is good to get some reassurance that what I was seeing can be explained.

On the other hand, based on the assumption everything has worked because I can retrieve a secret, grateful for any suggestion to prevent the failed mount log against the pod. When describing the pod, the failed mount log would lead anyone to believe the process had failed even though we now know it didn't. Perhaps some way to write to further message of success when it eventually mounts? I'm keen to avoid having error messages that incorrectly describe the state of our pods in production.

ritazh commented 5 years ago

@stevenicholl Can you please share the output of kubectl describe pod <name of the pod with the mounted secret> then kubectl get events. Please redact any sensitive information.

The failed messages will be added to the events until the volume has been successfully mounted. At the end of your events list on the pod, you should see Created container and Started container.

stevenicholl commented 5 years ago

Hi @ritazh

Below is the output for the kubectl describe pod <name of the pod with the mounted secret>

Events:
  Type     Reason       Age   From                               Message
  ----     ------       ----  ----                               -------
  Normal   Scheduled    23s   default-scheduler                  Successfully assigned flexvoltest/nginx-flex-kv-podid to aks-agentpool-11854310-5
  Warning  FailedMount  22s   kubelet, aks-agentpool-11854310-5  MountVolume.SetUp failed for volume "test" : mount command failed, status: Failure, reason: /etc/kubernetes/volumeplugins/azure~kv/azurekeyvault-flexvolume failed, failed to get key vault
  Normal   Pulling      16s   kubelet, aks-agentpool-11854310-5  pulling image "nginx"
  Normal   Pulled       15s   kubelet, aks-agentpool-11854310-5  Successfully pulled image "nginx"
  Normal   Created      14s   kubelet, aks-agentpool-11854310-5  Created container
  Normal   Started      14s   kubelet, aks-agentpool-11854310-5  Started container

On running kubectl get events against the namespace I get: (note this was executed a number of minutes after above so the timings won't perfectly match):

4m52s       Normal    Scheduled     Pod       Successfully assigned flexvoltest/nginx-flex-kv-podid to aks-agentpool-11854310-5
4m51s       Warning   FailedMount   Pod       MountVolume.SetUp failed for volume "test" : mount command failed, status: Failure, reason: /etc/kubernetes/volumeplugins/azure~kv/azurekeyvault-flexvolume failed, failed to get key vault
4m45s       Normal    Pulling       Pod       pulling image "nginx"
4m44s       Normal    Pulled        Pod       Successfully pulled image "nginx"
4m43s       Normal    Created       Pod       Created container
4m43s       Normal    Started       Pod       Started container

In the cluster events I get a corresponding azure-identity-binding applying at exactly the time the container started:

4m43s       Normal    binding applied           AzureIdentityBinding   Binding azure-identity-binding applied on node aks-agentpool-11854310-5 for pod nginx-flex-kv-podid-flexvoltest-azure-identity

If I have understood correctly, this is what you were expecting to see and it confirms the behaviour described ie. it eventually successfully mounts? I do still think the K8's events are a bit misleading in that, whilst we know the secret mounted correctly, it still logs a failure.

ritazh commented 5 years ago

@stevenicholl yes this is the expected behavior following the pod's creation lifecycle. Whenever a volume mount attempt has failed, the kubelet will retry mounting the failed volume and the failed event will be recorded. These events help us debug if any of the volume mounts failed. They are the history of pod, not current state of the pod. If any of the volume mounts failed, then the pod will not be created, its STATUS will be ContainerCreating. As you can see from your pod events, eventually as soon as the volume is mounted successfully, the container is created, then started. The pod's STATUS is Running.

stevenicholl commented 5 years ago

@ritazh , thank you for confirmation. We will proceed on the basis that, if the pods start, the volumes have mounted. If they didn't mount correctly, the pod creation process will not have finished and will be stuck in a ContainerCreating state.

ritazh commented 5 years ago

@stevenicholl That's correct. Closing this issue for now, but feel free to reopen or file new issues if you have additional questions.

stevenicholl commented 5 years ago

Thanks @ritazh , The only other thing I'd be interested to know if it is understood why the mounts initially fail? This only occurs for us when it is a KeyVault FlexVolume, no issues when mounting secrets to pods from Kubernetes secrets.