SparebankenVest / azure-key-vault-to-kubernetes

Azure Key Vault to Kubernetes (akv2k8s for short) makes it simple and secure to use Azure Key Vault secrets, keys and certificates in Kubernetes.
https://akv2k8s.io
Apache License 2.0
434 stars 97 forks source link

[BUG] Regression in 1.6.0 Prevents Usage of Custom Registries and/or imagePullSecrets #664

Open Tabrnicus opened 8 months ago

Tabrnicus commented 8 months ago

Note: Make sure to check out known issues (https://akv2k8s.io/troubleshooting/known-issues/) before submitting

Note: Tangentially related to #495, through #631.

Components and versions

[ ] Controller, version: x.x.x (docker image tag) [X] Env-Injector (webhook), version: 1.6.0 (docker image tag) [ ] Other

Describe the bug

A regression in version 1.6.0 of the Env-Injector prevents the use of imagePullSecrets as valid registry authentication when using authType="azureCloudConfig". More specifically, the logic present in the current 1.6.0 version (covered below in detail) assumes that both the Key Vault and container registry must be using MSIs as the authentication method. My use case involves utilizing a third-party container registry, and not an ACR. As a result, I cannot set up any form of MSI to our container registry, which is why we use imagePullSecrets. My understanding from reading through the documentation and code from 1.5.0 and below is that this use case should be supported.

After investigating, I've found that the changes introduced by #631 break the previous logic that allowed imagePullSecrets. The function getContainerRegistryRemoteOptions() in registry.go (a) uses authType to decide how the container registry authentication should be handled, and (b) assumes that an ACR will be used when authType="azureCloudConfig": https://github.com/SparebankenVest/azure-key-vault-to-kubernetes/blob/647b7156685907bb1c5257c9e0596f4eab355165/pkg/docker/registry/registry.go#L123-L182

For context, the previous approach used a getImageConfig() function whose only branch includes imagePullSecrets (for clarity, this same logic is preserved in the default case statement in 1.6.0): https://github.com/SparebankenVest/azure-key-vault-to-kubernetes/blob/7141e5f8cf03d482a51ef3c4a4ae945e9761b14e/pkg/docker/registry/registry.go#L112-L158

This seems to be a restrictive approach, for several reasons:

  1. authType seems to be documented as an option to specify key vault authentication only. I haven't seen this option mention container registries (Chart README.md, Chart env-injector-deployment.yaml). Aside from the issues being discussed here, this alone is cause for potential confusion.
  2. Making the container registry authentication option tied to the Key Vault authentication option is unnecessarily restrictive. Aside from the current use case, another use case that is excluded is that of custom authentication (i.e. authType="environment") with ACR MSI authentication. The newly added code branch in registry.go will never be taken because authType does not equal azureCloudConfig.

To Reproduce

[!IMPORTANT] For ease of testing within the context of this project, these replication steps will include creating an ACR. To be clear, the two use cases (ACR without MSI, and third party registry entirely) fail the same way and for the purposes of this issue can be regarded as equivalent.

[!NOTE] I've replaced all the instances of identifiable names with their <pseudo> equivalents. Example: <registryName>.azurecr.io

  1. Provision an AKS Cluster
    • Kubernetes Version: 1.26.10
    • Tier: Free
    • AAD Pod Identity: Disabled
    • Authentication and Authorization: Local accounts with Kubernetes RBAC
    • Networking: Azure CNI, Calico
  2. Provision a Key Vault
    • With access to the cluster via an access policy on the agentpool MSI
    • Add a secret to this vault to be accessed later by a workload pod
      • For the purposes of these steps, name it admin-password
  3. Provision a container registry. We'll be using token based authentication, not MSIs.
    • This can be done using an ACR, following the instructions to create a token.
    • Generate a new token:
      • Name: mytoken
      • Scope Map: _repositories_push_metadata_write
      • After creation, generate a password. Save the password for later and copy the docker login command for the next step.
  4. Push an image to the registry (we'll pull this image again later):
    docker login -u mytoken -p <password> <registryName>
    docker pull alpine:latest
    docker tag alpine:latest <registryName>/alpine-test:latest
    docker push <registryName>/alpine-test:latest
    docker logout <registryName>
  5. Deploy akv2k8s at application version 1.6.0 (Helm chart version 2.6.0)
    helm upgrade --install akv2k8s spv-charts/akv2k8s --namespace akv2k8s --version 2.6.0 --set global.logLevel=trace
  6. Create / modify a namespace to have the proper azure-key-vault-env-injection annotation
    • I picked default, for ease.
  7. Create an AzureKeyVaultSecret in that namespace
    apiVersion: spv.no/v2beta1
    kind: AzureKeyVaultSecret
    metadata:
      name: admin-password
    spec:
      vault:
        name: <keyVaultName>
        object:
          name: admin-password
          type: secret
  8. Create a Kubernetes Secret with the token information
    kubectl create secret docker-registry regcred --docker-server="<registryName>.azurecr.io" --docker-username="mytoken" --docker-password="<pwFromStep3>"
  9. Deploy a workload pod that uses the pull secret and requests the secret variable with admin-password@azurekeyvault
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: alpine-deployment
      labels:
        app: alpine
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: alpine
      template:
        metadata:
          labels:
            app: alpine
        spec:
          containers:
          - name: alpine
            image: <registryName>.azurecr.io/alpine-test:latest
            imagePullPolicy: Always  # This is here to eliminate node cache from the variables in this test
            env:
              - name: "MY_SECRET"
                value: "admin-password@azurekeyvault"
          imagePullSecrets:
            - name: regcred
  10. Notice that Pods are not deploying. Describe the ReplicaSet to show the unauthorized message:

    $ kubectl describe rs alpine-deployment-<hash>
    
    Events:
      Type     Reason        Age              From                   Message
      ----     ------        ----             ----                   -------
      Warning  FailedCreate  13s              replicaset-controller  Error creating: Internal error occurred: failed calling webhook "pods.env-injector.admission.spv.no": failed to call webhook: an error on the server ("{\"response\":{\"uid\":\"4946a53f-76c1-4f90-82ab-ab0f76272a79\",\"allowed\":false,\"status\":{\"metadata\":{},\"status\":\"Failure\",\"message\":\"failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registryName>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull\\u0026service=<registryName>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.\"}}}") has prevented the request from succeeding

Expected behavior

Based on the above replication steps, I would expect to see the env-injector use the imagePullSecrets in the deployment specification to pull the image from the registry. Instead, it tries to pull it using invalid/anonymous credentials.

As a point of comparison, installing akv2k8s at application version 1.5.0 (chart version 2.5.0) and reinstalling the deployment will show the intended behavior. Of course, the container is nonfunctional (it crash loops because it's default Alpine), which is beside the point.

Logs

Full ReplicaSet describe log:

Name:           alpine-deployment-bbd99dbf5
Namespace:      default
Selector:       app=alpine,pod-template-hash=bbd99dbf5
Labels:         app=alpine
                pod-template-hash=bbd99dbf5
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/alpine-deployment
Replicas:       0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=alpine
           pod-template-hash=bbd99dbf5
  Containers:
   alpine:
    Image:      <registry>.azurecr.io/alpine-test:latest
    Port:       <none>
    Host Port:  <none>
    Environment:
      MY_SECRET:  admin-password@azurekeyvault
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age               From                   Message
  ----     ------        ----              ----                   -------
  Warning  FailedCreate  16s               replicaset-controller  Error creating: Internal error occurred: failed calling webhook "pods.env-injector.admission.spv.no": failed to call webhook: an error on the server ("{\"response\":{\"uid\":\"e0ac9a89-737b-4a68-b348-4d4017c09858\",\"allowed\":false,\"status\":{\"metadata\":{},\"status\":\"Failure\",\"message\":\"failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registry>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull\\u0026service=<registry>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.\"}}}") has prevented the request from succeeding
# <snip> (the same event is repeated about 20 times)

env-injector logs at time of failure:

2024/01/09 06:10:51 [DEBUG] reviewing request 67fe5091-2a8f-4239-a599-d2236f22a837, named: default/
I0109 06:10:51.012438       1 main.go:139] "found pod to mutate" pod="default/"
I0109 06:10:51.012454       1 pod.go:295] "creating client certificate to use with auth service" default/="(MISSING)"
I0109 06:10:51.012462       1 clientCert.go:25] "creating x509 key pair for ca cert and key"
I0109 06:10:51.012752       1 clientCert.go:32] "parse certificate"
I0109 06:10:51.012772       1 clientCert.go:38] "generating client key"
I0109 06:10:51.029883       1 clientCert.go:44] "generating serial number"
I0109 06:10:51.029909       1 clientCert.go:66] "crating x509 certificate"
I0109 06:10:51.032023       1 pod.go:303] "create authentication service secret" default/="(MISSING)"
I0109 06:10:51.045707       1 round_trippers.go:553] POST https://10.0.0.1:443/api/v1/namespaces/default/secrets 409 Conflict in 13 milliseconds
I0109 06:10:51.051601       1 round_trippers.go:553] PUT https://10.0.0.1:443/api/v1/namespaces/default/secrets/akv2k8s-alpine-deployment 200 OK in 5 milliseconds
I0109 06:10:51.052088       1 pod.go:317] "mutate init-containers" default/="(MISSING)"
I0109 06:10:51.052106       1 pod.go:323] "mutate containers" default/="(MISSING)"
I0109 06:10:51.052113       1 pod.go:138] "found container to mutate" container="default/alpine"
I0109 06:10:51.052119       1 pod.go:141] "checking for env vars to inject" container="default/alpine"
I0109 06:10:51.052145       1 pod.go:144] "found env var to inject" env="admin-password@azurekeyvault" container="default/alpine"
I0109 06:10:51.052170       1 registry.go:30] "getting container command for container" container="default/alpine"
I0109 06:10:51.052232       1 registry.go:36] "no cmd override in kubernetes for container, checking docker image configuration for entrypoint and cmd" image="<registryName>.azurecr.io/alpine-test:latest" container="default/alpine"
I0109 06:10:51.052256       1 registry.go:130] "using registry" imageRegistry="<registryName>.azurecr.io"
I0109 06:10:51.052279       1 registry.go:135] "using cloudConfig for registry authentication" config.authType="azureCloudConfig"
I0109 06:10:51.052319       1 acr.go:78] using managed identity for acr credentials
I0109 06:10:51.052349       1 provider.go:285] "azure: using managed identity extension to retrieve access token" id="076f5645-0759-4528-b014-7cf8ac51f90f"
I0109 06:10:51.052359       1 provider.go:292] "azure: using managed identity extension to retrieve access token" id="076f5645-0759-4528-b014-7cf8ac51f90f"
I0109 06:10:51.063450       1 acr.go:184] "discovering auth redirects" url="<registryName>.azurecr.io"
I0109 06:10:51.096246       1 acr.go:190] exchanging an acr refresh_token
I0109 06:10:51.424071       1 round_trippers.go:553] GET https://10.0.0.1:443/api/v1/namespaces/default/serviceaccounts/default 200 OK in 3 milliseconds
E0109 06:10:51.657377       1 main.go:162] "failed to mutate" err="failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registryName>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull&service=<registryName>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information." pod="default/"
2024/01/09 06:10:51 [ERROR] admission webhook error: failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registryName>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull&service=<registryName>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.

Note the two new (as of #631) INFO logs near the bottom, as it helps to track code execution.

Additional context

In terms of potential direction for resolution, I'd probably be most in favor of the following:

Speeddymon commented 8 months ago

@181192 please take a look - seems this one requires some additional attention.

CC: @abhilashjoseph as the author of the PR which broke this functionality.

flo-kn commented 4 weeks ago

Hi everyone 👋 any news this in the meantime?

I just bumped into same issue trying pull container images from a third-party registry using imagePullSecrets. Would be super happy for any pointers, directions, or workarround ideas.