SparebankenVest / azure-key-vault-to-kubernetes

Azure Key Vault to Kubernetes (akv2k8s for short) makes it simple and secure to use Azure Key Vault secrets, keys and certificates in Kubernetes.
https://akv2k8s.io
Apache License 2.0

[BUG] Environment injection does not work - UNAUTHORIZED: authentication required. #495

Open MarkKharitonov opened 1 year ago

MarkKharitonov commented 1 year ago


**Components and versions**
Select which component(s) the bug relates to with [X].

[ ] Controller, version: x.x.x (docker image tag)
[X] Env-Injector (webhook), version: 1.4.0 (docker image tag)
[ ] Other

**Describe the bug**
I created a new AKS cluster and deployed a simple nginx pod. All works well. Then I added a secret injected through the environment, and the replicaSet fails to create its pod with the following error:

mark@L-R910LPKW:~/chip/toolbox/k8s [test ≡ +0 ~2 -0 !]$ k describe rs toolbox-78544646dd | tail -1
  Warning  FailedCreate  26s   replicaset-controller  Error creating: Internal error occurred: failed calling webhook "pods.env-injector.admission.spv.no": failed to call webhook: an error on the server ("{\"response\":{\"uid\":\"2e772ecb-e618-42f8-9273-a43a5b17ac52\",\"allowed\":false,\"status\":{\"metadata\":{},\"status\":\"Failure\",\"message\":\"failed to get auto cmd, error: GET https://app541deploycr.azurecr.io/oauth2/token?scope=repository%3Achip%2Ftoolbox%3Apull\\u0026service=app541deploycr.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.\\ncannot fetch image descriptor\\ngithub.com/SparebankenVest/azure-key-vault-to-kubernetes/pkg/docker/registry.getImageConfig\\n\\t/go/src/github.com/SparebankenVest/azure-key-vault-to-kubernetes/pkg/docker/registry/registry.go:144\\ngithub.com/SparebankenVest/azure-key-vault-to-kubernetes/pkg/docker/registry.(*Registry).GetImageConfig\\n\\t/go/src/github.com/SparebankenVest/azure-key-vault-to-kubernetes/pkg/docker/registry/registry.go:103\\nmain.getContainerCmd\\n\\t/go/src/github.com/SparebankenVest/azure-key-vault-to-kubernetes/cmd/azure-keyvault-secrets-webhook/registry.go:39\\nmain.podWebHook.mutateContainers\\n\\t/go/src/github.com/SparebankenVest/azure-key-vault-to-kubernetes/cmd/azure-keyvault-secrets-webhook/pod.go:143\\nmain.podWebHook.mutatePodSpec\\n\\t/go/src/github.com/SparebankenVest/azure-key-vault-to-kubernetes/cmd/azure-keyvault-secrets-webhook/pod.go:299\\nmain.vaultSecretsMutator\\n\\t/go/src/github.com/SparebankenVest/azure-key-vault-to-kubernetes/cmd/azure-keyvault-secrets-webhook/main.go:163\\ngithub.com/slok/kubewebhook/pkg/webhook/mutating.MutatorFunc.Mutate\\n\\t/go/pkg/mod/github.com/slok/kubewebhook@v0.11.0/pkg/webhook/mutating/mutator.go:25\\ngithub.com/slok/kubewebhook/pkg/webhook/mutating.mutationWebhook.mutatingAdmissionReview\\n\\t/go/pkg/mod/github.com/slok/kubewebhook@v0.11.0/pkg/webhook/mutating/webhook.go:128\\ngithub.com/slok/kubewebhook/pkg/webhook/mutating.mutationWebhook.Review\\n\\t/go/pkg/mod/github.com/slok/kubewebhook@v0.11.0/pkg/webhook/mutating/webhook.go:120\\ngithub.com/slok/kubewebhook/pkg/webhook/internal/instrumenting.(*Webhook).Review\\n\\t/go/pkg/mod/github.com/slok/kubewebhook@v0.11.0/pkg/webhook/internal/") has prevented the request from succeeding
mark@L-R910LPKW:~/chip/toolbox/k8s [test ≡ +0 ~2 -0 !]$

This has all the markers of the issue described here - https://akv2k8s.io/installation/with-aad-pod-identity. But trying to fix it as described does not work:

mark@L-R910LPKW:~/chip/toolbox/k8s [test ≡ +0 ~2 -0 !]$ helm -n akv2k8s upgrade akv2k8s akv2k8s/akv2k8s --set addAzurePodIdentityException=true
Error: UPGRADE FAILED: [resource mapping not found for name: "akv2k8s-controller-exception" namespace: "akv2k8s" from "": no matches for kind "AzurePodIdentityException" in version "aadpodidentity.k8s.io/v1"
ensure CRDs are installed first, resource mapping not found for name: "akv2k8s-env-injector-exception" namespace: "" from "": no matches for kind "AzurePodIdentityException" in version "aadpodidentity.k8s.io/v1"
ensure CRDs are installed first]
mark@L-R910LPKW:~/chip/toolbox/k8s [test ≡ +0 ~2 -0 !]$

So it does not work either way.

The AKS cluster is deployed using our terraform code. The AKS cluster version is 1.25.4.

**To Reproduce**

  1. Deploy AKS cluster at version 1.25.4. I can provide any configuration options as needed.
  2. Deploy akv2k8s using the following terraform resource:
    resource "helm_release" "akv2k8s" {
    name             = "akv2k8s"
    chart            = "akv2k8s"
    version          = "2.3.2"
    create_namespace = true
    namespace        = "akv2k8s"
    repository       = "http://charts.spvapi.no"
    }
  3. Deploy a simple nginx app. In our case:
    
    mark@L-R910LPKW:~/chip/toolbox/k8s [test ≡ +0 ~2 -0 !]$ helm get manifest toolbox
    ---
    # Source: chip-toolbox/templates/service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: toolbox
      name: toolbox
      namespace: chip
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: toolbox
    ---
    # Source: chip-toolbox/templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: toolbox
      name: toolbox
      namespace: chip
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: toolbox
      template:
        metadata:
          labels:
            app: toolbox
        spec:
          containers:
          - name: toolbox
            image: app541deploycr.azurecr.io/chip/toolbox:1.0.23062.13
            env:
            - name: DUMMY_SECRET
              value: dummy@azurekeyvault
    ---
    # Source: chip-toolbox/templates/ingress.yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      labels:
        app: toolbox
      annotations:
        nginx.ingress.kubernetes.io/rewrite-target: /$2
      name: toolbox
      namespace: chip
    spec:
      ingressClassName: nginx-internal
      rules:
      - host: chip-can.np.dayforcehcm.com
        http:
          paths:
          - path: /toolbox(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: toolbox
                port:
                  number: 80
      tls:
      - hosts:
        - chip-can.np.dayforcehcm.com
    ---
    # Source: chip-toolbox/templates/akv.yaml
    apiVersion: spv.no/v1
    kind: AzureKeyVaultSecret
    metadata:
      name: secret
      namespace: chip
    spec:
      vault:
        name: c541chip
        object:
          name: dummy
          type: secret

mark@L-R910LPKW:~/chip/toolbox/k8s [test ≡ +0 ~2 -0 !]$


The replicaSet fails, and applying the addAzurePodIdentityException flag fails too.

**Expected behavior**
The replicaSet is able to scale as requested and the secret is injected.

**Logs**
I am not sure which logs to provide; I will provide any logs on demand.
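
In case specific logs are useful, the env-injector (webhook) logs can be collected with something like the following; this assumes the chart is installed in the akv2k8s namespace as above, and the pod name placeholder must be filled in from the listing:

# list the akv2k8s pods to find the env-injector webhook pod
kubectl -n akv2k8s get pods
# tail its logs; substitute the pod name from the listing above
kubectl -n akv2k8s logs <env-injector-pod-name> --tail=100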

**Additional context**
Our Terraform code used to deploy an AAD Pod Identity Helm chart in the past, but that chart was deleted and was never applied to the new cluster. So it is a mystery to us why this error happens in the first place.
yothoon commented 1 year ago

I have exactly the same problem when using version 2.3.2. It is working fine with 2.0.11.

lesscodingmorehappiness commented 1 year ago

Having the same issue when deploying an image from ACR to an AKS cluster. The cluster has access to pull images from ACR and worked well before.

I noticed that the Requirements mention: 'You should at most have only one Env Injector installment in your cluster, because multiple instances of the Env Injector mutating webhook might compete and fail when a new pod (that needs environment injection) is created'. But by following the AKS install there would be 2 injectors. Could that be the cause?

tspearconquest commented 1 year ago

The service account for AKV2K8S needs to have permissions to pull the image from ACR, as documented here

This has all the markers of the issue described here - https://akv2k8s.io/installation/with-aad-pod-identity. But trying to fix it as described does not work:

This fix applies if you are running Azure AD Pod Identity in your cluster. If you don't run that, then the AzurePodIdentityException Custom Resource is not available and so this would not be the correct fix.

I noticed that the Requirements mention: 'You should at most have only one Env Injector installment in your cluster, because multiple instances of the Env Injector mutating webhook might compete and fail when a new pod (that needs environment injection) is created'. But by following the AKS install there would be 2 injectors. Could that be the cause?

The controller is not an Env Injector - it handles syncing secrets stored in Key Vault into Kubernetes' built-in Secrets resources.

Those of you with issues please try the steps mentioned in this comment and report back if you still experience issues.

tspearconquest commented 1 year ago

Apologies, the comment I linked above also only applies if you are running Azure AD Pod Identity.

What needs the AcrPull permission is the Service Principal that you configured for your Key Vault.
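
For reference, a minimal sketch of granting AcrPull to a service principal with the Azure CLI; the names and IDs are placeholders, not values from this issue:

# resolve the ACR resource ID
ACR_ID=$(az acr show --name <acr-name> --query id --output tsv)
# grant the service principal pull rights on that registry
az role assignment create --assignee <service-principal-app-id> --role AcrPull --scope "$ACR_ID"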

tspearconquest commented 1 year ago

You can also try these suggestions (both are sketched below):

  1. Try adding a command block to the workload pod with the injected secret, so that the env-injector doesn't need to inspect the image (ref)
  2. Try specifying imagePullSecrets in the podTemplate spec (ref)
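
For illustration, a minimal sketch of both suggestions applied to a pod template like the one from this issue; the entrypoint path and pull-secret name are placeholders, not values from the thread:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: toolbox
        image: app541deploycr.azurecr.io/chip/toolbox:1.0.23062.13
        # suggestion 1: with an explicit command, the webhook does not
        # need to query the registry for the image entrypoint
        command: ["/app/toolbox"]   # placeholder entrypoint
      # suggestion 2: registry credentials the webhook can reuse
      imagePullSecrets:
      - name: acr-pull-secret      # placeholder docker-registry secret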
mracfa commented 1 year ago

In my environments on the latest release I also get UNAUTHORIZED, even though the node identity already has the AcrPull permission.

This works on version 2.2.2 but not on 2.5.0. Not using AAD-Pod-Identity!

leedavidr commented 1 year ago

I was able to work around this by using imagePullSecrets when I first ran into the issue (no AAD Pod Identity).

Recently, imagePullSecrets stopped working randomly in one of my clusters, and Azure reported an intermittent outage with CoreDNS shortly after. When the CoreDNS issue was reportedly resolved, I was still getting the same error. The error went away after restarting the cluster.

Adding to notes in case it helps anyone

MaryamTavakkoli commented 1 year ago

We are still having this issue with chart v2.5.0 and AKS 1.26.6. The error in the injector is:

[ERROR] admission webhook error: failed to get auto cmd, error: cannot fetch image descriptor: GET https://<ACR NAME>.azurecr.io/oauth2/token?scope=repository%3Aspvest%2Fakv2k8s-env-test%3Apull&service=relexplatformdev.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.

We use a system-assigned managed identity for AKS and no aad-pod-identity or workload identity. imagePullSecrets is not an option for us.

If you have worked around the issue, it would be really nice if you could share it with us, thanks.
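
In case it helps with debugging: with a system-assigned setup, the identity pulling images on the nodes is the kubelet identity, and its AcrPull assignment can be checked or granted with a sketch like this (resource names are placeholders):

# look up the kubelet identity of the cluster
az aks show -g <resource-group> -n <cluster-name> \
  --query identityProfile.kubeletidentity.objectId -o tsv
# grant it AcrPull on the registry, if it is missing
az role assignment create \
  --assignee-object-id <kubelet-identity-object-id> \
  --assignee-principal-type ServicePrincipal \
  --role AcrPull \
  --scope $(az acr show --name <acr-name> --query id -o tsv)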

mracfa commented 1 year ago

@leedavidr As far as I know, using imagePullSecrets bypasses Azure RBAC and uses traditional registry authentication; that's why it works. On my setup using imagePullSecrets also works, but it's not an option for us.

In this case the weird part is that I don't even see any requests reaching ACR, so the UNAUTHORIZED error happens at the AKS level, even though it mentions ACR.

We are able to pull the exact same image from the same ACR without imagePullSecrets, as long as the workload doesn't use any env variables with secrets from Key Vault (secret@azurekeyvault).
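
For completeness, a sketch of the "traditional authentication" route mentioned above: a docker-registry secret (for example from an admin-enabled ACR user or a repository token; the credentials here are placeholders) that pods then reference via imagePullSecrets:

# create a pull secret in the app namespace
kubectl create secret docker-registry acr-pull-secret \
  --namespace chip \
  --docker-server=<acr-name>.azurecr.io \
  --docker-username=<acr-username> \
  --docker-password=<acr-password>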

mracfa commented 1 year ago

UPDATE: it was the azure_policy add-on that was preventing us from creating the pods. The UNAUTHORIZED pull error from ACR was misleading.
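
For anyone who wants to verify the same thing, a sketch of turning the add-on off with the Azure CLI (names are placeholders; note this disables Azure Policy enforcement for the whole cluster):

# disable the Azure Policy add-on on the cluster
az aks disable-addons --addons azure-policy \
  --resource-group <resource-group> --name <cluster-name>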

johanncolarte commented 1 year ago

Hi @mracfa, would you mind sharing which Azure policy is causing the UNAUTHORIZED error? I have disabled the security policy 'Container images should be deployed from trusted registries only' but it didn't resolve the error.

mracfa commented 1 year ago

Not a specific policy but the addon itself.

How to repro:

  1. Create a new cluster with the add-on enabled
  2. Push the sample injector image to an ACR
  3. Push an nginx to the same ACR
  4. Give AcrPull to the kubelet identity
  5. Install the latest akv2k8s helm chart
  6. Create the KV secret and the akvs object
  7. Deploy a simple nginx with the image you just pushed
  8. Deploy the sample injector app with the first image you just pushed

You should see a new nginx pod running. You should also see UNAUTHORIZED errors in kube events for the injector app and no pods running for this deployment.

Disable the azure policy add-on and restart the cluster. You should see both deployments with running pods and no errors.

Speeddymon commented 1 year ago

You should also see UNAUTHORIZED errors in kube events for the injector app and no pods running for this deployment.

What is the sample injector app you're using? Is it image: spvest/akv2k8s-env-test:2.0.1?

mracfa commented 1 year ago

You should also see UNAUTHORIZED errors in kube events for the injector app and no pods running for this deployment.

What is the sample injector app you're using? Is it image: spvest/akv2k8s-env-test:2.0.1?

Yes, exactly.

tspearconquest commented 1 year ago

Do you install the injector app in the same namespace with akv or a different namespace?

mracfa commented 1 year ago

Do you install the injector app in the same namespace with akv or a different namespace?

The AzureKeyVaultSecret object needs to be in the same namespace as the app. The akv2k8s components need to be in a different namespace (one that does not have the injector label).
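
For reference, injection is opt-in per namespace via a label, so the layout described above would look something like this (namespace names from this thread):

# the app namespace gets the injection label
kubectl label namespace chip azure-key-vault-env-injection=enabled
# the akv2k8s namespace itself must NOT carry that label
kubectl get namespace akv2k8s --show-labels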

pctekki commented 1 year ago

I'm unable to turn off the Azure Policy add-on for my cluster due to the company's security policy. Once I downgrade the app version to 1.3.1 the env injector succeeds. 1.4.0 and later versions all fail to authenticate and throw the same error as reported in the first post. Unfortunately, app version 1.3.1 has been flagged by our security scans, so this solution does not work for us.

These are our values:

values:
  global:
    keyVaultAuth: azureCloudConfig
  env_injector:
    image:
      tag: 1.3.1

We are able to work around the problem by defining the entrypoint command for the container.

The workaround goes in the deployment file:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: kuard
        command: ["/kuard"] # the command here is the entrypoint

Hopefully, this helps anyone encountering the same issue. This is unfortunately not an acceptable fix for my team as we manage 20+ applications and would need to specify the entrypoint for each and every application to make this work for us.

srmars commented 1 year ago

+1 Having the same issue in 1.5.0.

EnricoOr commented 1 year ago

Same issue here too; we are running the latest version but with the env-injector image pinned at 1.3.1. Also, our AKS cluster doesn't have the Azure Policy add-on activated or even installed. Like others we use a system-assigned identity for AKS and no aad-pod-identity or workload identity. imagePullSecrets is not an option for us to connect to the internal ACR, and neither is specifying the entrypoint for each container directly on the pod template; that is too big an effort.

Any other ideas about this problem?

abhilashjoseph commented 11 months ago

I have a PR (mentioned above) for this issue that uses the service principal credentials when azureCloudConfig is the auth type. Please test and review, and let me know if this is good to go.