Azure / azure-workload-identity

Azure AD Workload Identity uses Kubernetes primitives to associate managed identities for Azure resources and identities in Azure Active Directory (AAD) with pods.
https://azure.github.io/azure-workload-identity
MIT License

failed to create confidential client err=the Authority does not appear to use https #611

Open mo-saeed opened 2 years ago

mo-saeed commented 2 years ago

Hi,

I tried to follow the quick-start procedure here https://azure.github.io/azure-workload-identity/docs/quick-start.html#5-create-a-kubernetes-service-account using a user-assigned managed identity, but in the pod log I see this error:

E1026 12:41:17.836791       1 token_credential.go:49] "failed to create confidential client" err="the Authority(TENANT_ID/oauth2/token) does not appear to use https"

Can you please advise what the issue could be here?
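
A note for readers hitting the same message: the authority in the error has no scheme or host, which is consistent with the authority host (normally supplied through the AZURE_AUTHORITY_HOST environment variable) being empty when the string was built. The Go sketch below only illustrates that failure mode. `buildAuthority` and `checkAuthority` are hypothetical helpers, not the project's code, the way the two parts are joined is an assumption inferred from the log line, and the login host in the usage note is just an example value.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// buildAuthority is a hypothetical helper. It joins the authority host and
// the tenant ID the way the error message suggests: "<host><tenant>/oauth2/token".
func buildAuthority(authorityHost, tenantID string) string {
	return authorityHost + tenantID + "/oauth2/token"
}

// checkAuthority rejects any authority that does not parse as an https URL.
// The error text is modeled on the log line in this issue.
func checkAuthority(authority string) error {
	u, err := url.Parse(authority)
	if err != nil || u.Scheme != "https" {
		return fmt.Errorf("the Authority(%s) does not appear to use https", authority)
	}
	return nil
}

func main() {
	// In a mutated pod AZURE_AUTHORITY_HOST is expected to be set.
	// If the pod was never mutated, the host part is empty.
	host := os.Getenv("AZURE_AUTHORITY_HOST")
	authority := buildAuthority(host, "TENANT_ID")
	if err := checkAuthority(authority); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("authority looks valid:", authority)
}
```

With an empty host the check fails on `TENANT_ID/oauth2/token`; with a host such as `https://login.microsoftonline.com/` it passes.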

aramase commented 2 years ago

AZURE_AUTHORITY_HOST env var would be the same irrespective of Azure AD Apps/user-assigned managed identity. Please share the following details:

  1. What version of the webhook are you using?
  2. Final pod yaml (kubectl get pod <pod name> -o yaml), with the client id redacted
  3. Final service account yaml (kubectl get serviceaccount <name> -o yaml), with the client id redacted

mo-saeed commented 2 years ago

Thanks @aramase

aramase commented 2 years ago

@mo-saeed I don't see the environment variables or projected service account token volume in the pod describe output. These would be injected in the pod by the mutating webhook. Can you share the output for kubectl get pods to show the webhook is running and also the logs from the webhook pods?

The pod would have the following env vars as shown in kubectl describe pod quick-start here: https://azure.github.io/azure-workload-identity/docs/quick-start.html#7-deploy-workload
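
As a quick way to confirm whether a container was mutated, the check can be done from inside the pod. The following Go sketch assumes the webhook injects the four variables listed in `expectedVars` (as shown in the quick start); `missingVars` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"os"
)

// expectedVars lists the environment variables the webhook is expected to
// inject into a mutated container (assumption based on the quick start).
var expectedVars = []string{
	"AZURE_CLIENT_ID",
	"AZURE_TENANT_ID",
	"AZURE_FEDERATED_TOKEN_FILE",
	"AZURE_AUTHORITY_HOST",
}

// missingVars returns the expected variables that are unset or empty.
// lookup is passed in so the check can be exercised outside a real pod.
func missingVars(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range expectedVars {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingVars(os.LookupEnv); len(missing) > 0 {
		fmt.Println("pod was probably not mutated; missing:", missing)
		return
	}
	fmt.Println("all expected variables are set")
}
```

If every variable is reported missing, the admission request most likely never reached the webhook or the webhook skipped the pod.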

mo-saeed commented 2 years ago

The webhook pods are running, but I don't see any relevant logs:

{"level":"info","ts":1666785196.5036922,"logger":"entrypoint","msg":"initializing metrics backend","backend":"prometheus"}
{"level":"info","ts":1666785196.5037866,"logger":"entrypoint","msg":"setting up manager","userAgent":"azure-workload-identity/webhook/v0.14.0 (linux/amd64) 0198270/2022-10-20-21:15"}
I1026 11:53:17.604992       1 request.go:682] Waited for 1.013636896s due to client-side throttling, not priority and fairness, request: GET:https://10.7.0.1:443/apis/config.gatekeeper.sh/v1alpha1?timeout=32s
{"level":"info","ts":1666785198.36011,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8095"}
{"level":"info","ts":1666785198.36051,"logger":"entrypoint","msg":"setting up cert rotation"}
{"level":"info","ts":1666785198.3606827,"logger":"entrypoint","msg":"starting manager"}
{"level":"info","ts":1666785198.36088,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8095"}
{"level":"info","ts":1666785198.3609185,"msg":"Starting server","kind":"health probe","addr":"[::]:9440"}
{"level":"info","ts":1666785198.3610475,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*v1.Secret=&{{ } {      0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} <nil> map[] map[] }) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610656,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*unstructured.Unstructured=&{map[apiVersion:admissionregistration.k8s.io/v1 kind:MutatingWebhookConfiguration]}) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610713,"msg":"Starting Controller","controller":"cert-rotator"}
{"level":"info","ts":1666785198.4615865,"logger":"cert-rotation","msg":"starting cert rotator controller"}
{"level":"info","ts":1666785204.8620014,"msg":"Starting workers","controller":"cert-rotator","worker count":1}
{"level":"info","ts":1666785204.8624527,"logger":"cert-rotation","msg":"no cert refresh needed"}
{"level":"info","ts":1666785204.862461,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785204.863037,"logger":"cert-rotation","msg":"certs are ready in /certs"}
{"level":"info","ts":1666785204.891872,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785206.4483023,"logger":"cert-rotation","msg":"CA certs are injected to webhooks"}
{"level":"info","ts":1666785206.4483643,"logger":"entrypoint","msg":"setting up webhook server"}
{"level":"info","ts":1666785206.4484563,"logger":"entrypoint","msg":"registering webhook to the webhook server"}
{"level":"info","ts":1666785206.448692,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":1666785206.4487622,"logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":1666785206.4489477,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1666785206.4490101,"logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":1666785206.449091,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}

Does that mean the application can't reach the webhook? If so, would I see any logs somewhere?

aramase commented 2 years ago

Does that mean the application can't reach the webhook? If so, would I see any logs somewhere?

There are 2 webhook pods. Could you share the logs from the other pod too? If an admission request is received by the webhook and it skips mutation, there will be a log entry indicating why it was skipped. If there are no logs in the other pod either, then it's possible the request isn't reaching the webhook.

  1. Logs from other webhook pod
  2. kubectl get mutatingwebhookconfiguration azure-wi-webhook-mutating-webhook-configuration -o yaml (redact the cert)

mo-saeed commented 2 years ago

These are the logs of the other pod:

{"level":"info","ts":1666785196.5036922,"logger":"entrypoint","msg":"initializing metrics backend","backend":"prometheus"}
{"level":"info","ts":1666785196.5037866,"logger":"entrypoint","msg":"setting up manager","userAgent":"azure-workload-identity/webhook/v0.14.0 (linux/amd64) 0198270/2022-10-20-21:15"}
I1026 11:53:17.604992       1 request.go:682] Waited for 1.013636896s due to client-side throttling, not priority and fairness, request: GET:https://10.7.0.1:443/apis/config.gatekeeper.sh/v1alpha1?timeout=32s
{"level":"info","ts":1666785198.36011,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8095"}
{"level":"info","ts":1666785198.36051,"logger":"entrypoint","msg":"setting up cert rotation"}
{"level":"info","ts":1666785198.3606827,"logger":"entrypoint","msg":"starting manager"}
{"level":"info","ts":1666785198.36088,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8095"}
{"level":"info","ts":1666785198.3609185,"msg":"Starting server","kind":"health probe","addr":"[::]:9440"}
{"level":"info","ts":1666785198.3610475,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*v1.Secret=&{{ } {      0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} <nil> map[] map[] }) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610656,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*unstructured.Unstructured=&{map[apiVersion:admissionregistration.k8s.io/v1 kind:MutatingWebhookConfiguration]}) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610713,"msg":"Starting Controller","controller":"cert-rotator"}
{"level":"info","ts":1666785198.4615865,"logger":"cert-rotation","msg":"starting cert rotator controller"}
{"level":"info","ts":1666785204.8620014,"msg":"Starting workers","controller":"cert-rotator","worker count":1}
{"level":"info","ts":1666785204.8624527,"logger":"cert-rotation","msg":"no cert refresh needed"}
{"level":"info","ts":1666785204.862461,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785204.863037,"logger":"cert-rotation","msg":"certs are ready in /certs"}
{"level":"info","ts":1666785204.891872,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785206.4483023,"logger":"cert-rotation","msg":"CA certs are injected to webhooks"}
{"level":"info","ts":1666785206.4483643,"logger":"entrypoint","msg":"setting up webhook server"}
{"level":"info","ts":1666785206.4484563,"logger":"entrypoint","msg":"registering webhook to the webhook server"}
{"level":"info","ts":1666785206.448692,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":1666785206.4487622,"logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":1666785206.4489477,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1666785206.4490101,"logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":1666785206.449091,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}

kubectl get mutatingwebhookconfiguration azure-wi-webhook-mutating-webhook-configuration -o yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    meta.helm.sh/release-name: workload-identity-webhook
    meta.helm.sh/release-namespace: kube-**
  creationTimestamp: "2022-10-26T11:41:58Z"
  generation: 3
  labels:
    app: workload-identity-webhook
    app.kubernetes.io/managed-by: Helm
    azure-workload-identity.io/system: "true"
    chart: workload-identity-webhook
    helm.toolkit.fluxcd.io/name: workload-identity-webhook
    helm.toolkit.fluxcd.io/namespace: kube-**
    release: workload-identity-webhook
  managedFields:
  - apiVersion: admissionregistration.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:webhooks:
        k:{"name":"mutation.azure-workload-identity.io"}:
          f:namespaceSelector: {}
    manager: admissionsenforcer
    operation: Update
    time: "2022-10-26T11:41:58Z"
  - apiVersion: admissionregistration.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app: {}
          f:app.kubernetes.io/managed-by: {}
          f:azure-workload-identity.io/system: {}
          f:chart: {}
          f:helm.toolkit.fluxcd.io/name: {}
          f:helm.toolkit.fluxcd.io/namespace: {}
          f:release: {}
      f:webhooks:
        .: {}
        k:{"name":"mutation.azure-workload-identity.io"}:
          .: {}
          f:admissionReviewVersions: {}
          f:clientConfig:
            .: {}
            f:service:
              .: {}
              f:name: {}
              f:namespace: {}
              f:path: {}
              f:port: {}
          f:failurePolicy: {}
          f:matchPolicy: {}
          f:name: {}
          f:objectSelector: {}
          f:reinvocationPolicy: {}
          f:rules: {}
          f:sideEffects: {}
          f:timeoutSeconds: {}
    manager: helm-controller
    operation: Update
    time: "2022-10-26T11:41:58Z"
  - apiVersion: admissionregistration.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:webhooks:
        k:{"name":"mutation.azure-workload-identity.io"}:
          f:clientConfig:
            f:caBundle: {}
    manager: azure-workload-identity
    operation: Update
    time: "2022-10-26T11:42:13Z"
  name: azure-wi-webhook-mutating-webhook-configuration
  resourceVersion: "443746317"
  uid: ae27b792-1ddc-4f01-9904-2315dc2f125e
webhooks:
- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: ***
    service:
      name: azure-wi-webhook-webhook-service
      namespace: kube**
      path: /mutate-v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: mutation.azure-workload-identity.io
  namespaceSelector:
    matchExpressions:
    - key: control-plane
      operator: DoesNotExist
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
    scope: '*'
  sideEffects: None
  timeoutSeconds: 10

"it's possible the request isn't coming to the webhook": would that show in any logs? I suspect I need to create a network policy to allow traffic to this webhook, but I am not sure this is the issue.

mo-saeed commented 2 years ago

@aramase the network policy was the issue, as I expected. Now that I have created the network policy, it's working.

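So, 2 things here:

  1. Would it be possible to show some connection timeout to the webhook in any logs, so we know what the issue is?
  2. Would it be possible to add a network policy as a default in the helm chart to allow communication to the application port from all namespaces?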

Thanks

aramase commented 2 years ago

Would it be possible to show some connection timeout to the webhook in any logs, so we know what the issue is?

If the request doesn't reach the webhook, there is not much the webhook can surface here, as it's unaware of the request. The timeout should show up in the kube-apiserver (KAS) logs. With failurePolicy: Ignore, the pod will still get deployed (without mutation) if the webhook isn't reachable. Setting failurePolicy: Fail will cause pods that use workload identity, and any other pod, to fail if the webhook isn't reachable, but that's not recommended. (xref: https://open-policy-agent.github.io/gatekeeper/website/docs/failing-closed)
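
The difference between the two failure policies can be sketched in Go. This is a simplified model, not kube-apiserver code: `FailurePolicy`, `admit`, and `errUnreachable` are hypothetical names, and the model treats every successful webhook call as a mutation, which is not the case when the webhook decides to skip a pod.

```go
package main

import (
	"errors"
	"fmt"
)

// FailurePolicy mirrors the two values allowed in a webhook configuration.
type FailurePolicy string

const (
	Ignore FailurePolicy = "Ignore"
	Fail   FailurePolicy = "Fail"
)

// errUnreachable stands in for a timeout or a blocked connection.
var errUnreachable = errors.New("webhook unreachable")

// admit models what happens to a pod CREATE request when the call to the
// webhook returns callErr. It reports whether the pod is admitted and
// whether it was mutated.
func admit(policy FailurePolicy, callErr error) (admitted, mutated bool) {
	if callErr == nil {
		return true, true // simplification: a reachable webhook always mutates
	}
	if policy == Ignore {
		return true, false // pod is created, but without env vars or token volume
	}
	return false, false // Fail: the pod is rejected
}

func main() {
	for _, p := range []FailurePolicy{Ignore, Fail} {
		admitted, mutated := admit(p, errUnreachable)
		fmt.Printf("failurePolicy=%s, webhook unreachable: admitted=%t mutated=%t\n", p, admitted, mutated)
	}
}
```

The Ignore case matches what was observed in this issue: the pod started, but nothing had been injected into it.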

Would it be possible to add a network policy as a default in the helm chart to allow communication to the application port from all namespaces?

If you have a sample, I can add this to our troubleshooting guide, but I don't think we want to package these as part of our helm charts.

mo-saeed commented 2 years ago

The first part I understand.

The second part: can I ask why? I know many other helm charts that can optionally create the needed network policy, controlled via a true/false variable. It could be added and still default to false.