Open mo-saeed opened 2 years ago
The AZURE_AUTHORITY_HOST env var would be the same irrespective of Azure AD Apps/user-assigned managed identity. Please share the following details:
kubectl get pod <pod name> -o yaml (redact client id)
kubectl get serviceaccount <name> -o yaml (redact client id)
Thanks @aramase
Latest version: v0.14.0
The pod YAML:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"quick-start","namespace":"default"},"spec":{"containers":[{"env":[{"name":"KEYVAULT_NAME","value":"test-workload-identity"},{"name":"KEYVAULT_URL","value":"https://test-workload-identity.vault.azure.net/"},{"name":"SECRET_NAME","value":"my-secret"}],"image":"ghcr.io/azure/azure-workload-identity/msal-go","name":"oidc"}],"nodeSelector":{"kubernetes.io/os":"linux"},"securityContext":{"fsGroup":1001,"runAsUser":1001},"serviceAccountName":"workload-identity-sa"}}
  name: quick-start
  namespace: default
spec:
  containers:
  - env:
    - name: KEYVAULT_NAME
      value: test-workload-identity
    - name: KEYVAULT_URL
      value: https://test-workload-identity.vault.azure.net/
    - name: SECRET_NAME
      value: my-secret
    image: ghcr.io/azure/azure-workload-identity/msal-go
    imagePullPolicy: Always
    name: oidc
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-tgjhd
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ******* --> changed by me
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1001
    runAsUser: 1001
  serviceAccount: workload-identity-sa
  serviceAccountName: workload-identity-sa
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-tgjhd
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-10-26T13:27:37Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-10-26T16:27:31Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-10-26T16:27:31Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-10-26T13:27:37Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://b4d5f211bdd4f6373ef3d597d5b84c7756aa91e97e7f890ad3e8bc182e408830
    image: ghcr.io/azure/azure-workload-identity/msal-go:latest
    imageID: ghcr.io/azure/azure-workload-identity/msal-go@sha256:1e0bfde31b3dac25b8025a5d3d284e95d102af22033073ac045e9add45e03991
    lastState:
      terminated:
        containerID: containerd://977b682d2afa99fe47e670ab6ceb24b16f99d0c0b89b512e1bef93efc926b79b
        exitCode: 1
        finishedAt: "2022-10-26T16:24:56Z"
        reason: Error
        startedAt: "2022-10-26T16:17:26Z"
    name: oidc
    ready: true
    restartCount: 21
    started: true
    state:
      running:
        startedAt: "2022-10-26T16:27:31Z"
  hostIP: 10.1.192.177
  phase: Running
  podIP: 10.1.193.118
  podIPs:
  - ip: 10.1.193.118
  qosClass: BestEffort
  startTime: "2022-10-26T13:27:37Z"
service account
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: *****--> changed by me
    azure.workload.identity/tenant-id: ***** --> changed by me
  creationTimestamp: "2022-10-26T12:22:12Z"
  labels:
    azure.workload.identity/use: "true"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:azure.workload.identity/client-id: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:azure.workload.identity/use: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2022-10-26T12:22:12Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:azure.workload.identity/tenant-id: {}
    manager: kubectl-edit
    operation: Update
    time: "2022-10-26T13:27:09Z"
  name: workload-identity-sa
  namespace: default
  resourceVersion: "444004502"
  uid: 7d92c03e-381c-4091-946b-78ded9e0edbf
secrets:
- name: workload-identity-sa-token-p7vjh
@mo-saeed I don't see the environment variables or projected service account token volume in the pod describe output. These would be injected in the pod by the mutating webhook. Can you share the output for kubectl get pods to show the webhook is running and also the logs from the webhook pods?
The pod would have the following env vars as shown in kubectl describe pod quick-start here: https://azure.github.io/azure-workload-identity/docs/quick-start.html#7-deploy-workload
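One quick way to tell whether the webhook mutated a pod is to look for the injected fields in the pod manifest. The following is only a rough sketch: the function name check_wi_mutation is hypothetical, and the marker list (the four AZURE_* env vars plus the azure-identity-token volume name) is assumed from the quick-start docs for this release, so it may differ in other versions.

```shell
#!/bin/sh
# Hypothetical helper: reads a pod manifest (YAML or JSON) on stdin and
# prints every webhook-injected marker that is absent from it.
# The marker names are assumptions based on the quick-start docs.
check_wi_mutation() {
  manifest=$(cat)
  missing=0
  for marker in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE \
                AZURE_AUTHORITY_HOST azure-identity-token; do
    # A plain substring search is enough for a first diagnosis.
    if ! printf '%s\n' "$manifest" | grep -q -- "$marker"; then
      echo "missing: $marker"
      missing=1
    fi
  done
  # Exit status 0 means every marker was found, 1 means at least one was not.
  return "$missing"
}
```

Usage would look like kubectl get pod quick-start -o yaml | check_wi_mutation. A non-zero exit status suggests the admission request never reached the webhook or the webhook skipped mutation.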
The webhook pods are running, but I don't see any relevant logs:
{"level":"info","ts":1666785196.5036922,"logger":"entrypoint","msg":"initializing metrics backend","backend":"prometheus"}
{"level":"info","ts":1666785196.5037866,"logger":"entrypoint","msg":"setting up manager","userAgent":"azure-workload-identity/webhook/v0.14.0 (linux/amd64) 0198270/2022-10-20-21:15"}
I1026 11:53:17.604992 1 request.go:682] Waited for 1.013636896s due to client-side throttling, not priority and fairness, request: GET:https://10.7.0.1:443/apis/config.gatekeeper.sh/v1alpha1?timeout=32s
{"level":"info","ts":1666785198.36011,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8095"}
{"level":"info","ts":1666785198.36051,"logger":"entrypoint","msg":"setting up cert rotation"}
{"level":"info","ts":1666785198.3606827,"logger":"entrypoint","msg":"starting manager"}
{"level":"info","ts":1666785198.36088,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8095"}
{"level":"info","ts":1666785198.3609185,"msg":"Starting server","kind":"health probe","addr":"[::]:9440"}
{"level":"info","ts":1666785198.3610475,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*v1.Secret=&{{ } { 0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} <nil> map[] map[] }) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610656,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*unstructured.Unstructured=&{map[apiVersion:admissionregistration.k8s.io/v1 kind:MutatingWebhookConfiguration]}) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610713,"msg":"Starting Controller","controller":"cert-rotator"}
{"level":"info","ts":1666785198.4615865,"logger":"cert-rotation","msg":"starting cert rotator controller"}
{"level":"info","ts":1666785204.8620014,"msg":"Starting workers","controller":"cert-rotator","worker count":1}
{"level":"info","ts":1666785204.8624527,"logger":"cert-rotation","msg":"no cert refresh needed"}
{"level":"info","ts":1666785204.862461,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785204.863037,"logger":"cert-rotation","msg":"certs are ready in /certs"}
{"level":"info","ts":1666785204.891872,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785206.4483023,"logger":"cert-rotation","msg":"CA certs are injected to webhooks"}
{"level":"info","ts":1666785206.4483643,"logger":"entrypoint","msg":"setting up webhook server"}
{"level":"info","ts":1666785206.4484563,"logger":"entrypoint","msg":"registering webhook to the webhook server"}
{"level":"info","ts":1666785206.448692,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":1666785206.4487622,"logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":1666785206.4489477,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1666785206.4490101,"logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":1666785206.449091,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
Does that mean the application can't reach the webhook? But would I see any logs somewhere?
There are 2 webhook pods. Could you share the logs from the other pod too? If an admission request is received by the webhook and it skips mutation, there'll be a log to indicate why it skipped. If there are no logs in the other pod either, then it's possible the request isn't coming to the webhook.
kubectl get mutatingwebhookconfiguration azure-wi-webhook-mutating-webhook-configuration -o yaml (redact the cert)
These are the logs of the other pod:
{"level":"info","ts":1666785196.5036922,"logger":"entrypoint","msg":"initializing metrics backend","backend":"prometheus"}
{"level":"info","ts":1666785196.5037866,"logger":"entrypoint","msg":"setting up manager","userAgent":"azure-workload-identity/webhook/v0.14.0 (linux/amd64) 0198270/2022-10-20-21:15"}
I1026 11:53:17.604992 1 request.go:682] Waited for 1.013636896s due to client-side throttling, not priority and fairness, request: GET:https://10.7.0.1:443/apis/config.gatekeeper.sh/v1alpha1?timeout=32s
{"level":"info","ts":1666785198.36011,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8095"}
{"level":"info","ts":1666785198.36051,"logger":"entrypoint","msg":"setting up cert rotation"}
{"level":"info","ts":1666785198.3606827,"logger":"entrypoint","msg":"starting manager"}
{"level":"info","ts":1666785198.36088,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8095"}
{"level":"info","ts":1666785198.3609185,"msg":"Starting server","kind":"health probe","addr":"[::]:9440"}
{"level":"info","ts":1666785198.3610475,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*v1.Secret=&{{ } { 0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} <nil> map[] map[] }) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610656,"msg":"Starting EventSource","controller":"cert-rotator","source":"&{{%!s(*unstructured.Unstructured=&{map[apiVersion:admissionregistration.k8s.io/v1 kind:MutatingWebhookConfiguration]}) %!s(*cache.informerCache=&{0xc00013e340}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"}
{"level":"info","ts":1666785198.3610713,"msg":"Starting Controller","controller":"cert-rotator"}
{"level":"info","ts":1666785198.4615865,"logger":"cert-rotation","msg":"starting cert rotator controller"}
{"level":"info","ts":1666785204.8620014,"msg":"Starting workers","controller":"cert-rotator","worker count":1}
{"level":"info","ts":1666785204.8624527,"logger":"cert-rotation","msg":"no cert refresh needed"}
{"level":"info","ts":1666785204.862461,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785204.863037,"logger":"cert-rotation","msg":"certs are ready in /certs"}
{"level":"info","ts":1666785204.891872,"logger":"cert-rotation","msg":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"info","ts":1666785206.4483023,"logger":"cert-rotation","msg":"CA certs are injected to webhooks"}
{"level":"info","ts":1666785206.4483643,"logger":"entrypoint","msg":"setting up webhook server"}
{"level":"info","ts":1666785206.4484563,"logger":"entrypoint","msg":"registering webhook to the webhook server"}
{"level":"info","ts":1666785206.448692,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":1666785206.4487622,"logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":1666785206.4489477,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1666785206.4490101,"logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":1666785206.449091,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
kubectl get mutatingwebhookconfiguration azure-wi-webhook-mutating-webhook-configuration -o yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    meta.helm.sh/release-name: workload-identity-webhook
    meta.helm.sh/release-namespace: kube-**
  creationTimestamp: "2022-10-26T11:41:58Z"
  generation: 3
  labels:
    app: workload-identity-webhook
    app.kubernetes.io/managed-by: Helm
    azure-workload-identity.io/system: "true"
    chart: workload-identity-webhook
    helm.toolkit.fluxcd.io/name: workload-identity-webhook
    helm.toolkit.fluxcd.io/namespace: kube-**
    release: workload-identity-webhook
  managedFields:
  - apiVersion: admissionregistration.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:webhooks:
        k:{"name":"mutation.azure-workload-identity.io"}:
          f:namespaceSelector: {}
    manager: admissionsenforcer
    operation: Update
    time: "2022-10-26T11:41:58Z"
  - apiVersion: admissionregistration.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app: {}
          f:app.kubernetes.io/managed-by: {}
          f:azure-workload-identity.io/system: {}
          f:chart: {}
          f:helm.toolkit.fluxcd.io/name: {}
          f:helm.toolkit.fluxcd.io/namespace: {}
          f:release: {}
      f:webhooks:
        .: {}
        k:{"name":"mutation.azure-workload-identity.io"}:
          .: {}
          f:admissionReviewVersions: {}
          f:clientConfig:
            .: {}
            f:service:
              .: {}
              f:name: {}
              f:namespace: {}
              f:path: {}
              f:port: {}
          f:failurePolicy: {}
          f:matchPolicy: {}
          f:name: {}
          f:objectSelector: {}
          f:reinvocationPolicy: {}
          f:rules: {}
          f:sideEffects: {}
          f:timeoutSeconds: {}
    manager: helm-controller
    operation: Update
    time: "2022-10-26T11:41:58Z"
  - apiVersion: admissionregistration.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:webhooks:
        k:{"name":"mutation.azure-workload-identity.io"}:
          f:clientConfig:
            f:caBundle: {}
    manager: azure-workload-identity
    operation: Update
    time: "2022-10-26T11:42:13Z"
  name: azure-wi-webhook-mutating-webhook-configuration
  resourceVersion: "443746317"
  uid: ae27b792-1ddc-4f01-9904-2315dc2f125e
webhooks:
- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: ***
    service:
      name: azure-wi-webhook-webhook-service
      namespace: kube**
      path: /mutate-v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: mutation.azure-workload-identity.io
  namespaceSelector:
    matchExpressions:
    - key: control-plane
      operator: DoesNotExist
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
    scope: '*'
  sideEffects: None
  timeoutSeconds: 10
it's possible the request isn't coming to the webhook.
Would that show up in any logs? I suspect I may need to create a network policy to allow traffic to this webhook, but I am not sure that is the issue.
@aramase the network policy was the issue, as I expected. Now that I have created the network policy, it's working.
so 2 things here
Thanks
Would it be possible to show connection timeouts to the webhook in some log, so we know what the issue is?
If the request doesn't reach the webhook, there is not much the webhook can surface here as it's unaware of the request. The timeout should be part of the kube-apiserver (KAS) logs. The failurePolicy: Ignore setting means that if the webhook isn't reachable, the pod will still get deployed. Setting failurePolicy: Fail will cause pods that use workload identity, and any other pod, to fail if the webhook isn't reachable, but that's not recommended. (xref: https://open-policy-agent.github.io/gatekeeper/website/docs/failing-closed)
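To see which of the two behaviours a cluster is configured for, the failurePolicy value can be read from the webhook configuration. This is a minimal sketch, not part of the project: the function name explain_failure_policy and its messages are hypothetical, and it assumes the YAML layout printed by kubectl get ... -o yaml, where failurePolicy sits on its own indented line.

```shell
#!/bin/sh
# Hypothetical helper: reads a MutatingWebhookConfiguration in YAML on stdin
# and explains the first failurePolicy value it finds.
explain_failure_policy() {
  # Match only lines that start with the key, so the managedFields entry
  # "f:failurePolicy: {}" is ignored.
  policy=$(sed -n 's/^[[:space:]]*failurePolicy:[[:space:]]*//p' | head -n 1)
  case "$policy" in
    Ignore)
      echo "Ignore: pods are admitted without mutation if the webhook is unreachable" ;;
    Fail)
      echo "Fail: pod creation is rejected if the webhook is unreachable" ;;
    *)
      echo "failurePolicy not found"
      return 1 ;;
  esac
}
```

Usage would look like kubectl get mutatingwebhookconfiguration azure-wi-webhook-mutating-webhook-configuration -o yaml | explain_failure_policy. With Ignore, an unreachable webhook shows up only as missing env vars in the pod, which matches the symptom in this thread.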
Would it be possible to add a network policy as a default in the Helm chart to allow communication to the application port from all namespaces?
If you have a sample I can add this to our troubleshooting guide but I don't think we want to package these as part of our helm charts.
The first part, I understand.
The second part, can I ask why? I know many other Helm charts that can optionally create the required network policy, controlled by a true/false value. It could be added and still default to false.
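For readers hitting the same symptom, a network policy along the following lines may be what is needed. It is a sketch under stated assumptions, not the policy used in this thread: the function name print_webhook_netpol and the policy name are hypothetical, the pod label azure-workload-identity.io/system: "true" is assumed from the labels shown on the webhook configuration above, and port 9443 is taken from the "Serving webhook server" log line. The rule has no from clause, so it allows ingress on that port from any source.

```shell
#!/bin/sh
# Hypothetical helper: prints a NetworkPolicy manifest that allows ingress
# to the webhook pods on the webhook server port.
# $1 is the namespace the webhook is installed in (redacted in this thread).
print_webhook_netpol() {
  ns=$1
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-azure-wi-webhook-ingress   # hypothetical name
  namespace: ${ns}
spec:
  podSelector:
    matchLabels:
      azure-workload-identity.io/system: "true"   # assumed pod label
  policyTypes:
  - Ingress
  ingress:
  - ports:
    - protocol: TCP
      port: 9443   # webhook server port from the logs
EOF
}
```

Usage would look like print_webhook_netpol <webhook namespace> | kubectl apply -f -. Check the actual pod labels with kubectl get pods --show-labels before applying, since the selector here is a guess.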
Hi,
I tried to follow the same procedure as here https://azure.github.io/azure-workload-identity/docs/quick-start.html#5-create-a-kubernetes-service-account using a user-assigned managed identity, but in the pod log I see this error
Can you please advise what could be the issue here?