OpenUnison / openunison-k8s

Access portal for Kubernetes
Apache License 2.0

Problem with webhook #41

Open alenhodzic85 opened 2 years ago

alenhodzic85 commented 2 years ago

Hi, every time I want to update the setup, I get an error from the webhook and need to reinstall the whole setup.

For example, here I changed just session_inactivity_timeout_seconds

Error: cannot patch "azuread-load-groups" with kind AuthenticationChain: Internal error occurred: failed calling webhook "authchains-openunison.tremolo.io": Post "https://openunison-orchestra.openunison.svc:443/k8s/webhooks/v1/authchains?timeout=5s": context deadline exceeded
│ 
│   with module.infra-services.helm_release.openunison-orchestra-login-azuread,
│   on infra-services/openunison.tf line 79, in resource "helm_release" "openunison-orchestra-login-azuread":
│   79: resource "helm_release" "openunison-orchestra-login-azuread" {

I don't see any unusual logs...

mlbiam commented 2 years ago

That is really odd. There's not much going on there; the webhook just makes sure the config is valid. If you edit the validating webhook with kubectl edit ValidatingWebhookConfiguration openunison-workflow-validation-orchestra and change every timeoutSeconds: 5 to timeoutSeconds: 30, does the problem keep happening?
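The manual edit above could also be scripted. The following is only a minimal sketch, not part of OpenUnison: the helper name `timeout_patch` and the second webhook entry in the sample data are made up for illustration. It builds a JSON Patch that sets `timeoutSeconds` on every entry under `webhooks`, and rejects values outside the 1 to 30 second range that Kubernetes accepts for this field.

```python
import json

MAX_WEBHOOK_TIMEOUT = 30  # Kubernetes accepts timeoutSeconds from 1 to 30


def timeout_patch(config: dict, seconds: int) -> list:
    """Build a JSON Patch that sets timeoutSeconds on every webhook entry."""
    if not 1 <= seconds <= MAX_WEBHOOK_TIMEOUT:
        raise ValueError("timeoutSeconds must be between 1 and 30")
    # "add" replaces the member if it already exists and creates it otherwise.
    return [
        {"op": "add", "path": f"/webhooks/{i}/timeoutSeconds", "value": seconds}
        for i, _ in enumerate(config.get("webhooks", []))
    ]


if __name__ == "__main__":
    # Hypothetical, trimmed-down copy of the configuration. Only the fields
    # used here are shown, and the second webhook name is invented.
    example = {
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "openunison-workflow-validation-orchestra"},
        "webhooks": [
            {"name": "authchains-openunison.tremolo.io", "timeoutSeconds": 5},
            {"name": "example-second-webhook.tremolo.io", "timeoutSeconds": 5},
        ],
    }
    print(json.dumps(timeout_patch(example, 30)))
```

Assuming the script is saved as patch.py (a hypothetical name) and adapted to read the real object, the output could be passed to kubectl patch validatingwebhookconfiguration openunison-workflow-validation-orchestra --type=json -p "$(python3 patch.py)".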

alenhodzic85 commented 2 years ago

Still failing:

Error: cannot patch "azuread-load-groups" with kind AuthenticationChain: Internal error occurred: failed calling webhook "authchains-openunison.tremolo.io": Post "https://openunison-orchestra.openunison.svc:443/k8s/webhooks/v1/authchains?timeout=30s": context deadline exceeded
│ 
│   with module.infra-services.helm_release.openunison-orchestra-login-azuread,
│   on infra-services/openunison.tf line 79, in resource "helm_release" "openunison-orchestra-login-azuread":
│   79: resource "helm_release" "openunison-orchestra-login-azuread" {

mlbiam commented 2 years ago

Odd. In the openunison-orchestra logs, do you see /k8s/webhooks/v1/authchains?timeout=30s? Also, how many replicas are there for the openunison-orchestra pod? Can you try increasing the count?

mlbiam commented 2 years ago

Also, any network policies in the openunison namespace?

alenhodzic85 commented 2 years ago

Log from openunison-orchestra:

2022-04-29T15:27:50+02:00 [2022-04-29 13:27:50,146][Thread-22] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x6cc8b7ac-d064-4290-bfe8-3194cc80ea03x' - 404 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"oidc-sessions.openunison.tremolo.io \"x6cc8b7ac-d064-4290-bfe8-3194cc80ea03x\" not found","reason":"NotFound","details":{"name":"x6cc8b7ac-d064-4290-bfe8-3194cc80ea03x","group":"openunison.tremolo.io","kind":"oidc-sessions"},"code":404}
mlbiam commented 2 years ago

2022-04-29T15:27:50+02:00 [2022-04-29 13:27:50,146][Thread-22] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x6cc8b7ac-d064-4290-bfe8-3194cc80ea03x' - 404 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"oidc-sessions.openunison.tremolo.io \"x6cc8b7ac-d064-4290-bfe8-3194cc80ea03x\" not found","reason":"NotFound","details":{"name":"x6cc8b7ac-d064-4290-bfe8-3194cc80ea03x","group":"openunison.tremolo.io","kind":"oidc-sessions"},"code":404}

This is a red herring. It's OpenUnison trying to clean up sessions. Are there network policies in the openunison namespace?

alenhodzic85 commented 2 years ago

kubectl get networkpolicies -n openunison
NAME                            POD-SELECTOR                       AGE
allow-from-apiserver            application=openunison-orchestra   21h
allow-from-ingress              application=openunison-orchestra   21h
allow-from-prometheus           application=openunison-orchestra   21h
default-deny-ingress            <none>                             21h
oidc-proxy-allow-from-ingress   app=kube-oidc-proxy-orchestra      21h
openunison-to-activemq          app=amq-orchestra                  21h

mlbiam commented 2 years ago

if you disable the networkpolicies in the helm chart, do you get the same issue?

alenhodzic85 commented 2 years ago

I added this and it is still failing. And the networkpolicies are still there. Can I delete them manually?

network_policies:
  enabled: false

And the error in Terraform now looks a bit different:

Error: cannot patch "azuread-load-groups" with kind AuthenticationChain: Internal error occurred: failed calling webhook "authchains-openunison.tremolo.io": Post "https://openunison-orchestra.openunison.svc:443/k8s/webhooks/v1/authchains?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

mlbiam commented 2 years ago

Can I delete them manually?

Sure, go ahead and just delete them. They can be restored later.

alenhodzic85 commented 2 years ago

Still failing

mlbiam commented 2 years ago

same error?

alenhodzic85 commented 2 years ago

Error: cannot patch "azuread-load-groups" with kind AuthenticationChain: Internal error occurred: failed calling webhook "authchains-openunison.tremolo.io": Post "https://openunison-orchestra.openunison.svc:443/k8s/webhooks/v1/authchains?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

mlbiam commented 2 years ago

What version of k8s, which CNI, and how did you deploy (e.g., kubeadm)?

alenhodzic85 commented 2 years ago

What version of k8s

K8s AWS EKS: K8s Rev: v1.21.9-eks-0d102a7

which cni

Default EKS CNI: Amazon VPC Container Network Interface (CNI) plugin

how did you deploy

Deployed using EKS terraform module

mlbiam commented 2 years ago

thanks, i'll work to reproduce

mlbiam commented 2 years ago

did you enable the network policies in the values.yaml in your initial deployment, or did you enable it afterwards?

alenhodzic85 commented 2 years ago

I guess they were enabled by default: https://github.com/OpenUnison/helm-charts/blob/f2b2ba7cf91c402591e1f88e563363a28bcd389e/orchestra/values.yaml#L81

mlbiam commented 2 years ago

I can't find a way to reproduce this issue. When you make your update, are you updating just the orchestra-login-azuread chart? While you're waiting for the chart to time out, can you log in to OpenUnison?

alenhodzic85 commented 2 years ago

Sorry for the late reply, I was on vacation. Like I mentioned, I am just changing session_inactivity_timeout_seconds in the values file, which triggers an update for all affected charts.

While you're waiting for the chart to timeout, can you login to openunison?

Yes, nothing is redeployed or stopped.

alenhodzic85 commented 2 years ago

Hi, any update on this?

mlbiam commented 2 years ago

Unfortunately, I've not been able to reproduce this. The issue appears to be localized to your cluster and I don't know why. It looks like OpenUnison is taking requests. Are there any other webhooks used in the cluster? Do they have any issues?

We're going to be rolling out a new kubectl plugin that automates the rollout of openunison so you don't need to run the helm charts individually. It'll account for potential timing issues.

alenhodzic85 commented 2 years ago

Yes, we have other webhooks, such as the one from Vault. Is the new kubectl plugin already released?

mlbiam commented 2 years ago

What's so odd about this is that the API server is able to talk to the webhook on initial install, but not afterwards. There has to be some circumstance that's causing it. When the timeout happens, are there any API server logs? Something that indicates a timeout or that DNS didn't resolve?
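To separate a DNS failure from a connection timeout, a small probe can help. This is a rough sketch only: the function name `probe` and its return labels are invented here. Run from a pod inside the cluster, it checks Service DNS and the TCP path from that pod, which is not necessarily the path the API server uses. The host and port are taken from the error message earlier in this thread.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt: ok, dns_failure, timeout, refused, or error."""
    try:
        # Resolve first so a DNS problem is reported separately.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    # Only the first resolved address is tried in this sketch.
    family, socktype, proto, _, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
        return "ok"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "error"
    finally:
        sock.close()


if __name__ == "__main__":
    # Service name and port as they appear in the webhook URL of the error.
    print(probe("openunison-orchestra.openunison.svc", 443))
```

A result of "timeout" here would point at something dropping packets (for example a network policy or security group), while "dns_failure" would point at name resolution.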

Is the new kubectl plugin already released?

https://www.tremolosecurity.com/post/simplify-kubernetes-authentication-for-single-clusters-and-multi-cluster-environments

I think we're going to add a flag for additional charts to be run (like the azuread one) to make it simpler. It's a common use case.