OpenUnison / openunison-k8s

Access portal for Kubernetes
Apache License 2.0

[BUG] kube-oidc-proxy-orchestra blocks the port-forward traffic for all users #86

Closed droslean closed 1 year ago

droslean commented 1 year ago

There is an ingress for network.api_server_host that points to kube-oidc-proxy-orchestra.

The OpenUnison kubeconfig uses network.api_server_host as the cluster address.

No kubeconfig created by OpenUnison can be used to port-forward to any pod.

Error from kubectl is: error: lost connection to pod

Error in the kube-oidc-proxy-orchestra logs:

E0902 11:49:18.915791 1 handlers.go:271] unknown error (10.244.1.56:47794): context canceled
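For anyone triaging similar logs, the line above follows the standard klog header layout (`Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`). A minimal Python sketch for splitting such lines into fields is shown here; the function name `parse_klog_line` is made up for illustration and is not part of kube-oidc-proxy.

```python
import re

# klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)

SEVERITIES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_klog_line(line):
    """Return the fields of one klog line as a dict, or None if it does not match."""
    match = KLOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = SEVERITIES[fields["severity"]]
    fields["line"] = int(fields["line"])
    return fields


# The log line quoted in this report.
entry = parse_klog_line(
    "E0902 11:49:18.915791 1 handlers.go:271] "
    "unknown error (10.244.1.56:47794): context canceled"
)
```

The parsed entry only shows that an ERROR was logged from handlers.go line 271 with the message "context canceled"; it does not by itself identify the root cause.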

Version: docker.io/tremolosecurity/kube-oidc-proxy:latest

mlbiam commented 1 year ago

hm, can you provide:

  1. What version of Kubernetes
  2. What kind of ingress controller
  3. What kind of load balancer
  4. what environment are you running in? (ie on-prem, a cloud, etc)
  5. what are you trying to port forward?

I'll run a quick test to make sure everything is working, to see whether it's an issue in kube-oidc-proxy or environmental. Are you able to use kubectl exec to open a shell into a container? One thing to be aware of is that port-forward, exec, and cp all use the SPDY protocol, and many infrastructures just don't support it anymore.

mlbiam commented 1 year ago

Just validated that port-forward is working, so the issue is either with the environment or the service you're trying to forward.

droslean commented 1 year ago

@mlbiam

What version of Kubernetes

Kubernetes version: Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.12", GitCommit:"ba490f01df1945d0567348b271c79a2aece7f623", GitTreeState:"clean", BuildDate:"2023-07-19T12:17:23Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}

What kind of ingress controller

Ingress controller: k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d8196e3bc1e72547c5dec66d6556c0ff92a23f6d0919b206be170bc90d5f9185

Load balancer: managed by DigitalOcean

what environment are you running in? (ie on-prem, a cloud, etc)

Cloud

what are you trying to port forward?

Pods, services... Nothing works with the kubeconfig OpenUnison generates.

droslean commented 1 year ago

Just validated that port-forward is working, so the issue is either with the environment or the service you're trying to forward.

Port-forward works with the admin kubeconfig, which uses a different cluster address. None of the kubeconfigs generated by OpenUnison work; they all use the cluster address that points to kube-oidc-proxy-orchestra.
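To confirm which address each kubeconfig actually targets, the cluster entries can be compared side by side. The following is a minimal Python sketch, assuming the kubeconfig has been exported as JSON (for example with `kubectl config view -o json`). The cluster names and hostnames in the sample are hypothetical placeholders, not values from this cluster.

```python
import json


def cluster_servers(kubeconfig):
    """Map each cluster name in a parsed kubeconfig to its API server URL."""
    return {
        entry["name"]: entry["cluster"]["server"]
        for entry in kubeconfig.get("clusters") or []
    }


# Hypothetical sample: one entry for the provider's API server and one for
# the host that fronts kube-oidc-proxy (network.api_server_host).
SAMPLE = json.loads("""
{
  "apiVersion": "v1",
  "kind": "Config",
  "clusters": [
    {"name": "admin", "cluster": {"server": "https://api.example-provider.com"}},
    {"name": "openunison", "cluster": {"server": "https://k8sapi.example.com"}}
  ]
}
""")

servers = cluster_servers(SAMPLE)
for name, server in servers.items():
    print(f"{name}: {server}")
```

If the two entries resolve to different hosts, requests made with each kubeconfig take different network paths, which is consistent with one working while the other fails.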

mlbiam commented 1 year ago

Can you kubectl exec into a pod?

droslean commented 1 year ago

Can you kubectl exec into a pod?

@mlbiam I can't exec into any of the pods. kubectl just hangs.

mlbiam commented 1 year ago

@mlbiam I can't exec into any of the pods. kubectl just hangs.

So something is blocking SPDY. Is your DigitalOcean load balancer a TCP balancer or an HTTPS balancer?

droslean commented 1 year ago

@mlbiam This just stopped working. Port-forward used to work fine, and I didn't update or upgrade OpenUnison. My guess is that the issue appeared after a minor k8s update. The load balancer is managed by DigitalOcean, and no changes were made there either.

Is it possible that an SSL certificate expired? If that were the case, we would at least get an error. Currently there are no error logs anywhere.
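One way to rule certificate expiry in or out is to read the `notAfter` field of the certificate the endpoint serves. The sketch below uses only the Python standard library; the helper names and the hostname in the usage comment are hypothetical. Note that `fetch_not_after` uses default verification, so an expired or untrusted certificate (for example one signed by a private CA) raises `ssl.SSLCertVerificationError`, which is itself a useful signal.

```python
import socket
import ssl
import time


def seconds_until_expiry(not_after, now=None):
    """Seconds left before a certificate's notAfter time (negative if expired).

    not_after uses the format returned by getpeercert(), e.g.
    "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return expires_at - current


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the notAfter string of the certificate served at host:port.

    Raises ssl.SSLCertVerificationError if the certificate is expired or
    not trusted by the local trust store.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Usage (hypothetical hostname, requires network access):
# days_left = seconds_until_expiry(fetch_not_after("k8sapi.example.com")) / 86400
```

This only inspects the certificate presented at the ingress; certificates used internally between components would need to be checked separately.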

mlbiam commented 1 year ago

The load balancer is managed by DigitalOcean, and no changes were made there either.

Is the load balancer hosting the certificate or is nginx?

1.25 seems old. It doesn't sound like this would be a bug in k8s, but it's a possibility.

I doubt it's an issue with kube-oidc-proxy, but try ghcr.io/tremolosecurity/kube-oidc-proxy:1.0.5 just to make sure we are on the same page.

droslean commented 1 year ago

@mlbiam nginx is hosting the certificate.

Same issue with ghcr.io/tremolosecurity/kube-oidc-proxy:1.0.5

mlbiam commented 1 year ago

@droslean I just deployed on DigitalOcean with NGINX, and both port forwarding and exec worked fine. I used Kubernetes 1.25. I would open a ticket with DigitalOcean; I'm not seeing anything that's a bug in kube-oidc-proxy.

droslean commented 1 year ago

@mlbiam I opened a ticket with DigitalOcean and they can't find any issue. Port-forwarding works if I use DigitalOcean's kubeconfig. The kubeconfigs that OpenUnison generates with OIDC configured are all broken. I will try to deploy Dex instead and try again.

mlbiam commented 1 year ago

Port-forwarding works if I use DigitalOcean's kubeconfig.

That makes sense. Cloud providers have different networking infrastructure for their API servers vs. nodes. I'm pretty sure the root problem is that the SPDY protocol is being blocked somewhere between kubectl and kube-oidc-proxy. SPDY has been dropped by most modern network infrastructure and languages because it was replaced by HTTP/2 in 2015. Kubernetes continues to rely on it (well, Kubernetes supports both WebSockets and SPDY because only Go still supports SPDY). The client-go SDK for Kubernetes, which is used by kubectl (and most local desktop clients like k9s), still relies on SPDY (https://github.com/kubernetes/kubernetes/issues/89163).

When you use OpenUnison+kube-oidc-proxy you're instead relying on your node's networking infrastructure. That's why the static config you download from DigitalOcean works. Its network infrastructure is specifically designed to support SPDY because of Kubernetes' insistence on using it.
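To make the SPDY dependency concrete: when kubectl starts a port-forward or exec session, client-go sends an HTTP/1.1 request that asks to upgrade the connection, with `Connection: Upgrade` and `Upgrade: SPDY/3.1` headers. Any hop that strips or rejects those headers breaks the stream. The Python sketch below shows how such a request could be recognized from captured headers; `is_spdy_upgrade` is a hypothetical helper for illustration, not part of kubectl or kube-oidc-proxy.

```python
def is_spdy_upgrade(headers):
    """Return True if the request headers ask to upgrade the connection to SPDY."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    connection_tokens = [
        token.strip().lower()
        for token in normalized.get("connection", "").split(",")
    ]
    upgrade = normalized.get("upgrade", "").strip().lower()
    return "upgrade" in connection_tokens and upgrade.startswith("spdy/")


# Headers of the kind client-go sends when opening a port-forward stream.
port_forward_request = {
    "Connection": "Upgrade",
    "Upgrade": "SPDY/3.1",
    "X-Stream-Protocol-Version": "portforward.k8s.io",
}

# A WebSocket upgrade and a plain request, for contrast.
websocket_request = {"Connection": "Upgrade", "Upgrade": "websocket"}
plain_request = {"Accept": "application/json"}

print(is_spdy_upgrade(port_forward_request))
print(is_spdy_upgrade(websocket_request))
print(is_spdy_upgrade(plain_request))
```

Comparing the headers seen at the client with those arriving at each hop (load balancer, ingress, proxy) is one way to narrow down where the upgrade is being dropped.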

I will try to deploy Dex instead and try again.

Your problem isn't with OpenUnison, it's with kube-oidc-proxy. Replacing with Dex won't help you because:

  1. DigitalOcean doesn't support OpenID Connect integration. Like most cloud providers, they give you a pre-defined configuration, and you're not allowed to change the API server flags needed to directly support OIDC. See https://github.com/PacktPublishing/Kubernetes---An-Enterprise-Guide-2E/blob/main/chapter5/B17950_Chapter_05.pdf for a detailed explanation. So if you deploy Dex, you'll still need to deploy something like kube-oidc-proxy to support auth to your cluster.
  2. Your problem is with kube-oidc-proxy+network infrastructure, not authentication. Swapping out the identity provider won't fix that.

droslean commented 1 year ago

@mlbiam I get your points 1 and 2 and agree, but I do not see any solution here. There are no logs anywhere, and DigitalOcean can't find any problem either. I will first try to upgrade the cluster and then get back to this issue.

droslean commented 1 year ago

Same problem with Kubernetes 1.27.4-do.0 version.

droslean commented 1 year ago

@mlbiam Any ideas here? Are there any debug logs in kube-oidc-proxy that I can enable to identify the issue?

droslean commented 1 year ago

We get an AuFail with error:

E0922 10:10:37.396191       1 handlers.go:271] unknown error (10.XXX.X.XXX:39782): context canceled

droslean commented 1 year ago

I found the issue. It seems that the SSL certs were deleted. I tried to recreate them following https://github.com/TremoloSecurity/OpenUnison/wiki/troubleshooting#how-do-i-change-openunisons-certificates