brancz / kube-rbac-proxy

Kubernetes RBAC authorizing HTTP proxy for a single upstream.
Apache License 2.0

Add livenessProbe support for kube-rbac-proxy #244

Open jessehu opened 1 year ago

jessehu commented 1 year ago

When using kube-rbac-proxy v0.14.1, we sometimes find the kube-rbac-proxy container stuck in repeated TLS handshake errors; it cannot recover on its own and has to be restarted manually.

I0621 17:39:33.819787    1876 log.go:198] http: TLS handshake error from 10.255.9.20:55542: write tcp 10.255.9.20:9100->10.255.9.20:55542: write: broken pipe
I0621 17:40:40.388479    1876 log.go:198] http: TLS handshake error from 10.255.9.26:1531: write tcp 10.255.9.20:9100->10.255.9.26:1531: write: broken pipe
I0621 17:44:32.288256    1876 log.go:198] http: TLS handshake error from 10.255.9.26:30302: write tcp 10.255.9.20:9100->10.255.9.26:30302: write: broken pipe

When kube-rbac-proxy runs alongside the Prometheus node-exporter, adding a livenessProbe lets Kubernetes restart the kube-rbac-proxy container automatically, e.g.:

        - image: myregistry.io/kube-rbac-proxy:v0.14.1-with-curl
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - "curl -sSL -ik -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" https://${IP}:9100/metrics | grep -e 'HTTP/2 200'"
            initialDelaySeconds: 30
            failureThreshold: 3
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 3
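
The exec probe above only works because the image was rebuilt to include curl. What the built-in livenessProbe support requested here could look like is a small plaintext listener serving /healthz next to the TLS proxy listener, so a plain httpGet probe works without extra binaries. The following is a minimal sketch only; the health port and wiring are assumptions for illustration, and kube-rbac-proxy does not expose such an endpoint today.

// Minimal sketch of a dedicated health listener; not existing kube-rbac-proxy
// code. The health port (8081 here) is a hypothetical, illustrative choice.
package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    mux := http.NewServeMux()
    // Report healthy as long as the process can still serve HTTP. A real
    // implementation could additionally verify that the TLS listener and the
    // TokenReview webhook are still usable before answering 200.
    mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        fmt.Fprintln(w, "ok")
    })
    log.Fatal(http.ListenAndServe(":8081", mux))
}

With an endpoint like this, the livenessProbe could be a plain httpGet against the health port instead of an exec into the container.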
peterbueschel commented 8 months ago

Hi,

I would also vote for either an option like this to probe the proxy (though that implies shipping an extra binary in the image) or for the proxy to panic on such an error so that it gets restarted.

In our case we also receive:

webhook.go:154] Failed to make webhook authenticator request: Post ".....:443/apis/authentication.k8s.io/v1/tokenreviews": context deadline exceeded 

Afterwards, only a manual restart fixes it.
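
One way to implement the "panic on such an error" idea would be to count consecutive webhook authenticator failures and terminate the process once a threshold is crossed, so the kubelet restarts the container. The sketch below is a rough illustration under that assumption; the threshold, the failure counter, and the authenticateToken stand-in are hypothetical, not current kube-rbac-proxy code.

// Rough sketch: fail hard after repeated TokenReview webhook failures so the
// container gets restarted instead of limping along.
package main

import (
    "context"
    "errors"
    "log"
    "sync/atomic"
)

// maxConsecutiveWebhookFailures is an illustrative, hypothetical threshold.
const maxConsecutiveWebhookFailures = 5

var webhookFailures atomic.Int64

// authenticateToken stands in for the call to the TokenReview webhook
// (POST to /apis/authentication.k8s.io/v1/tokenreviews).
func authenticateToken(ctx context.Context, token string) error {
    return errors.New("context deadline exceeded")
}

func authenticate(ctx context.Context, token string) error {
    if err := authenticateToken(ctx, token); err != nil {
        if webhookFailures.Add(1) >= maxConsecutiveWebhookFailures {
            // Exit instead of serving errors forever; the kubelet restarts us.
            log.Fatalf("webhook authenticator failed %d consecutive times, exiting: %v",
                maxConsecutiveWebhookFailures, err)
        }
        return err
    }
    webhookFailures.Store(0)
    return nil
}

func main() {
    _ = authenticate(context.Background(), "example-token")
}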

Waji-97 commented 5 months ago

I have run into the same issue using the kube-rbac-proxy container alongside node-exporter in a single pod. To be honest, I am unsure why it occurs, but whenever it does, the kube-rbac-proxy container consumes almost all of the CPU and memory it is limited to and only returns to normal once the container is restarted. Is there any news on this particular issue?