Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Allow to disable loadbalancer health probes #1394

Open tarioch opened 4 years ago

tarioch commented 4 years ago

Currently there is no way to disable the load balancer health probes. It would be good if an annotation could be added to allow disabling the health probes, either globally or for specific ports.

jnoller commented 4 years ago

Can you provide a use case/business justification for completely disabling all health probes?

tarioch commented 4 years ago

Sure. Our use case is exposing Jenkins JNLP. On the one hand, the health probes flood the log files with unnecessary entries; on the other, more important, hand, the probes seem to be "too aggressive" and drop connections that should still be fine, while Jenkins on its own is able to keep the connection open.

Right now we are using the workaround from here: https://stackoverflow.com/a/54257960 : basically changing externalTrafficPolicy to Local and adding an explicit healthCheckNodePort.

Since we made that change, the connection has been very stable, whereas before it was interrupted every couple of hours.
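For reference, that workaround boils down to something like the sketch below. The service name, selector, and port numbers here are placeholders, not from the original post:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jenkins-jnlp            # placeholder name
spec:
  type: LoadBalancer
  selector:
    app: jenkins                # placeholder selector
  # With Local, traffic is only routed to nodes running a matching pod,
  # and the cloud health probe targets the kube-proxy health check
  # endpoint below instead of the service port itself.
  externalTrafficPolicy: Local
  # Pin the health check to a fixed NodePort instead of a randomly
  # allocated one (must be free and within the cluster's NodePort range).
  healthCheckNodePort: 32000
  ports:
  - name: jnlp
    protocol: TCP
    port: 50000
    targetPort: 50000
```

Note that healthCheckNodePort can only be set when externalTrafficPolicy is Local.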

przemolb commented 4 years ago

I would also like to be able to disable all health probes. In our case it is log flooding (and our developers hate it when analysing logs during issues ...). But also just to have the option. What is the actual use case/business justification for enforcing health probes?

pag08007 commented 4 years ago

We also ran into a similar issue. We were deploying an application that listened for TCP connections on a specific port and triggered an event whenever a connection was made. The health probes were triggering our events and, as a result, were spamming our logs with spurious errors.

alex-doerfler commented 4 years ago

This annotation would be helpful for us as well. We forward the TCP traffic to an outgoing connection that is charged by bandwidth, and the health probes cause significant costs here. Therefore we had to use the workaround mentioned by @tarioch.

przemolb commented 4 years ago

Any progress on this ?

github-actions[bot] commented 4 years ago

Action required from @Azure/aks-pm


TomGeske commented 4 years ago

+@palma21

antonmatsiuk commented 3 years ago

Another use case is the Bitnami Helm chart for MySQL: the health probes flood the log with "Got an error reading communication packets" messages.

motmot80 commented 3 years ago

Another use case is using the load balancer for UDP services with no HTTP or TCP endpoint.

FanerYedermann commented 3 years ago

> another use case is using the load balancer for udp services with no http or tcp endpoint.

Almost the same case here. I have a raw socket that I don't want spammed.

dnovvak commented 3 years ago

Any update on this? Our use case is connection quality measurement using TCP/UDP sockets, and health probes from the load balancer disrupt the measurements.

BobClaerhout commented 3 years ago

We are experiencing this issue as well. We have an MQTT port behind a load balancer. The MQTT port requires authentication, which the health probe (of course) doesn't provide, nor does it speak the correct protocol, so the business application logs every probe as a faulty incoming request. Since this issue was updated 4 days ago, is it being worked on now? If yes, what would the release timeline be?

vishalsawale9 commented 3 years ago

I have a similar requirement. I'm hosting an HTTPS application on an Azure AKS cluster with gunicorn running as the WSGI gateway for a Flask app. I'm continuously getting socket errors in the pods even though the app is up and running. I suspect the health probes are hitting the port and causing those errors almost every 2-3 seconds.

TomasTokaMrazek commented 2 years ago

This is currently a severe blocker for our deployment. We have a service exposing non-traditional protocols such as WebSockets and a custom communication protocol over TCP. The health probe sends some data rather than behaving like an empty netcat connection, so every few seconds there is an exception and stack trace in our logs.

I understand that disabling the health probe for ports is not best practice, but it's a fast solution to the issue discussed here. Another solution would be to allow us to specify a custom probe, just as Kubernetes allows via the readinessProbe and livenessProbe configuration.

I propose a simple LB annotation, service.beta.kubernetes.io/azure-load-balancer-disable-health-probe-for-port-names. It's not ideal, but since we do not have health probes for UDP anyway, it shouldn't matter that much.

Example

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: default
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "SomeSubnet"
    service.beta.kubernetes.io/azure-load-balancer-disable-health-probe-for-port-names: "binary,binary-secure,jms-tcp"
spec:
  selector:
    app: app
  type: LoadBalancer
  ports:
  - name: servlet-http
    protocol: TCP
    port: 9763
    targetPort: 9763
  - name: servlet-https
    protocol: TCP
    port: 9443
    targetPort: 9443
  - name: binary
    protocol: TCP
    port: 9611
    targetPort: 9611
  - name: binary-secure
    protocol: TCP
    port: 9711
    targetPort: 9711
  - name: jms-tcp
    protocol: TCP
    port: 5672
    targetPort: 5672
```

I dug up some other annotations related to health probes, but they don't seem to work, or I don't understand what they do:

service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol
service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path
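As far as I can tell from the cloud-provider-azure docs, those two annotations don't disable the probe; they switch it to an HTTP probe against a given path, which at least keeps raw TCP connects away from the application protocol. A rough sketch, where the path, selector, and ports are illustrative assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    # Probe with an HTTP GET instead of a raw TCP connect.
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: "http"
    # Path the probe requests; the backend must answer 200 OK here.
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz"
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
```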

mplachter commented 2 years ago

Would love to have a way to disable load balancer health checks for specific ports. One example use case is gRPC ports, which don't like being probed by plain TCP connections that never send the gRPC preface. This causes a ton of log flooding that is just noise.

Another valid option would be to allow configuring a different health check for the port, e.g. an HTTP health check against the downstream service, instead of checking the gRPC TCP port for availability.

Wuzyn commented 2 years ago

Hi all, I'm having the same issue. I'm hosting an SFTP server on AKS.

hterik commented 2 years ago

If you put a LoadBalancer in front of an HTTP server, as many have done above, you need to be aware of the following. The load balancer health probe runs from each node in the cluster. It opens a TCP connection, holds it open while sending nothing, and waits about 15 seconds before closing it. I don't know whether it's the responsibility of the server or of the prober to close it faster, but on most servers I've seen it simply occupies one thread. That means your server must be able to hold at least one connection open per node concurrently. Switching to some kind of asyncio server helps a lot; otherwise you need to increase the number of threads to at least the number of nodes in your cluster.

A better solution is to consider an Ingress controller when dealing with HTTP.

solacens commented 1 year ago

My infrastructure requires multiple rules across different ports, so if I need multiple copies of my microservices, I also need multiple copies of the ingress controller for TCP forwarding. As a result, I switched to the LoadBalancer type provided by Azure CNI.

After that, the client somehow started experiencing intermittent 502 Bad Gateway responses, and I strongly suspect it is related to health probe misdetection by the kubernetes or kubernetes-internal load balancer underneath the Azure CNI. I would like to rule out this possibility by disabling the probes.

fethullahmisir commented 1 year ago

I had the same problem and was able to disable the health probe for my SFTP server port with this annotation:

From the docs: https://cloud-provider-azure.sigs.k8s.io/topics/loadbalancer/#loadbalancer-annotations

service.beta.kubernetes.io/port_{port}_no_probe_rule: "true"

Where {port} must be replaced by the service port, e.g. service.beta.kubernetes.io/port_22_no_probe_rule.
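Applied to a service like the SFTP case above, that would look roughly like this sketch. The name and selector are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sftp                    # placeholder name
  annotations:
    # Per the cloud-provider-azure docs: do not create a health
    # probe rule for service port 22.
    service.beta.kubernetes.io/port_22_no_probe_rule: "true"
spec:
  type: LoadBalancer
  selector:
    app: sftp                   # placeholder selector
  ports:
  - name: sftp
    protocol: TCP
    port: 22
    targetPort: 22
```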

I think this issue can be closed, as disabling health probes is already supported.

TomasTokaMrazek commented 1 year ago

Was this always possible, or was it recently added as a new AKS LB feature?

fethullahmisir commented 1 year ago

The doc states that it has been possible since AKS version v1.24. I don't know when v1.24 was released.