Open tarioch opened 4 years ago
Can you provide a use case/business justification for completely disabling all health probes?
Sure. Our use case is that we're exposing the Jenkins JNLP port. On the one hand, the health probes flood the log files with unnecessary entries; on the other hand, and more importantly, the probes seem to be "too aggressive": they disconnect connections that are still fine and that Jenkins would otherwise be able to keep open.
Right now we use the workaround from https://stackoverflow.com/a/54257960: basically changing externalTrafficPolicy to Local and adding an explicit healthCheckNodePort.
Since we made that change the connection has stayed very stable, whereas before it was interrupted every couple of hours.
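For reference, a minimal sketch of what that workaround looks like (the service name, app label, ports, and node port value here are placeholders, not taken from our actual setup):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jenkins-jnlp   # hypothetical name
spec:
  type: LoadBalancer
  # With Local policy the LB health check targets the fixed node port
  # below instead of probing the service port, and only nodes that
  # actually host an endpoint report healthy.
  externalTrafficPolicy: Local
  healthCheckNodePort: 32000   # any free port in the NodePort range
  selector:
    app: jenkins
  ports:
    - name: jnlp
      protocol: TCP
      port: 50000
      targetPort: 50000
```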
I would also like to be able to disable all health probes. In our case the problem is log flooding (and our developers hate it when analysing logs during incidents), but it would also be good simply to have the option. I'm not really sure what the use case/business justification is for enforcing health probes.
We also ran into a similar issue. We were deploying an application that listened for TCP connections on a specific port and triggered an event whenever a connection was made. The health probes were triggering our events and, as a result, were spamming our logs with fake errors.
This annotation would be helpful for us as well. We forward the TCP traffic to an outgoing connection that is charged by bandwidth, so the health probes cause significant costs. Therefore we had to use the workaround mentioned by @tarioch.
Any progress on this?
Action required from @Azure/aks-pm
+@palma21
Another use case is using the Bitnami Helm chart for MySQL. Health probes flood the log with `Got an error reading communication packets` messages.
Another use case is using the load balancer for UDP services with no HTTP or TCP endpoint.
Almost the same case here. I have a raw socket that I don't want spammed.
Any update on this? Our use case is connection quality measurement using TCP/UDP sockets. Health probes from the load balancer disrupt the measurements.
We are experiencing this issue as well. We have an MQTT port behind a load balancer. The MQTT port requires authentication, and the health probe (of course) provides neither the authentication nor the correct protocol, which results in the business application logging a faulty incoming request. Since this was updated 4 days ago, is it active now? If yes, what would the release time be?
I have a similar requirement. I'm hosting an HTTPS application on an Azure AKS cluster, with gunicorn as the WSGI gateway running Flask, and I'm continuously getting socket errors in the pods even though the app is up and running. I suspect the health probes are hitting the port, causing those errors almost every 2-3 seconds.
This is currently a severe blocker for our deployment. We have a service exposing non-traditional protocols, such as websockets and a custom communication protocol over TCP. The health probe sends some data rather than behaving like an empty netcat connect, so every few seconds there is an exception and a stack trace in our logs.
I understand that disabling health probes for ports is not best practice, but it is a fast solution to the issue discussed here. Another solution would be to let us specify a custom probe, just as Kubernetes allows via the readinessProbe and livenessProbe configuration.
I propose a simple LB annotation, service.beta.kubernetes.io/azure-load-balancer-disable-health-probe-for-port-names.
It's not ideal, but since we do not have health probes for UDP anyway, it shouldn't matter that much.
Example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: default
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "SomeSubnet"
    service.beta.kubernetes.io/azure-load-balancer-disable-health-probe-for-port-names: "binary,binary-secure,jms-tcp"
spec:
  selector:
    app: app
  type: LoadBalancer
  ports:
    - name: servlet-http
      protocol: TCP
      port: 9763
      targetPort: 9763
    - name: servlet-https
      protocol: TCP
      port: 9443
      targetPort: 9443
    - name: binary
      protocol: TCP
      port: 9611
      targetPort: 9611
    - name: binary-secure
      protocol: TCP
      port: 9711
      targetPort: 9711
    - name: jms-tcp
      protocol: TCP
      port: 5672
      targetPort: 5672
```
I dug up some other annotations related to health probes, but they don't seem to work, or I don't understand what they do:
service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol
service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path
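For what it's worth, my understanding is that these two change what the probe does rather than disabling it: they switch the probe from a raw TCP connect to an HTTP GET against a path. A hedged sketch (the /healthz path is an assumption about your app, not something the docs prescribe):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    # Probe with HTTP GET /healthz instead of a bare TCP connect,
    # so the backend sees a well-formed request rather than an
    # empty connection.
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: "Http"
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz"
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
    - name: binary
      protocol: TCP
      port: 9611
      targetPort: 9611
```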
Would love to have a way to disable load balancer health checks for specific ports. One use case is gRPC: gRPC ports don't like being probed by plain TCP clients that never send the gRPC preface, which causes a ton of log flooding that is just noise.
Another valid option would be to allow a different health check for that specific port, e.g. an HTTP health check against the downstream service instead of checking the gRPC TCP port for availability.
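If I'm reading the cloud-provider-azure annotation list correctly, there are per-port variants that would allow exactly that. A hedged sketch (the service name, port 50051, probe port 8080, and /healthz path are all assumptions about the downstream service):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grpc-app   # hypothetical name
  annotations:
    # Probe the gRPC service port via a separate HTTP health endpoint
    # instead of opening raw TCP connections against the gRPC port.
    service.beta.kubernetes.io/port_50051_health-probe_protocol: "Http"
    service.beta.kubernetes.io/port_50051_health-probe_port: "8080"
    service.beta.kubernetes.io/port_50051_health-probe_request-path: "/healthz"
spec:
  type: LoadBalancer
  selector:
    app: grpc-app
  ports:
    - name: grpc
      protocol: TCP
      port: 50051
      targetPort: 50051
```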
Hi all, I'm having the same issue. I'm hosting an SFTP server on AKS.
When putting a LoadBalancer in front of an HTTP server, as many have done above, you need to be aware of the following. The LoadBalancer health probe runs from each node in the cluster. It opens a TCP connection, holds it open, sends nothing, and then waits 15 seconds before closing it. I don't know whether it is the responsibility of the server or the prober to close it faster, but on most servers I have seen, it simply occupies one thread. That means your server must be able to hold at least one connection open per node, concurrently. Switching to some kind of asyncio server helps a lot; otherwise you need to increase the number of threads to at least the number of nodes in your cluster, as in the sketch below.
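To make the sizing rule concrete, here is a sketch of a sync gunicorn deployment for a 10-node cluster (the deployment name, image, and port are illustrative placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app   # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
        - name: web
          image: example.azurecr.io/flask-app:latest   # placeholder image
          # With threaded sync workers, each held-open probe connection
          # pins a thread, so budget at least one thread per node
          # (10 here) on top of what real traffic needs. An async
          # worker class sidesteps the problem entirely.
          args:
            - gunicorn
            - --bind
            - 0.0.0.0:8000
            - --workers
            - "2"
            - --threads
            - "12"
            - app:app
          ports:
            - containerPort: 8000
```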
A better solution is to consider an Ingress controller when dealing with HTTP.
For my infrastructure, multiple rules across different ports are required, so if I need multiple copies of my microservices, I need multiple copies of the ingress controller for TCP forwarding. As a result, I turned to the Azure CNI provided LoadBalancer type. After that, the client was somehow experiencing intermittent 502 BAD GATEWAY responses, and I strongly suspect it is related to the kubernetes or kubernetes-internal load balancer health probe misdetection underneath the Azure CNI. I would like to rule out this possibility by disabling it.
I had the same problem and was able to disable the health probe for my SFTP server port with this annotation, from the docs at https://cloud-provider-azure.sigs.k8s.io/topics/loadbalancer/#loadbalancer-annotations:
service.beta.kubernetes.io/port_{port}_no_probe_rule: true
where {port} must be replaced by the service port, e.g. service.beta.kubernetes.io/port_22_no_probe_rule.
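For completeness, a minimal Service sketch using it (the names are placeholders; note that annotation values must be strings, hence the quotes around "true"):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sftp   # hypothetical name
  annotations:
    # No health probe (and thus no probe traffic) for service port 22.
    service.beta.kubernetes.io/port_22_no_probe_rule: "true"
spec:
  type: LoadBalancer
  selector:
    app: sftp
  ports:
    - name: sftp
      protocol: TCP
      port: 22
      targetPort: 22
```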
I think this issue can be closed, as disabling health probes is already supported.
Was this always possible, or was it recently added as a new function of the AKS LB?
The doc states that it has been possible since v1.24. I don't know when v1.24 was released on AKS.
Currently there is no way to disable the load balancer health probes. It would be good if an annotation could be added to allow disabling the health probes, either globally or for specific ports.