dask / dask-gateway

A multi-tenant server for securely deploying and managing Dask clusters.
https://gateway.dask.org/
BSD 3-Clause "New" or "Revised" License

The readinessProbe for traefik pod (1 replica) is too sensitive #390

Open consideRatio opened 3 years ago

consideRatio commented 3 years ago

I have reasoned a lot about readinessProbes and livenessProbes for the JupyterHub Helm chart, and there is one conclusion that I'd like to convey and suggest for this Helm chart as well.

For details about those past experiences, see https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/1941.

My conclusion from previous experiences

All k8s Deployment resources that have only a single Pod replica should, if they have a readinessProbe at all, set its failureThreshold very high so that the livenessProbe always triggers before the readinessProbe.
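As a minimal sketch of that guideline (the path, port, and thresholds below are illustrative rather than the chart's actual values), the readinessProbe's failureThreshold is set so high that the livenessProbe restarts the container long before the pod would ever be marked Unready:

```yaml
# Illustrative probe pair for a single-replica Deployment (hypothetical
# values): the livenessProbe restarts the container after ~60s of failed
# pings, long before the readinessProbe could mark the pod Unready.
livenessProbe:
  httpGet:
    path: /ping
    port: 8081            # illustrative port
  periodSeconds: 10
  failureThreshold: 6     # restart after ~60s of failures
readinessProbe:
  httpGet:
    path: /ping
    port: 8081
  periodSeconds: 10
  failureThreshold: 1000  # ~3h of failures before Unready: liveness wins first
```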

The specific situation

I experienced a lot of network issues that I struggled to track down. When I finally did, I noticed it was because the readinessProbe of the traefik pod was flaking from time to time. Whenever that happened, the traefik pod became Unready and stopped receiving traffic from the associated k8s Service. As there was only one replica of the traefik instance, any traffic sent to the traefik k8s Service would fail.

The outcome was intermittent network issues. With a single-replica traefik like I've used (the default), I suggest we increase the readinessProbe failureThreshold to 1000 or similar, or disable it entirely. I'm not confident that the traefik instance supports running with multiple replicas, but if it doesn't, the single-replica default at least makes sense.
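To illustrate the failure mode (with hypothetical names and addresses), this is roughly what the traefik Endpoints object looks like while its only pod is failing the readinessProbe: the address moves to notReadyAddresses, leaving the Service with nothing to route to:

```yaml
# Hypothetical snapshot of the Endpoints object during a probe flake:
apiVersion: v1
kind: Endpoints
metadata:
  name: traefik-dask-gateway   # hypothetical Service name
subsets:
  - notReadyAddresses:         # the only traefik pod, now Unready
      - ip: 10.0.0.12
    ports:
      - port: 8000             # illustrative port
```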

Action point

Clarify whether the api pod, traefik pod, and controller pod are able to run with multiple replicas.

consideRatio commented 3 years ago

Tech details

Traefik pod

Hardcoded readiness / liveness probes

https://github.com/dask/dask-gateway/blob/7d1659db8b2122d1a861ea820a459fd045fc3f02/resources/helm/dask-gateway/templates/traefik/deployment.yaml#L68-L85

Api pod (gateway)

Configurable readiness / liveness probes

https://github.com/dask/dask-gateway/blob/7d1659db8b2122d1a861ea820a459fd045fc3f02/resources/helm/dask-gateway/templates/gateway/deployment.yaml#L66-L86

Controller pod

No readiness / liveness probes

https://github.com/dask/dask-gateway/blob/7d1659db8b2122d1a861ea820a459fd045fc3f02/resources/helm/dask-gateway/templates/controller/deployment.yaml#L26-L70

droctothorpe commented 3 years ago

> Clarify whether the api pod, traefik pod, and controller pod are able to run with multiple replicas.

All three support HA. We deploy two of each in our cluster. The JH issue you linked is a direct result of hub's lack of HA support.

DG's approach to HA (which is a direct result of its ability to toggle between managing its own database and letting its target backend handle state) is a profound step forward vis-à-vis JH and might be able to serve as a point of reference for how to achieve a similar outcome for JH. Here's the relevant PR: https://github.com/dask/dask-gateway/pull/195.

That being said, your suggested implementation could still be useful for people who don't want HA for some reason (to conserve cluster resources, for example).

consideRatio commented 3 years ago

Thanks for clarifying that all pods have HA support.

Given that traefik is truly HA, a sensitive readinessProbe can make sense, but I reason that the deployment should perhaps default to two replicas, so that the default configuration isn't problematic and doesn't cause at least a 10-second disruption of all network traffic whenever a single /ping request fails.

Concrete suggestion ideas:

- Default the traefik (and api) deployments to two replicas, so that a single flaky /ping request no longer takes down all traffic (see the values sketch below).
- Alternatively, toggle the probe configuration based on the replica count: with a single replica, raise the readinessProbe's failureThreshold or disable the probe entirely.
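A minimal sketch of the first idea, assuming the chart exposed the replica counts in its values.yaml (the key names here are hypothetical; the traefik probes are currently hard-coded in the template):

```yaml
# Hypothetical values.yaml defaults -- key names are illustrative:
traefik:
  replicas: 2   # HA by default: one pod flaking its probe no longer
                # leaves the Service with zero ready endpoints
gateway:
  replicas: 2
```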

droctothorpe commented 3 years ago

Np! Jim did all the actual work on this, so take my answers with a grain of salt.

> Do you know if the controller uses some kind of leader election or similar?

I just logged onto my work computer to double-check this and realized that we're only deploying one controller pod. I'm sorry for the confusion!

Not seeing anything in the controller logic about leader election (which is how Operator SDK and Kubebuilder handle this). Unless I'm missing something (which, clearly, wouldn't be a first, heh), that could be a great candidate for an upstream contribution. I'll file an issue to that effect. It would be something fun for my team to take a swing at if the core maintainers think it's a good idea.
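For context, those frameworks typically coordinate replicas through a coordination.k8s.io Lease object that each controller pod tries to acquire and renew, with only the current holder doing reconciliation. A rough sketch of such a Lease (all names here are hypothetical):

```yaml
# Hypothetical Lease used for leader election among controller replicas;
# only the pod named in holderIdentity acts, the others stand by.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: dask-gateway-controller-leader   # hypothetical name
  namespace: dask-gateway                # hypothetical namespace
spec:
  holderIdentity: controller-6d9f7c-abcde  # the current leader pod
  leaseDurationSeconds: 15                 # others may take over after expiry
```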

At present, it looks like the controller doesn't leverage probes, which alleviates the concern about HA settings being applied to a singleton component.

I see the Traefik probe settings are hard-coded in the template. It looks like those are just inherited from the upstream.

Obviously, defaulting the Traefik and API Gateway replica counts to 2 in the values.yaml is probably the path of least resistance, but I like your proposal for implementing logic to toggle between the two approaches.
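For what it's worth, a sketch of that toggle as it might look in the traefik deployment template (this is not the chart's actual template; the path, port, and thresholds are illustrative):

```yaml
# Hypothetical Helm template logic: keep the sensitive probe only when
# there is more than one replica to fail over to.
{{- if gt (int .Values.traefik.replicas) 1 }}
readinessProbe:
  httpGet:
    path: /ping
    port: 8081             # illustrative port
  periodSeconds: 10
  failureThreshold: 3      # safe: other replicas keep serving traffic
{{- else }}
readinessProbe:
  httpGet:
    path: /ping
    port: 8081
  periodSeconds: 10
  failureThreshold: 1000   # single replica: let the livenessProbe act first
{{- end }}
```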