dask / dask-gateway

A multi-tenant server for securely deploying and managing Dask clusters.
https://gateway.dask.org/
BSD 3-Clause "New" or "Revised" License

Templatize the kubernetes resources that dask-gateway generates #389

Open costrouc opened 3 years ago

costrouc commented 3 years ago

For QHub we have moved away from using the dask-gateway helm chart in order to integrate dask-gateway more tightly with our traefik http/https/tcp proxy (https://github.com/Quansight/qhub-terraform-modules/tree/main/modules/kubernetes/services/dask-gateway). This was mainly motivated when we found out that two traefik services in the same namespace in kubernetes do not play well with each other.

All that said, the current issue we are facing is around decorating the IngressRoute and needing to add tls: {"certManager": "default"}. I would like to propose templatizing the resource objects being created, with the templates configurable via Traitlets.

For example

INGRESS_ROUTE_TEMPLATE = {
    "apiVersion": "traefik.containo.us/v1alpha1",
    "kind": "IngressRoute",
    "metadata": {
        "labels": "PLACEHOLDER",
        "annotations": "PLACEHOLDER",
        "name": "PLACEHOLDER",
    },
    "spec": {
        "entryPoints": "PLACEHOLDER",
        "routes": [
            {
                "kind": "Rule",
                "match": "PLACEHOLDER",
                "services": [
                    {
                        "name": "PLACEHOLDER",
                        "namespace": "PLACEHOLDER",
                        "port": 8787,
                    }
                ],
                "middlewares": "PLACEHOLDER",
            }
        ],
    },
}

Or possibly we should just make the make_ingressroute function and similar functions overridable via traitlets callables. We need this functionality to expose the dask scheduler dashboard with https.
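To make the template idea concrete, here is a minimal sketch in plain Python of how placeholders could be filled while operator-supplied additions such as tls pass through untouched. The render helper, the dotted-path keys, and the example values are hypothetical and are not part of dask-gateway; in dask-gateway the template itself would presumably be exposed as a traitlets option rather than a module constant.

```python
import copy

PLACEHOLDER = "PLACEHOLDER"

# Trimmed-down version of the proposed template, plus an operator-added "tls" key.
INGRESS_ROUTE_TEMPLATE = {
    "apiVersion": "traefik.containo.us/v1alpha1",
    "kind": "IngressRoute",
    "metadata": {"name": PLACEHOLDER, "labels": PLACEHOLDER},
    "spec": {
        "entryPoints": PLACEHOLDER,
        "routes": [{"kind": "Rule", "match": PLACEHOLDER}],
        # Not a placeholder, so render() copies it through unchanged.
        "tls": {"certManager": "default"},
    },
}


def render(template, values, path=()):
    """Return a copy of `template` with each PLACEHOLDER replaced.

    `values` maps dotted paths (hypothetical convention, e.g.
    "spec.routes.0.match") to the value to substitute. A missing
    path raises KeyError, so an unfilled placeholder cannot slip through.
    """
    if template == PLACEHOLDER:
        return values[".".join(path)]
    if isinstance(template, dict):
        return {k: render(v, values, path + (k,)) for k, v in template.items()}
    if isinstance(template, list):
        return [render(v, values, path + (str(i),)) for i, v in enumerate(template)]
    return copy.deepcopy(template)


# Example values are made up for illustration.
route = render(
    INGRESS_ROUTE_TEMPLATE,
    {
        "metadata.name": "dask-example",
        "metadata.labels": {"app": "dask-gateway"},
        "spec.entryPoints": ["web"],
        "spec.routes.0.match": "PathPrefix(`/clusters/example/`)",
    },
)
```

With this shape, adding tls requires only a change to the template, while the values that dask-gateway computes per cluster stay under its control.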

cc: @aktech

TomAugspurger commented 3 years ago

Thanks! I think it'd be great to remove the need for this workaround in QHub.

@droctothorpe or @consideRatio does this proposal sound sensible to you?

consideRatio commented 3 years ago

My understanding summarized

I'll number some of my thoughts as I consider this further.

Questions in my mind

Better understanding of the problem

  1. Is the issue you experience @costrouc caused by two Traefik controllers working against the same IngressRoute resource?
  2. Are you having issues both with the IngressRoutes created by the dask-gateway Helm chart and the dynamically created IngressRoute resources, or only with one of these?
  3. What kind of changes would you make to the template if you had it, in order to avoid the issue you experience? Answer: having tls.certManager=default for example.

Solution exploration

  1. I've seen the pattern of adding an annotation to k8s native Ingress resources to declare which controller should respond to them before. I don't think this is sufficient though, as you also want to change, for example, tls.certManager.
  2. I've seen use of k8s mutating webhooks that modify resources before they are accepted by the k8s api-server, but I think it's overkill to suggest someone does that, and I think it's in scope to be able to make some customizations.
  3. A merge strategy can be reasonable, as with extraPodConfig for example. A downside is the complexity of making a change to an item in a list though, which is why KubeSpawner, for example, has extra_pod_config and extra_container_config separate from each other.
  4. A configurable template can be reasonable as well.
  5. Overriding the functions to generate the resources doesn't feel so robust to me at this point.
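The list downside of the merge strategy can be illustrated with a small sketch. The deep_merge helper and the example resource below are hypothetical, written only to show the behavior; they are not the merge logic used by dask-gateway or KubeSpawner.

```python
import copy


def deep_merge(base, extra):
    """Recursively merge `extra` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `extra`
    (including a list) replaces the value in `base` wholesale.
    """
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for a generated resource; the values are made up.
generated = {
    "kind": "IngressRoute",
    "spec": {
        "entryPoints": ["web"],
        "routes": [{"kind": "Rule", "match": "PathPrefix(`/example`)"}],
    },
}

# Adding a new key under spec is straightforward.
with_tls = deep_merge(generated, {"spec": {"tls": {"certManager": "default"}}})

# Changing one field of a list item is not: the override list
# replaces the whole generated list, dropping "kind" and "match".
replaced = deep_merge(generated, {"spec": {"routes": [{"priority": 10}]}})
```

Here with_tls keeps the generated entryPoints and routes and gains tls, while replaced loses the generated route entirely. Supporting per-item changes would need extra conventions (matching items by name or index), which is the complexity mentioned in item 3.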

What do I think at the moment?

Hmmm... I think using a configurable template would be reasonable (item 4 under "Solution exploration"). I'm not very confident this is the right way to go, but it feels like the most reasonable option to explore.

When it comes to customizing the k8s resource templates declared in the Helm chart, I'd like to see an overview of:

With such insight, it would be reasonable to make a decision on how and if to support further configuration.

droctothorpe commented 3 years ago

@consideRatio's input covers most of the bases.

> This was mainly motivated when we found out that two traefik services in the same namespace in kubernetes do not play well with each other.

Can you elaborate on the errors that you saw?

> We need this functionality to expose the dask scheduler dashboard with https.

FWIW, we addressed this problem by terminating HTTPS at the ELB, which was as simple as adding the appropriate annotations to the Traefik service and ingress in the values yaml and letting the cloud provider and external DNS work their magic.

dharhas commented 3 years ago

> Can you elaborate on the errors that you saw?

For reference, this was the tracking issue for the errors we saw.

https://github.com/Quansight/qhub/issues/358