dask / dask-gateway

A multi-tenant server for securely deploying and managing Dask clusters.
https://gateway.dask.org/
BSD 3-Clause "New" or "Revised" License

How to Use TLS in Dask-Gateway? #344

Open rileyhun opened 3 years ago

rileyhun commented 3 years ago

I understand from the docs that TLS is recommended in a production environment, so I'm trying to set it up. Here are the steps I followed:

[1] Added the self-signed certificate and key files to the image with a Dockerfile.
[2] Pushed that Docker image to the Google Cloud image repository.
[3] Replaced the image names in the Helm config file, changing "daskgateway/dask-gateway-server" to the image in the Google Cloud repository.
[4] Added the paths to the self-signed key and cert files in the extraConfig field.

What happened: Nothing changed, and there were no errors. The internal load balancer for the Traefik proxy server is still using HTTP.

What you expected to happen: I expected the internal load balancer to use HTTPS.

Minimal Complete Verifiable Example:

Dockerfile:

FROM daskgateway/dask-gateway-server:latest

ADD certs /certs/

Helm Config:

extraConfig:
    security: |
      c.Proxy.tls_cert = "certs/myca.pem"
      c.Proxy.tls_key = "certs/mykey.pem"
    clusteroptions: |
      from dask_gateway_server.options import Options, Integer, Float, String

      c.KubeClusterConfig.idle_timeout = 3600

      def option_handler(options):
        return {
        "worker_cores": options.worker_cores,
        "worker_memory": "%fG" % options.worker_memory,
        "image": options.image,
        }

      c.Backend.cluster_options = Options(
        Integer("worker_cores", 2, min=1, max=8, label="Worker Cores"),
        Float("worker_memory", 4, min=1, max=16, label="Worker Memory (GiB)"),
        String("image", default="daskgateway/dask-gateway:latest", label="Image"),
        handler=option_handler,
      )

Environment:

jcrist commented 3 years ago

The c.Proxy.* settings don't affect k8s users, as the proxy used on k8s is different (we should update our docs to better reflect this). Currently we don't expose configuring HTTPS for the traefik proxy - most users run with JupyterHub and piggyback on JupyterHub's TLS proxy by registering dask-gateway as a JupyterHub service. This obviously doesn't help users not running with JupyterHub.

There are a few ways we could enable TLS configuration for k8s. I'd probably mimic how JupyterHub exposes it:

The first two are the easiest to set up; they just require some Helm chart munging.
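
For readers taking the JupyterHub route mentioned above (registering dask-gateway as a JupyterHub service), here is a minimal sketch of the client-side addresses. It assumes the service is registered under the name `dask-gateway` and that the Traefik Service is called `traefik-dask-gateway` (this depends on your Helm release name); both names, and the helper `hub_service_kwargs`, are hypothetical. The idea it illustrates: HTTP API calls and dashboard links go through JupyterHub's TLS-terminating proxy under `/services/<name>/`, while scheduler traffic, which is TLS over raw TCP, has to go straight to Traefik.

```python
def hub_service_kwargs(hub_url: str, namespace: str,
                       service_name: str = "dask-gateway",
                       traefik_service: str = "traefik-dask-gateway",
                       traefik_port: int = 80) -> dict:
    """Hypothetical helper: Gateway(...) keyword arguments when dask-gateway
    is registered as a JupyterHub service. All default names are assumptions."""
    # HTTP API requests and dashboard links go through JupyterHub's
    # TLS-terminating proxy, which routes /services/<name>/ to the gateway.
    address = f"{hub_url.rstrip('/')}/services/{service_name}/"
    # Scheduler traffic is TLS over raw TCP, which JupyterHub's HTTP proxy
    # cannot route, so it goes straight to Traefik via its in-cluster DNS name.
    proxy_address = f"gateway://{traefik_service}.{namespace}:{traefik_port}"
    # "jupyterhub" auth reads the user's API token from JUPYTERHUB_API_TOKEN.
    return {"address": address, "proxy_address": proxy_address, "auth": "jupyterhub"}


if __name__ == "__main__":
    kwargs = hub_service_kwargs("https://hub.example.com", "jhub")
    print(kwargs)
    # With dask-gateway installed, these would be passed as:
    #   from dask_gateway import Gateway
    #   gateway = Gateway(**kwargs)
```

The helper only builds strings, so it runs without dask-gateway installed; the actual connection happens when the dictionary is passed to `dask_gateway.Gateway`.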

rileyhun commented 3 years ago

Thanks Jim,

This is very helpful information.

I should clarify that I was going to set up Dask Gateway as a JupyterHub service, because that simplifies a lot from an authentication perspective. However, our users are already used to AI Platform Notebooks, so adding JupyterHub seems redundant and would most likely confuse them.

cdibble commented 3 years ago

I have a similar use case with Dask Gateway, k8s, and JupyterHub. My k8s cluster runs on AWS EKS.

I've set up Dask Gateway to authenticate with a JupyterHub API token. My JupyterHub deployment uses TLS and sits behind a VPN. Everything in Dask Gateway, deployed via the Helm chart, works great.

However (re: this thread), I can't figure out how to set up TLS certificates for the Dask dashboards that are created when clusters are deployed. In addition, traffic to those dashboards is not protected by the VPN.

I can use the ClusterIP Traefik service type (see #304), but then I can't reach the Dask dashboards at all.

Ideally, I'd like both TLS and an internal IP, so that I can still reach the Dask dashboards while they are only exposed to internal network traffic. Either one of the two would also be acceptable.

Is this possible by configuring, perhaps, gateway.backend.scheduler/extraPodConfig? I've tried setting the following annotations on various services in values.yaml:


service:
    annotations: #{}
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/scheme: internal

Any thoughts or direction would be appreciated, and I'd be happy to include more info. I'm not sure this warrants a new issue, but if so, I'll open one.

droctothorpe commented 3 years ago

Easy fix: use Service annotations so the cloud provider provisions an ELB for Dask Gateway's Traefik service, and configure that ELB (also via annotations) to terminate HTTPS. Granted, requests are unencrypted past the ELB, but the important traffic (client-to-scheduler comms) relies on mTLS and is encrypted end to end anyway.
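
A minimal sketch of that idea, assuming the in-tree AWS cloud provider with a Classic ELB and an ACM certificate. The annotation keys `aws-load-balancer-ssl-cert`, `aws-load-balancer-ssl-ports`, and `aws-load-balancer-internal` are standard Kubernetes AWS Service annotations; the chart keys `traefik.service.ports.web.port` and `traefik.service.ports.tcp.port`, the port numbers, and the helper `traefik_tls_values` are assumptions to check against your chart version. The point it shows: only the web port gets a TLS-terminating listener, while scheduler traffic is moved to its own port so the ELB passes its mTLS connections through untouched.

```python
import json


def traefik_tls_values(acm_cert_arn: str, https_port: int = 443,
                       scheduler_port: int = 8786, internal: bool = False) -> dict:
    """Hypothetical helper: dask-gateway Helm values so an AWS ELB terminates
    HTTPS for Traefik's web port while passing scheduler TCP through."""
    annotations = {
        # ACM certificate the ELB presents to browsers and REST clients.
        "service.beta.kubernetes.io/aws-load-balancer-ssl-cert": acm_cert_arn,
        # Only this port gets a TLS-terminating listener; others stay plain TCP.
        "service.beta.kubernetes.io/aws-load-balancer-ssl-ports": str(https_port),
    }
    if internal:
        # Ask for an internal (VPC-only) load balancer instead of a public one.
        annotations["service.beta.kubernetes.io/aws-load-balancer-internal"] = "true"
    return {
        "traefik": {
            "service": {
                "type": "LoadBalancer",
                "annotations": annotations,
                "ports": {
                    "web": {"port": https_port},
                    # Separate port for scheduler traffic so the ELB does not
                    # terminate its (already mutual-TLS) connections.
                    "tcp": {"port": scheduler_port},
                },
            }
        }
    }


if __name__ == "__main__":
    values = traefik_tls_values("arn:aws:acm:<region>:<account>:certificate/<id>")
    # JSON is valid YAML, so this output can be saved as an extra values file.
    print(json.dumps(values, indent=2))
```

Clients would then use `https://<elb-dns-name>` as the gateway address and pass the scheduler port separately (dask-gateway's `Gateway` accepts an integer `proxy_address`, taking the host from `address`).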

droctothorpe commented 3 years ago

Also, all of the REST traffic can stay in the cluster (instead of routing out through the ELB) if you use the K8s DNS names of the services. The dashboard is the only exception: you'll need to pass the public_address keyword to the Gateway object, or the client will display a widget with the internal, inaccessible URL. The dashboard will still use HTTPS.
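
A minimal sketch of that setup for an in-cluster client, assuming a Traefik Service named `traefik-dask-gateway`, the default `cluster.local` cluster domain, and web traffic on port 80; these names and the helper `in_cluster_gateway_kwargs` are hypothetical. `address` and `public_address` are real `dask_gateway.Gateway` keyword arguments: the first is used for API calls, the second as the root of browser-facing links such as the dashboard.

```python
def in_cluster_gateway_kwargs(namespace: str, public_url: str,
                              traefik_service: str = "traefik-dask-gateway",
                              web_port: int = 80,
                              cluster_domain: str = "cluster.local") -> dict:
    """Hypothetical helper: Gateway(...) keyword arguments that keep REST
    traffic inside the cluster while giving browsers a public dashboard URL."""
    # Kubernetes Service DNS name: <service>.<namespace>.svc.<cluster-domain>
    host = f"{traefik_service}.{namespace}.svc.{cluster_domain}"
    return {
        # REST API calls resolve to the ClusterIP and never leave the cluster.
        # proxy_address is omitted, so Gateway derives it from this address.
        "address": f"http://{host}:{web_port}",
        # Root for browser-facing links such as the Dask dashboard; this is
        # the external HTTPS name of the ELB that fronts Traefik.
        "public_address": public_url,
    }


if __name__ == "__main__":
    kwargs = in_cluster_gateway_kwargs("dask-gateway", "https://dask.example.com")
    print(kwargs)
    # With dask-gateway installed:
    #   from dask_gateway import Gateway
    #   gateway = Gateway(**kwargs)
```

Keeping `address` internal and `public_address` external is what lets API calls skip the ELB while the dashboard link shown in the widget remains reachable from a browser.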

cdibble commented 3 years ago

Thanks @droctothorpe for the suggestions. I'll see if I can get the ELB to use HTTPS termination; it looks pretty straightforward if these docs are enough.

As for your second comment, would you mind clarifying? I'm not sure how to use k8s DNS names to ensure that the dashboard is the only service exposed. Right now, the only service with an External-IP is the dask-gateway Traefik service, which leads me to think that all other traffic is already limited to internal paths. But if there is further configuration that can ensure this, I'd definitely want to implement it. Any links, resources, or suggestions are appreciated.

droctothorpe commented 3 years ago

The other services will be exposed too (via the same ELB that serves the dashboard). The advantage of using the K8s DNS names is that requests from in-cluster Dask Gateway clients to the Gateway API don't needlessly leave the cluster. It also helps keep environment files consistent if you're provisioning to multiple discrete environments. It's nice to have but not strictly necessary, and kind of a tangent from your original question, heh.

cdibble commented 3 years ago

Thanks for the tips! I'm still pretty new to kubernetes and looking into the DNS names has been edifying.

I wanted to post this snippet as a reference for others. I was able to hide Dask Gateway, including the Dask dashboards, behind my VPN using the following annotations on the Traefik service. Note that these annotations are specific to AWS EKS with AWS Elastic Load Balancers (and the AWS Load Balancer Controller as the ingress controller), but I'd expect similar methods exist for other load balancers.

traefik:
  service:
    type: LoadBalancer # Use LoadBalancer if you want internet-facing ingress.
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: alb
      service.beta.kubernetes.io/aws-load-balancer-internal: <CIDR-block-for-local-VPC-traffic>

[edit- fixed indentations]