dask / dask-gateway

A multi-tenant server for securely deploying and managing Dask clusters.
https://gateway.dask.org/
BSD 3-Clause "New" or "Revised" License

Can dask-gateway work with JupyterHub service prefix? #350

Closed rcthomas closed 2 years ago

rcthomas commented 3 years ago

From the documentation I see that dask-gateway can leverage JupyterHub authentication. I also see it mentioned that dask-gateway can run as a Hub service, but I haven't seen anything in the docs or issues about configuring the service prefix. In particular,

https://jupyterhub.readthedocs.io/en/stable/reference/services.html#writing-your-own-services

"When you run a service that has a url, it will be accessible under a /services/ prefix, such as https://myhub.horse/services/my-service/. For your service to route proxied requests properly, it must take JUPYTERHUB_SERVICE_PREFIX into account when routing requests. For example, a web service would normally service its root handler at '/', but the proxied service would need to serve JUPYTERHUB_SERVICE_PREFIX."

Is that something that would need to be added? Is there an easy way to prefix all the routes (or is it more complicated than that)?
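
To make the quoted requirement concrete, here is a minimal sketch of what "taking JUPYTERHUB_SERVICE_PREFIX into account" could look like in a generic Python service. It is not dask-gateway code: the helper names `service_prefix` and `prefixed` and the route `/api/status` are hypothetical, and falling back to "/" when the variable is unset is an assumption for running outside the Hub.

```python
import os


def service_prefix(environ=None):
    """Return the URL prefix this service is mounted under, ending in '/'."""
    environ = os.environ if environ is None else environ
    # JupyterHub passes JUPYTERHUB_SERVICE_PREFIX to services it launches.
    # Defaulting to "/" (assumption) keeps the service usable standalone.
    prefix = environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    return prefix if prefix.endswith("/") else prefix + "/"


def prefixed(route, environ=None):
    """Join a route that would normally live under '/' onto the prefix."""
    return service_prefix(environ) + route.lstrip("/")


if __name__ == "__main__":
    hub_env = {"JUPYTERHUB_SERVICE_PREFIX": "/services/my-service/"}
    print(prefixed("/", hub_env))            # root handler behind the Hub proxy
    print(prefixed("/api/status", hub_env))  # hypothetical API route
```

A service would register its handlers at `prefixed(...)` paths instead of bare ones, so requests forwarded by the Hub proxy under /services/my-service/ match.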

TomAugspurger commented 3 years ago

Setting gateway.prefix should do the trick, as in https://github.com/dask/helm-chart/blob/master/daskhub/values.yaml#L59-L64 (that's all under a dask-gateway key, since the daskhub helm chart is wrapping dask-gateway's chart). Does that answer the question?

Edit: re-reading, perhaps it doesn't answer your question, which seems to be about the service prefix, not the gateway.

rcthomas commented 3 years ago

My original thinking was that I already have CHP running for the Hub, so if I'm going to run dask-gateway behind it, why do I need another proxy? But if the proxy is how you're prefixing the routes, maybe I should just try it.

consideRatio commented 3 years ago

I can confirm that this works well by setting gateway.prefix and registering dask-gateway as a service with JupyterHub. Below are snippets from a working config that does so. For details, see https://github.com/2i2c-org/pilot-hubs/blob/a43fe00bc2a8c11fc1afcb7d7c271a8a38f64cb0/hub-templates/daskhub/values.yaml#L87-L122.

Okay to close this issue?

  jupyterhub:
    singleuser:
      extraEnv:
        # About DASK_ prefixed variables we set:
        #
        # 1. k8s native variable expansion is applied with $(MY_ENV) syntax. The
        #    order in which variables are defined matters, though, and we are at
        #    the mercy of how KubeSpawner renders the dictionaries we pass.
        #
        # 2. Dask loads local YAML config.
        #
        # 3. Dask loads environment variables prefixed DASK_.
        #    - DASK_ is stripped
        #    - Capitalization is ignored
        #    - Double underscore means a nested configuration
        #    - `ast.literal_eval` is used to parse values
        #
        # 4. dask-gateway and dask-distributed look at their config and expand
        #    expressions in {} again, sometimes with only the environment
        #    variables as context and sometimes with additional variables.
        #
        # References:
        # - K8s expansion:     https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/
        # - KubeSpawner issue: https://github.com/jupyterhub/kubespawner/issues/491
        # - Dask config:       https://docs.dask.org/en/latest/configuration.html
        # - Exploration issue: https://github.com/2i2c-org/pilot-hubs/issues/442
        #
        DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
        DASK_GATEWAY__CLUSTER__OPTIONS__ENVIRONMENT: '{"SCRATCH_BUCKET": "$(SCRATCH_BUCKET)"}'
        DASK_DISTRIBUTED__DASHBOARD_LINK: '/user/{JUPYTERHUB_USER}/proxy/{port}/status'
        DASK_LABEXTENSION__FACTORY__MODULE: 'dask_gateway'
        DASK_LABEXTENSION__FACTORY__CLASS: 'GatewayCluster'

    hub:
      networkPolicy:
        enabled: false
      extraConfig:
        daskhub-01-add-dask-gateway-values: |
          # 1. Sets `DASK_GATEWAY__PROXY_ADDRESS` in the singleuser environment.
          # 2. Adds the URL for the Dask Gateway JupyterHub service.
          import os
          # These are set by jupyterhub.
          release_name = os.environ["HELM_RELEASE_NAME"]
          release_namespace = os.environ["POD_NAMESPACE"]
          if "PROXY_HTTP_SERVICE_HOST" in os.environ:
              # https is enabled, we want to use the internal http service.
              gateway_address = "http://{}:{}/services/dask-gateway/".format(
                  os.environ["PROXY_HTTP_SERVICE_HOST"],
                  os.environ["PROXY_HTTP_SERVICE_PORT"],
              )
              print("Setting DASK_GATEWAY__ADDRESS {} from HTTP service".format(gateway_address))
          else:
              gateway_address = "http://proxy-public/services/dask-gateway"
              print("Setting DASK_GATEWAY__ADDRESS {}".format(gateway_address))
          # Internal address to connect to the Dask Gateway.
          c.KubeSpawner.environment.setdefault("DASK_GATEWAY__ADDRESS", gateway_address)
          # Internal address for the Dask Gateway proxy.
          c.KubeSpawner.environment.setdefault("DASK_GATEWAY__PROXY_ADDRESS", "gateway://traefik-{}-dask-gateway.{}:80".format(release_name, release_namespace))
          # Relative address for the dashboard link.
          c.KubeSpawner.environment.setdefault("DASK_GATEWAY__PUBLIC_ADDRESS", "/services/dask-gateway/")
          # Use JupyterHub to authenticate with Dask Gateway.
          c.KubeSpawner.environment.setdefault("DASK_GATEWAY__AUTH__TYPE", "jupyterhub")
          # Adds Dask Gateway as a JupyterHub service to make the gateway available at
          # {HUB_URL}/services/dask-gateway
          service_url = "http://traefik-{}-dask-gateway.{}".format(release_name, release_namespace)
          for service in c.JupyterHub.services:
              if service["name"] == "dask-gateway":
                  if not service.get("url", None):
                      print("Adding dask-gateway service URL")
                      service["url"] = service_url
                  break
          else:
              print("dask-gateway service not found. Did you set jupyterhub.hub.services.dask-gateway.apiToken?")

dask-gateway:
  gateway:
    prefix: "/services/dask-gateway"  # Users connect to the Gateway through the JupyterHub service.
    auth:
      type: jupyterhub  # Use JupyterHub to authenticate with Dask Gateway
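
Steps 3 and 4 in the config comments above (DASK_ environment variables becoming nested config, then {} expressions being expanded) can be illustrated with a small stdlib-only sketch. This is a simplified imitation of the rules listed in those comments, not Dask's actual implementation. The function names `collect_dask_env` and `expand`, the bucket value, the user name, and the port number are all hypothetical.

```python
import ast


def collect_dask_env(environ):
    """Build a nested config dict from DASK_-prefixed environment variables."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith("DASK_"):
            continue
        # Strip the prefix, ignore capitalization, split on double underscore.
        path = name[len("DASK_"):].lower().split("__")
        try:
            value = ast.literal_eval(raw)  # dicts, numbers, lists, ...
        except (SyntaxError, ValueError):
            value = raw  # anything that is not a Python literal stays a string
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


def expand(template, environ, **extra):
    """Expand {NAME} expressions using the environment plus extra variables."""
    return template.format(**{**environ, **extra})


if __name__ == "__main__":
    env = {
        "JUPYTERHUB_USER": "alice",  # hypothetical user
        "DASK_GATEWAY__AUTH__TYPE": "jupyterhub",
        "DASK_GATEWAY__CLUSTER__OPTIONS__ENVIRONMENT": '{"SCRATCH_BUCKET": "s3://bucket"}',
        "DASK_DISTRIBUTED__DASHBOARD_LINK": "/user/{JUPYTERHUB_USER}/proxy/{port}/status",
    }
    config = collect_dask_env(env)
    link = config["distributed"]["dashboard_link"]
    print(config["gateway"])
    print(expand(link, env, port=8787))  # port supplied as an extra variable
```

Values such as '{JUPYTER_IMAGE_SPEC}' are not valid Python literals, so in this sketch they stay plain strings until the later {} expansion fills them in.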

consideRatio commented 2 years ago

I'll go for a close at this point, this works!