Collinbrown95 opened this issue 2 years ago
If each user spins up their own Dask cluster, provisioning it with the `KubeCluster` cluster manager is the most convenient option. However, it is limited when multiple users or scripts need access to the same cluster, because it is geared primarily towards ad-hoc deployments. Dask-gateway seems the right candidate for the multi-user case.
Dask-kubernetes (classic `KubeCluster`): deploys a Dask cluster on the K8s cluster using the `KubeCluster` cluster manager. It requires RBAC configuration for the Jupyter pod's service account. It creates a service for the scheduler pod and can spin worker pods up and down adaptively.
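As a rough illustration of the RBAC requirement, here is a sketch that builds a namespaced `Role` and `RoleBinding` for the notebook's service account as plain Python dicts. The rule list is an assumption modelled on the permissions the dask-kubernetes documentation suggests for the classic `KubeCluster` (pods, pod logs, services, poddisruptionbudgets); the helper name `dask_kubecluster_rbac` and the example namespace and service-account names are hypothetical.

```python
import json


def dask_kubecluster_rbac(namespace: str, service_account: str,
                          name: str = "dask-kubernetes") -> list[dict]:
    """Return a Role and RoleBinding granting a service account access to the
    objects the classic KubeCluster creates.

    Assumption: the rules mirror the dask-kubernetes docs; verify before use.
    """
    crud = ["get", "list", "watch", "create", "delete"]
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # "" is the core API group (pods, pod logs, services).
            {"apiGroups": [""], "resources": ["pods"], "verbs": crud},
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get", "list"]},
            {"apiGroups": [""], "resources": ["services"], "verbs": crud},
            # PodDisruptionBudgets live in the "policy" API group.
            {"apiGroups": ["policy"], "resources": ["poddisruptionbudgets"], "verbs": crud},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": service_account,
                      "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role",
                    "name": name},
    }
    return [role, binding]


if __name__ == "__main__":
    # Hypothetical namespace and service-account names.
    for manifest in dask_kubecluster_rbac("my-namespace", "default-editor"):
        print(json.dumps(manifest, indent=2))
```

Keeping the grant as a namespaced `Role` rather than a `ClusterRole` limits each notebook to creating Dask pods in its own namespace.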
When using `cluster.scheduler_address` to extract the scheduler address and then connecting to it with the Dask distributed client, the connection is quite unstable and produces a lot of warnings. Also, closing the client only terminates the scheduler pod; because the scheduler is then unavailable, the worker pods terminate automatically, but other resources such as services and poddisruptionbudgets are left running.

Dask-kubernetes operator: allows managing Dask clusters as K8s resources. It requires installing the Dask CRDs and the operator, which watches for and manages those custom resources.
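With the operator, a cluster is declared as a `DaskCluster` object. A minimal sketch of such a manifest built in Python, assuming the `kubernetes.dask.org/v1` group/version and the field layout of the dask-kubernetes operator's CRD (the schema has changed between releases, so check it against the installed CRD). The function name, default image, and port choices are illustrative, not prescribed.

```python
import json


def dask_cluster_manifest(name: str, namespace: str,
                          image: str = "ghcr.io/dask/dask:latest",
                          workers: int = 2) -> dict:
    """Return a DaskCluster custom resource as a dict (assumed CRD layout)."""
    # Labels the scheduler Service selects on; assumed to match the labels
    # the operator puts on the scheduler pod.
    selector = {"dask.org/cluster-name": name, "dask.org/component": "scheduler"}
    return {
        "apiVersion": "kubernetes.dask.org/v1",
        "kind": "DaskCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "worker": {
                "replicas": workers,
                "spec": {"containers": [{
                    "name": "worker",
                    "image": image,
                    # $(DASK_WORKER_NAME) is expanded by Kubernetes from an env
                    # var the operator is assumed to inject.
                    "args": ["dask-worker", "--name", "$(DASK_WORKER_NAME)"],
                }]},
            },
            "scheduler": {
                "spec": {"containers": [{
                    "name": "scheduler",
                    "image": image,
                    "args": ["dask-scheduler"],
                    "ports": [
                        {"name": "tcp-comm", "containerPort": 8786, "protocol": "TCP"},
                        {"name": "http-dashboard", "containerPort": 8787, "protocol": "TCP"},
                    ],
                }]},
                "service": {
                    "type": "ClusterIP",
                    "selector": selector,
                    "ports": [
                        {"name": "tcp-comm", "protocol": "TCP", "port": 8786,
                         "targetPort": "tcp-comm"},
                        {"name": "http-dashboard", "protocol": "TCP", "port": 8787,
                         "targetPort": "http-dashboard"},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(dask_cluster_manifest("demo", "my-namespace"), indent=2))
```

Because the operator creates the scheduler and worker objects on behalf of the `DaskCluster`, deleting that one object is expected to clean them up too, which is the main contrast with the leftover services and poddisruptionbudgets of the classic approach.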
Dask-gateway: provides a multi-tenant server for managing Dask clusters centrally. Users don't need access to the underlying cluster backend. It installs the `daskclusters` custom resource definition, a Traefik proxy, the gateway API server, and the gateway controller.
CC: @chritter
Overview
A number of project use cases are well suited to Dask.
There is a Kubernetes operator for Dask that creates and manages Dask clusters from `DaskCluster` custom resources.

I have started exploring a proof of technology in https://github.com/Collinbrown95/pot-k8s-dask that looks at how namespaced Dask clusters might work. Perhaps it makes sense to follow a pattern similar to the work done with the Gitea or S3Proxy controllers in https://github.com/StatCan/aaw-kubeflow-profiles-controller (i.e. have a controller that deploys `DaskCluster` resources to individuals who opt in to Dask functionality)?

I'll update this issue with a more concrete proposal once I've fleshed out the details in the linked repo.
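The opt-in controller idea boils down to a desired-state diff: every namespace that has opted in should have a `DaskCluster`, and managed clusters in namespaces that have opted out should be removed. The following is a hypothetical sketch of that reconcile step only; the label key `example.org/dask-enabled`, the cluster name `dask`, and the `Action`/`reconcile` names are placeholders, and listing namespaces and applying the actions through the Kubernetes API are left out.

```python
from dataclasses import dataclass

# Hypothetical opt-in label and per-namespace cluster name.
OPT_IN_LABEL = "example.org/dask-enabled"
CLUSTER_NAME = "dask"


@dataclass(frozen=True)
class Action:
    verb: str       # "create" or "delete"
    namespace: str
    name: str


def reconcile(namespaces: dict[str, dict[str, str]],
              existing: set[tuple[str, str]]) -> list[Action]:
    """Diff desired vs observed DaskClusters.

    namespaces: namespace name -> its labels (e.g. user profile namespaces).
    existing:   (namespace, name) pairs of DaskClusters this controller
                manages; it must not include user-created clusters, or they
                would be scheduled for deletion.
    """
    desired = {
        (ns, CLUSTER_NAME)
        for ns, labels in namespaces.items()
        if labels.get(OPT_IN_LABEL) == "true"
    }
    # Sorted so the action list is deterministic between reconcile loops.
    actions = [Action("create", ns, name) for ns, name in sorted(desired - existing)]
    actions += [Action("delete", ns, name) for ns, name in sorted(existing - desired)]
    return actions


if __name__ == "__main__":
    profiles = {"alice": {OPT_IN_LABEL: "true"}, "bob": {}}
    for action in reconcile(profiles, existing={("bob", CLUSTER_NAME)}):
        print(action)
```

Computing the diff from scratch on every loop keeps the controller idempotent: rerunning it after a partial failure only emits the actions that are still outstanding.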