dask / dask-kubernetes

Native Kubernetes integration for Dask
https://kubernetes.dask.org
BSD 3-Clause "New" or "Revised" License

Make more of the DaskCluster and DaskWorkerGroup specs optional #474

Open · jacobtomlinson opened this issue 2 years ago

jacobtomlinson commented 2 years ago

In #452 we merged new CRDs which are far more verbose and utilise the full pod and scheduler specs from Kubernetes. This is awesome because it enables users to have almost infinite flexibility when configuring pods and services. However, this also gives users far more rope, and I'm nervous about how easy it is to misconfigure a cluster.

I'm totally sold on the verbosity of these manifests; it is the Kubernetes way and it is good to be explicit. However, I wonder if we can make more things optional to reduce the chance of error (like many settings in Pod are optional).

For example in the docs our example cluster looks like this:

apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 2
    spec:
      containers:
      - name: worker
        image: "ghcr.io/dask/dask:latest"
        imagePullPolicy: "IfNotPresent"
        args:
          - dask-worker
          # Note the name of the cluster service, which adds "-service" to the end
          - tcp://simple-cluster-service.default.svc.cluster.local:8786
  scheduler:
    spec:
      containers:
      - name: scheduler
        image: "ghcr.io/dask/dask:latest"
        imagePullPolicy: "IfNotPresent"
        args:
          - dask-scheduler
        ports:
          - name: comm
            containerPort: 8786
            protocol: TCP
          - name: dashboard
            containerPort: 8787
            protocol: TCP
        readinessProbe:
          tcpSocket:
            port: comm
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: comm
          initialDelaySeconds: 15
          periodSeconds: 20
    service:
      type: NodePort
      selector:
        dask.org/cluster-name: simple-cluster
        dask.org/component: scheduler
      ports:
      - name: comm
        protocol: TCP
        port: 8786
        targetPort: "comm"
      - name: dashboard
        protocol: TCP
        port: 8787
        targetPort: "dashboard"

In the majority of cases the scheduler config is going to be identical except for the dask.org/cluster-name selector, which just needs to match the metadata.name set at the top. Of the users who do want to configure this, I expect most will only want to change the service, not the pod.

I think we could make the following sections optional and, if they are omitted, have the operator fall back to the values from the example above:

• spec.worker.spec
• spec.scheduler.spec
• spec.scheduler.service

The other thing I notice is that the worker pod needs to be given the full address of the scheduler service as a CLI argument. That address can also be set as an environment variable called DASK_SCHEDULER_ADDRESS, so you just need to run dask-worker with no arguments. I would also like to propose that we inject the DASK_SCHEDULER_ADDRESS env var into all containers within the pod spec so that the argument can be omitted in the majority of cases.
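As a rough sketch of that injection (the function name and the dict-based pod spec here are mine, purely for illustration, not the operator's actual code):

def inject_scheduler_address(pod_spec, cluster_name, namespace="default"):
    # Build the full scheduler address, following the "-service" naming
    # convention noted in the example manifest above.
    address = f"tcp://{cluster_name}-service.{namespace}.svc.cluster.local:8786"
    for container in pod_spec.get("containers", []):
        env = container.setdefault("env", [])
        # Don't override a value the user has set explicitly.
        if not any(var.get("name") == "DASK_SCHEDULER_ADDRESS" for var in env):
            env.append({"name": "DASK_SCHEDULER_ADDRESS", "value": address})
    return pod_spec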

Examples

The result of these changes is that the most minimal cluster you could configure would be:

apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 2

If you want to customise the worker pods (the most common customisation) then you include that section, and it would look like this:

apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 2
    spec:
      containers:
      - name: worker
        image: "ghcr.io/dask/dask:latest"
        args:
          - dask-worker

The next most common customisation I see is making the service some other type like a LoadBalancer or a NodePort. I think it would be good to insist that the user provides the full service spec, so they would need to specify the ports and selector as they do now. The optional sections in this proposal are all or nothing. The service would look like this:

apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 2
  scheduler:
    service:
      type: LoadBalancer
      selector:
        dask.org/cluster-name: simple-cluster
        dask.org/component: scheduler
      ports:
      - name: comm
        protocol: TCP
        port: 8786
        targetPort: "comm"
      - name: dashboard
        protocol: TCP
        port: 8787
        targetPort: "dashboard"

The other customisation that comes up is the scheduler itself. Generally if folks want to change the worker Docker image they need to change the scheduler one too. Following the all-or-nothing approach, customising the scheduler would require this:

apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
spec:
  worker:
    replicas: 2
  scheduler:
    spec:
      containers:
      - name: scheduler
        image: "ghcr.io/dask/dask:latest"
        imagePullPolicy: "IfNotPresent"
        args:
          - dask-scheduler
        ports:
          - name: comm
            containerPort: 8786
            protocol: TCP
          - name: dashboard
            containerPort: 8787
            protocol: TCP
        readinessProbe:
          tcpSocket:
            port: comm
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: comm
          initialDelaySeconds: 15
          periodSeconds: 20

Or of course, you could specify all of these sections and you would end up with the example we have today.

Implementation

To implement this I propose the following:

• Make spec.worker.spec, spec.scheduler.spec and spec.scheduler.service optional in the CRDs, and have the operator fall back to the defaults above when a section is omitted.
• When constructing the worker pod iterate over each container and append the DASK_SCHEDULER_ADDRESS env var.
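To illustrate the all-or-nothing fallback, here is a sketch of what the defaulting could look like (the function names and dict-based API are assumptions for illustration); note the service default is templated with the cluster name so the selector always matches metadata.name:

def default_scheduler_service(cluster_name):
    # The NodePort service from the docs example, with the cluster name
    # templated into the selector.
    return {
        "type": "NodePort",
        "selector": {
            "dask.org/cluster-name": cluster_name,
            "dask.org/component": "scheduler",
        },
        "ports": [
            {"name": "comm", "protocol": "TCP", "port": 8786, "targetPort": "comm"},
            {"name": "dashboard", "protocol": "TCP", "port": 8787, "targetPort": "dashboard"},
        ],
    }

def resolve_scheduler(cluster):
    name = cluster["metadata"]["name"]
    scheduler = cluster["spec"].get("scheduler") or {}
    # All or nothing: each omitted section falls back to the full default.
    scheduler.setdefault("service", default_scheduler_service(name))
    # A default_scheduler_pod_spec() would do the same for "spec".
    return scheduler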

@samdyzon I would like to get your feedback on this proposal before I dive into implementation.

jacobtomlinson commented 2 years ago

There is a part of me that wants to explore partially optional sections that update a default. For example, to change the scheduler image you have to redefine the whole ports and probes sections, which is a bit cumbersome, and I wonder if we could avoid that.
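For what it's worth, a partial-override scheme would amount to recursively merging the user's fragment over the default; a sketch (hypothetical helper, not part of the operator):

def deep_merge(default, override):
    # Recursively overlay the user's partial section onto the default, so
    # that setting only the image keeps the default ports and probes.
    merged = dict(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

Note that list values such as containers would still be replaced wholesale by a merge like this, which is part of what makes partial defaults fiddly to reason about.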

But perhaps the way to rationalise this is to say if you want simplicity when changing the Docker image then you should use the Python API.

from dask_kubernetes.experimental import KubeCluster

cluster = KubeCluster(name="foo", image="me/myawesomeimage:latest")

samdyzon commented 2 years ago

I'm totally sold on the verbosity of these manifests; it is the Kubernetes way and it is good to be explicit. However, I wonder if we can make more things optional to reduce the chance of error (like many settings in Pod are optional).

I definitely understand the desire to reduce the possibility of errors in this space, and at the end of the day it's your project and totally up to you. My reservations about this are mostly around the hidden "business logic" that this would require to be embedded in the operator. Aside from having to maintain and keep track of the default values, your operator now needs to do more than one thing, which means more code to test/maintain. Personally, I think the developers who are using this software (and the k8s side specifically) can be trusted to make the right decisions when providing the resources they need.

If you do decide to go ahead with this, I would recommend making the documentation clear that there is some behind-the-scenes logic that may affect the resources the operator creates.

  • When constructing the worker pod iterate over each container and append the DASK_SCHEDULER_ADDRESS env var.

I didn't realise that this env var was available, and I reckon that would be really useful - plus it's still pretty idiomatic for pods to have env vars attached to them from various sources in k8s (EKS will append env vars based on service accounts, for example).

But perhaps the way to rationalise this is to say if you want simplicity when changing the Docker image then you should use the Python API.

I think this sums up my opinion overall - if you don't want the flexibility of the k8s manifests then you don't need to write YAML, and you should use the Python API instead.

jacobtomlinson commented 2 years ago

I really appreciate your input here @samdyzon. I think I'm going to go ahead and take some steps in this direction, but I hear your reservations and will aim to keep things as light touch as possible.