Open jacobtomlinson opened 2 years ago
There is a part of me that wants to explore partially optional sections that update a default. For example to change the scheduler image you have to redefine the whole ports and probes sections which is a bit cumbersome and I wonder if we could avoid that.
But perhaps the way to rationalise this is to say if you want simplicity when changing the Docker image then you should use the Python API.
from dask_kubernetes.experimental import KubeCluster
cluster = KubeCluster(name="foo", image="me/myawesomeimage:latest")
I'm totally sold on the verbosity of these manifests, it is the Kubernetes way and it is good to be explicit. However, I wonder if we can make more things optional to reduce the chance of error (like many settings in Pod are optional).
I definitely understand the desire to reduce the possibility of errors in this space, and at the end of the day it's your project and totally up to you. My reservations about this are mostly around the hidden "business logic" that this would require to be embedded in the operator. Aside from having to maintain and keep track of the default values, your operator now needs to do more than one thing, which means more code to test/maintain. Personally, I think the developers who are using this software (and the k8s side specifically) can be trusted to make the right decisions when providing the resources they need.
If you do decide to go ahead with this, I would recommend making the documentation clear that there is some behind-the-scenes logic that may affect the resources the operator creates.
- When constructing the worker pod iterate over each container and append the DASK_SCHEDULER_ADDRESS env var.
I didn't realise that this env var was available, and I reckon that would be really useful - plus its still pretty idiomatic for pods to have env vars attached to them from various sources in k8s (EKS will append env vars based on service accounts, for example).
But perhaps the way to rationalise this is to say if you want simplicity when changing the Docker image then you should use the Python API.
I think this sums up my opinion overall - if you don't want the flexibility of the k8s manifests and you don't need to write YAML then you should use the Python API instead.
I really appreciate your input here @samdyzon. I think I'm going to go ahead and take some steps in this direction, but I hear your reservations and will aim to keep things as light touch as possible.
In #452 we merged new CRDs which are far more verbose and utilize the full
pod
andscheduler
specs from Kubernetes. This is awesome because it enables users to have almost infinite flexibility when configuring pods and services. However, this does also give users far more rope and I'm nervous about how easy it is to misconfigure a cluster.I'm totally sold on the verbosity of these manifests, it is the Kubernetes way and it is good to be explicit. However, I wonder if we can make more things optional to reduce the chance of error (like many settings in
Pod
are optional).For example in the docs our example cluster looks like this:
In the majority of cases the scheduler config is going to be identical except for the
dask.org/cluster-name
selector which just needs to match themetadata.name
set at the top. Of the users that will want to configure this I expect the majority of those will only want to change the service, not the pod.I think we could make the following sections optional and if they are omitted just set them to the values from the example above in the operator:
spec.worker.spec
spec.scheduler
spec.scheduler.service
spec.scheduler.spec
The other thing I notice is that the worker pod needs to be given the full name of the service as a CLI argument. That argument can also be set as an environment variable called
DASK_SCHEDULER_ADDRESS
so you just need to rundask-worker
. I would also like to propose that we inject theDASK_SCHEDULER_ADDRESS
env var into all containers within the pod spec so that this can be omitted in the majority of cases.Examples
The result of these changes would mean that the most minimal cluster you could configure would be:
If you want to customise the worker pods (the most common customization) then you include that section and it would look like this:
The next most common customisation I see is making the service some other type like a
LoadBalancer
or aNodePort
. I think it would be good to insist that the user provides the full service spec, so they would need to specify the ports and things as they do now. The optional things in this proposal are all or nothing. The service would look like this:The other customisation that comes up is the scheduler itself. Generally if folks want to change the worker Docker image they need to change the scheduler one too. Following the all or nothing approach would mean the scheduler would require this:
Or of course, you could specify all of these sections and you would end up with the example we have today.
Implementation
To implement this I propose the following:
DASK_SCHEDULER_ADDRESS
env var.@samdyzon I would like to get your feedback on this proposal before I dive into implementation.