dask / dask-kubernetes

Native Kubernetes integration for Dask
https://kubernetes.dask.org
BSD 3-Clause "New" or "Revised" License

Migrate CRD generation to Python dataclasses #900

Open jacobtomlinson opened 3 months ago

jacobtomlinson commented 3 months ago

The Dask operator has a number of Custom Resource Definitions. Many of these definitions have specs from other resources nested within them. For example, the scheduler and worker parameters within DaskCluster have nested PodSpec specifications because these will ultimately be created as Pods by the controller.

https://github.com/dask/dask-kubernetes/blob/734d001836fad228cca809093feec3d07cd634a6/dask_kubernetes/operator/customresources/templates.yaml#L36

To generate the Custom Resource Definitions that we ship in the Dask Kubernetes Operator Helm Chart, we have a whole pile of YAML that contains the definition templates. Then we have a pre-commit hook that runs k8s-crd-resolver, which renders the templates into the Helm chart's crds directory.

I'm conscious that k8s-crd-resolver appears to be unmaintained and has not had new schema definitions introduced since Kubernetes 1.25, which went EOL in October 2023. I also opened https://github.com/elemental-lf/k8s-crd-resolver/issues/6 over a year ago, which has not had any response.

It could be interesting to explore migrating the CRD generation to Python dataclasses instead of YAML templates. There is a library that can do this called kubecrd, but this also appears to be unmaintained. It also does not support having multiple version schemas, and so would stop us from implementing #753.

As a response to this I'm exploring generating Custom Resource Definition objects in kr8s from dataclasses (https://github.com/kr8s-org/kr8s/issues/456). Putting this code in kr8s will help ensure it is tested and maintained. It would also allow you to create multiple dataclasses, one for each version of the schema.
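
To make the idea concrete, here is a rough sketch of what generating a multi-version CRD from dataclasses could look like using only the standard library. This is not the kr8s API (that design is still being explored in the linked issue) and not the real DaskCluster schema: `schema_for`, `build_crd`, the `WorkerSpec`/`ClusterSpecV1`/`ClusterSpecV2` classes and the `example.org` group are all hypothetical names made up for illustration.

```python
from dataclasses import MISSING, dataclass, field, fields, is_dataclass
from typing import get_args, get_origin, get_type_hints

# Python primitives mapped to OpenAPI v3 type names.
_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_for(tp) -> dict:
    """Recursively convert a type annotation into an OpenAPI v3 schema (hypothetical helper)."""
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    origin = get_origin(tp)
    if origin is list:
        (item,) = get_args(tp)
        return {"type": "array", "items": schema_for(item)}
    if origin is dict:
        _, value = get_args(tp)
        return {"type": "object", "additionalProperties": schema_for(value)}
    if is_dataclass(tp):
        hints = get_type_hints(tp)
        schema = {
            "type": "object",
            "properties": {f.name: schema_for(hints[f.name]) for f in fields(tp)},
        }
        # Fields without a default are treated as required.
        required = [
            f.name
            for f in fields(tp)
            if f.default is MISSING and f.default_factory is MISSING
        ]
        if required:
            schema["required"] = required
        return schema
    raise TypeError(f"Unsupported annotation: {tp!r}")


def build_crd(group: str, kind: str, plural: str, versions: dict[str, type]) -> dict:
    """Build a CRD manifest with one schema per version (hypothetical helper).

    The last entry in `versions` is marked as the storage version.
    """
    last = len(versions) - 1
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [
                {
                    "name": name,
                    "served": True,
                    "storage": i == last,
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {"spec": schema_for(spec_cls)},
                        }
                    },
                }
                for i, (name, spec_cls) in enumerate(versions.items())
            ],
        },
    }


# Illustrative specs only; these are NOT the real Dask operator resources.
@dataclass
class WorkerSpec:
    image: str
    replicas: int = 1
    env: dict[str, str] = field(default_factory=dict)


@dataclass
class ClusterSpecV1:
    worker: WorkerSpec


@dataclass
class ClusterSpecV2:
    worker: WorkerSpec
    scheduler_image: str
    labels: list[str] = field(default_factory=list)


crd = build_crd(
    group="example.org",
    kind="ExampleCluster",
    plural="exampleclusters",
    versions={"v1": ClusterSpecV1, "v2": ClusterSpecV2},
)
```

The resulting `crd` dict could then be dumped to YAML into the chart's crds directory in place of the rendered templates. Nested upstream types such as PodSpec would still need a source of schemas, which this sketch does not attempt to solve.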