Azure / kaito

Kubernetes AI Toolchain Operator
MIT License

Kaito workspace-controller support for Karpenter nodeclaim #339

Open philwelz opened 3 months ago

philwelz commented 3 months ago

Describe the bug

The Kaito workspace controller appears to be compatible only with Karpenter versions prior to v0.33.0. That release deprecated the machine CRD (karpenter.sh/v1alpha5), and the controller seems to rely on that CRD to spin up the workspace.

Steps To Reproduce

  1. Create an AKS cluster with NAP enabled
  2. Install the Kaito workspace-controller: helm install workspace kaito/workspace --namespace workspace --create-namespace
  3. Add a workspace: kubectl apply -f https://raw.githubusercontent.com/Azure/kaito/main/examples/inference/kaito_workspace_phi-2.yaml
  4. Nothing happens. Run kubectl api-resources and see that there is no machines CRD (karpenter.sh/v1alpha5)

Expected behavior

According to the docs, Kaito supports node provisioning controllers that support the Karpenter-core APIs, so it should also support the new Karpenter CRD for NodeClaims (the machine CRD was deprecated in December). In the best case, Kaito (workspace-controller) would also run on AKS with NAP enabled and would be aware of the Karpenter version (or of which CRD is available: machines vs. nodeclaims).
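The "which CRD is available" check suggested above could look roughly like the following. This is only a sketch, not Kaito's actual implementation: `pickNodeAPI` and the `NodeAPI` type are hypothetical names, the input map stands in for whatever a discovery call against the API server would return, and serving NodeClaim under `karpenter.sh/v1beta1` is an assumption about the newer Karpenter API version.

```go
package main

import "fmt"

// NodeAPI names the Karpenter-core resource a controller should watch.
// (Hypothetical type, for illustration only.)
type NodeAPI string

const (
	NodeAPINodeClaim NodeAPI = "NodeClaim" // assumed to be served under karpenter.sh/v1beta1
	NodeAPIMachine   NodeAPI = "Machine"   // served under karpenter.sh/v1alpha5 (deprecated)
	NodeAPINone      NodeAPI = ""          // neither CRD is installed
)

// pickNodeAPI takes a map from groupVersion to the kinds served under it
// and prefers NodeClaim over the deprecated Machine kind.
func pickNodeAPI(served map[string][]string) NodeAPI {
	has := func(groupVersion, kind string) bool {
		for _, k := range served[groupVersion] {
			if k == kind {
				return true
			}
		}
		return false
	}
	switch {
	case has("karpenter.sh/v1beta1", "NodeClaim"):
		return NodeAPINodeClaim
	case has("karpenter.sh/v1alpha5", "Machine"):
		return NodeAPIMachine
	default:
		return NodeAPINone
	}
}

func main() {
	// Illustrative inputs, not captured from a real cluster.
	newer := map[string][]string{"karpenter.sh/v1beta1": {"NodeClaim", "NodePool"}}
	older := map[string][]string{"karpenter.sh/v1alpha5": {"Machine", "Provisioner"}}
	fmt.Println(pickNodeAPI(newer))
	fmt.Println(pickNodeAPI(older))
}
```

Preferring NodeClaim whenever both kinds are present keeps the controller on the non-deprecated API during a migration window; returning an empty value lets the caller fail with a clear message instead of silently doing nothing.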

Logs

Controller starts and prints this:

2024-04-04T10:07:49Z INFO Starting EventSource {"controller": "workspace", "controllerGroup": "kaito.sh", "controllerKind": "Workspace", "source": "kind source: *v1alpha5.Machine"}
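To confirm from the logs which kind the controller is watching, the trailing JSON payload of such a line can be parsed. The snippet below is a small sketch under the assumption that every "Starting EventSource" line has the same shape as the one above; `watchedKind` is a hypothetical helper, not part of Kaito or controller-runtime.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// watchedKind extracts the watched type from a "Starting EventSource" log
// line by decoding the JSON object that follows the message text.
func watchedKind(line string) (string, error) {
	start := strings.Index(line, "{")
	if start < 0 {
		return "", fmt.Errorf("no JSON payload in log line")
	}
	var payload struct {
		Source string `json:"source"`
	}
	if err := json.Unmarshal([]byte(line[start:]), &payload); err != nil {
		return "", err
	}
	if payload.Source == "" {
		return "", fmt.Errorf("log line has no source field")
	}
	// The field looks like "kind source: *v1alpha5.Machine".
	return strings.TrimPrefix(payload.Source, "kind source: *"), nil
}

func main() {
	line := `2024-04-04T10:07:49Z INFO Starting EventSource {"controller": "workspace", "controllerGroup": "kaito.sh", "controllerKind": "Workspace", "source": "kind source: *v1alpha5.Machine"}`
	kind, err := watchedKind(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kind)
}
```

A result ending in `Machine` indicates the controller is still registered against the v1alpha5 machine type rather than NodeClaim.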

Environment

### Support Karpenter Tasks
- [x] Add NodeClaim API #362
- [x] Update workspace controller to support NodeClaim #366
- [ ] Add e2e tests and support karpenter in the workflow #375
- [ ] Update doc

Fei-Guo commented 3 months ago

Thanks for filing the issue. We are working towards supporting the new Karpenter APIs.