AI-Hypercomputer / xpk

xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help Cloud developers to orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0
81 stars 23 forks source link

Fix autoprovisioning with spot nodes #186

Closed avrittrohwer closed 1 month ago

avrittrohwer commented 1 month ago

Fixes / Features

Testing / Documentation

Node auto-provisioning with spot

  1. Created a xpk cluster with --spot flag.
  2. Created a workload with a different topology than the cluster default.
  3. Observed a nodepool being created with the new workload topology using spot TPU nodes.

Node auto-provisioning without spot

TODO

Not auto-provisioning

TODO

avrittrohwer commented 1 month ago

https://github.com/google/xpk/pull/187