xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Fixes / Features
Fixes workload rendering when using Spot. Without this change, `xpk workload create` fails with an error like:
[XPK] Waiting for `Creating Workload`, for 0 seconds
error: error parsing /tmp/tmp242uhnfs: error converting YAML to JSON: yaml: line 33: could not find expected ':'
[XPK] Task: `Creating Workload` terminated with code `1`
Adds the required pod tolerations when using node auto-provisioning with Spot nodes. Without these tolerations, the cluster autoscaler will not create new Spot node pools.
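To illustrate the toleration change, here is a minimal sketch of a toleration entry a workload's pod spec could carry. It assumes GKE's documented taint on auto-provisioned Spot nodes, `cloud.google.com/gke-spot=true:NoSchedule`. This is not xpk's actual implementation. The helper name `spot_tolerations` is hypothetical, and any other scheduling fields xpk may set are outside this sketch. Building the entry as structured data and serializing it is one way to avoid the hand-indentation mistakes that produce YAML errors like the one above.

```python
"""Hypothetical sketch, not xpk's code: pod tolerations for Spot nodes
created by GKE node auto-provisioning.

Assumption: auto-provisioned Spot nodes carry the taint
cloud.google.com/gke-spot=true:NoSchedule.
"""
import json

SPOT_TAINT_KEY = "cloud.google.com/gke-spot"


def spot_tolerations(use_spot: bool) -> list:
    """Return the tolerations list for a pod spec (hypothetical helper)."""
    if not use_spot:
        # Non-Spot workloads need no extra toleration.
        return []
    return [
        {
            # Matches the taint GKE places on auto-provisioned Spot nodes.
            "key": SPOT_TAINT_KEY,
            "operator": "Equal",
            "value": "true",
            "effect": "NoSchedule",
        }
    ]


if __name__ == "__main__":
    pod_spec = {"containers": [{"name": "trainer", "image": "example-image"}]}
    pod_spec["tolerations"] = spot_tolerations(use_spot=True)
    # Serializing the dict avoids hand-indented template output.
    print(json.dumps(pod_spec, indent=2))
```

Without a toleration like this, pods cannot schedule onto tainted Spot nodes. The autoscaler therefore has no reason to provision a Spot node pool for them.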
Testing / Documentation
Node auto-provisioning with Spot
Created an xpk cluster with the --spot flag.
Created a workload with a topology different from the cluster default.
Observed a node pool being created for the new workload topology using Spot TPU nodes.
Node auto-provisioning without Spot
TODO
Without node auto-provisioning
TODO
[ y/n ] Tests pass
[ y, not needed ] Appropriate changes to documentation are included in the PR