xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Fixes workload rendering when using spot. Without this change, xpk workload create fails with an error like:
[XPK] Waiting for `Creating Workload`, for 0 seconds
error: error parsing /tmp/tmp242uhnfs: error converting YAML to JSON: yaml: line 33: could not find expected ':'
[XPK] Task: `Creating Workload` terminated with code `1`
Adds the required pod tolerations when using node auto-provisioning with spot nodes. Without the tolerations, the cluster autoscaler will not create new spot node pools.
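As a rough illustration of the second fix, the sketch below builds the scheduling fields a pod needs to land on spot nodes. The label and taint key `cloud.google.com/gke-spot` is the one GKE applies to Spot VM nodes. The helper names `spot_scheduling_fields` and `build_pod_spec` are hypothetical and are not taken from the xpk code base; xpk renders YAML templates, and this is only a sketch of the same idea with plain dictionaries.

```python
# Label/taint key that GKE applies to Spot VM nodes.
GKE_SPOT_KEY = "cloud.google.com/gke-spot"


def spot_scheduling_fields(use_spot: bool) -> dict:
    """Return the pod-spec fields needed to target spot nodes.

    Hypothetical helper for illustration only; not part of xpk.
    """
    if not use_spot:
        # On-demand workloads get neither the selector nor the toleration.
        return {}
    return {
        "nodeSelector": {GKE_SPOT_KEY: "true"},
        "tolerations": [
            {
                "key": GKE_SPOT_KEY,
                "operator": "Equal",
                "value": "true",
                "effect": "NoSchedule",
            }
        ],
    }


def build_pod_spec(image: str, use_spot: bool) -> dict:
    """Assemble a minimal pod spec, adding spot fields only when requested."""
    spec = {"containers": [{"name": "main", "image": image}]}
    spec.update(spot_scheduling_fields(use_spot))
    return spec
```

Building the fields as structured data and serializing them afterwards is one way to avoid the kind of YAML syntax error shown above, which typically comes from string templates with broken indentation or missing keys.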
Testing / Documentation
[ y ] Tests pass
[ y, not needed ] Appropriate changes to documentation are included in the PR
Node auto-provisioning with spot
Created an xpk cluster with --spot and autoprovisioning flags.
Created a workload with a different topology than the cluster default.
Observed a node pool being created with the new workload topology using spot TPU nodes.
Node auto-provisioning without spot
Created an xpk cluster with --spot and autoprovisioning flags.
Created a workload with a different topology than the cluster default and the --on-demand flag.
Validated that the generated YAML does not specify the spot node-selector and tolerations.
Observed a node pool being created with the new workload topology using on-demand TPU nodes.
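The YAML validation step above was done by inspection. A minimal sketch of how such a check could be automated is shown below, assuming the generated manifest has already been parsed into nested dictionaries and lists (for example with a YAML parser). The function name `find_spot_references` is hypothetical and not part of xpk.

```python
# Label/taint key that GKE applies to Spot VM nodes.
GKE_SPOT_KEY = "cloud.google.com/gke-spot"


def find_spot_references(node, path=""):
    """Yield the paths in a parsed manifest where the spot key appears.

    Hypothetical helper for illustration only. It matches the key both as
    a mapping key (nodeSelector) and as a value (a toleration's `key`).
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if key == GKE_SPOT_KEY or value == GKE_SPOT_KEY:
                yield child
            yield from find_spot_references(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_spot_references(item, f"{path}[{index}]")


# An on-demand workload should produce no matches.
on_demand_manifest = {
    "spec": {"containers": [{"name": "main", "image": "busybox"}]}
}
assert list(find_spot_references(on_demand_manifest)) == []
```

An empty result means that neither the spot node-selector nor the spot toleration is present, which is the expected outcome for a workload created with --on-demand.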
Not auto-provisioning with spot (--spot flag)
zone: 'us-central2'>] finished with error: Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central2-b