AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine
Apache License 2.0
TPU provisioner should be configurable to stop creating new nodepools #661
When running TPU provisioners across multiple clusters, or when rolling out potentially version-breaking changes, it would be helpful to have a configuration setting that lets the TPU provisioner keep running and cleaning up nodepools without being triggered to create any more of them.
Examples:
Migrating from the pod trigger to the JobSet trigger for nodepool creation, and running multiple iterations of that migration.
The TPU provisioner needs to keep running on a cluster in order to clean up.
Temporarily stopping the provisioner from creating nodepools while it cleans up existing nodepools to free up capacity.
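To make the request concrete, here is a minimal sketch of how such a setting could gate creation while leaving cleanup untouched. Everything in it is hypothetical and for illustration only: the `DISABLE_NODEPOOL_CREATION` environment variable, the `Config` type, and the `loadConfig` and `decide` functions are not existing TPU provisioner names, and the real reconcile logic is likely more involved.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a hypothetical toggle; it is not an existing provisioner option.
type Config struct {
	DisableNodePoolCreation bool
}

// loadConfig reads the hypothetical DISABLE_NODEPOOL_CREATION variable.
// An unset variable keeps the current behavior (creation enabled).
func loadConfig(getenv func(string) string) (Config, error) {
	raw := getenv("DISABLE_NODEPOOL_CREATION")
	if raw == "" {
		return Config{}, nil
	}
	disabled, err := strconv.ParseBool(raw)
	if err != nil {
		return Config{}, fmt.Errorf("parsing DISABLE_NODEPOOL_CREATION=%q: %w", raw, err)
	}
	return Config{DisableNodePoolCreation: disabled}, nil
}

// Action is what the provisioner would do for a given trigger.
type Action string

const (
	ActionCreate Action = "create"
	ActionDelete Action = "delete"
	ActionSkip   Action = "skip"
)

// decide keeps cleanup unconditional and gates only creation on the toggle.
// needsNodePool: a workload is waiting for a nodepool.
// nodePoolUnused: an existing nodepool is no longer needed.
func decide(cfg Config, needsNodePool, nodePoolUnused bool) Action {
	if nodePoolUnused {
		return ActionDelete // cleanup always runs
	}
	if needsNodePool {
		if cfg.DisableNodePoolCreation {
			return ActionSkip // creation is switched off
		}
		return ActionCreate
	}
	return ActionSkip
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("create trigger:", decide(cfg, true, false))
	fmt.Println("cleanup trigger:", decide(cfg, false, true))
}
```

With the variable set to `true`, creation triggers would be skipped while cleanup triggers would still result in deletion, which covers both the migration case and the temporary pause case.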