xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0
Check cluster arguments and update nodepools in existing cluster when requesting different device_type #120
Fixes / Features
Do not update nodepools in an existing cluster when args.zone differs from the zone in which the cluster was originally created.
When an update requests a different device_type, delete the existing nodepools and create new ones with the requested device_type.
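The two rules above can be read as a small planning step: reject a zone mismatch, replace every nodepool when device_type changes, and otherwise only add or remove the difference. The sketch below illustrates that logic; it is not xpk's implementation. `ClusterState`, `NodepoolPlan`, `plan_nodepool_update`, the `num_slices` parameter, and the `<cluster>-np-<index>` nodepool naming scheme are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical snapshot of an existing cluster's accelerator nodepools."""
    name: str
    zone: str
    device_type: str
    nodepools: list[str] = field(default_factory=list)


@dataclass
class NodepoolPlan:
    """Nodepools to delete and to create (illustrative only)."""
    delete: list[str]
    create: list[str]


def plan_nodepool_update(existing: ClusterState, zone: str,
                         device_type: str, num_slices: int) -> NodepoolPlan:
    """Sketch of the update rules; not the actual xpk code."""
    # Rule 1: never touch a cluster that was created in a different zone.
    if zone != existing.zone:
        raise ValueError(
            f"Cluster {existing.name} is in {existing.zone}, not {zone}; "
            "refusing to update nodepools.")

    # Assumed (hypothetical) naming scheme: <cluster>-np-<index>.
    desired = [f"{existing.name}-np-{i}" for i in range(num_slices)]

    # Rule 2: a different device_type replaces every existing nodepool.
    if device_type != existing.device_type:
        return NodepoolPlan(delete=list(existing.nodepools), create=desired)

    # Same device_type: remove extra nodepools and add missing ones only.
    current = set(existing.nodepools)
    return NodepoolPlan(
        delete=[pool for pool in existing.nodepools if pool not in desired],
        create=[pool for pool in desired if pool not in current],
    )
```

For example, shrinking a cluster from 4 to 2 nodepools of v4-8 only deletes the surplus nodepools:

```python
cluster = ClusterState("demo", "us-central2-b", "v4-8",
                       [f"demo-np-{i}" for i in range(4)])
plan = plan_nodepool_update(cluster, "us-central2-b", "v4-8", 2)
# plan.delete == ["demo-np-2", "demo-np-3"], plan.create == []
```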
Testing / Documentation
Scenario 1:
Cluster already has 4 nodepools of v4-8 in us-central2-b and now we request 2 nodepools of v4-8 in us-central2-b. The end state of the cluster will be 2 nodepools of v4-8 in us-central2-b.
Scenario 2:
Cluster already has 2 nodepools of v4-8 in us-central2-b and now we request 4 nodepools of v4-8 in us-central2-b. The end state of the cluster will be 4 nodepools of v4-8 in us-central2-b.
Scenario 3:
Cluster already has 2 nodepools of v4-8 in us-central2-b and now we request 2 nodepools of v4-16 in us-central2-b. The end state of the cluster will be 2 nodepools of v4-16 in us-central2-b.
Scenario 4:
Cluster already has 2 nodepools of v4-8 in us-central2-b and now we request 3 nodepools of v4-16 in us-central2-b. The end state of the cluster will be 3 nodepools of v4-16 in us-central2-b.
Scenario 5:
Cluster already has 2 nodepools of v4-8 in us-central2-b and now we request 2 nodepools of v4-8 in us-central2-a. XPK will fail early and will not update the cluster. The end state of the cluster will be 2 nodepools of v4-8 in us-central2-b.
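The five scenarios can also be written as a table-driven check of the expected end state. This is a minimal sketch, assuming a simplified model in which a successful update converges to exactly the requested layout and a zone mismatch leaves the cluster unchanged; `Pools`, `expected_end_state`, and `SCENARIOS` are hypothetical names, not part of xpk.

```python
from typing import NamedTuple


class Pools(NamedTuple):
    """How many nodepools of which device type exist in which zone."""
    count: int
    device_type: str
    zone: str


def expected_end_state(existing: Pools, requested: Pools) -> Pools:
    """Hypothetical model of the end states listed in the scenarios."""
    if requested.zone != existing.zone:
        # Scenario 5: fail early and leave the cluster untouched.
        return existing
    # Scenarios 1-4: the cluster converges to the requested layout,
    # whether that means removing, adding, or replacing nodepools.
    return requested


B, A = "us-central2-b", "us-central2-a"
SCENARIOS = [
    # (existing, requested, expected end state)
    (Pools(4, "v4-8", B), Pools(2, "v4-8", B), Pools(2, "v4-8", B)),
    (Pools(2, "v4-8", B), Pools(4, "v4-8", B), Pools(4, "v4-8", B)),
    (Pools(2, "v4-8", B), Pools(2, "v4-16", B), Pools(2, "v4-16", B)),
    (Pools(2, "v4-8", B), Pools(3, "v4-16", B), Pools(3, "v4-16", B)),
    (Pools(2, "v4-8", B), Pools(2, "v4-8", A), Pools(2, "v4-8", B)),
]

for number, (existing, requested, expected) in enumerate(SCENARIOS, start=1):
    assert expected_end_state(existing, requested) == expected, number
```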
[ y ] Tests pass
[ y ] Appropriate changes to documentation are included in the PR