AI-Hypercomputer / xpk

xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0

Fix GKE node version selection #189

Closed by 44past4 1 month ago

44past4 commented 1 month ago

Fixes / Features

Testing / Documentation

Tested manually

Obliviour commented 1 month ago

Needs to pass the linter. See https://github.com/AI-Hypercomputer/xpk/blob/main/docs/contributing.md#steps for the steps.

PBundyra commented 1 month ago

LGTM. Is it possible that the rapid release channel will not have a release of the same major version or lower? The current default rapid release version is 1.31, and the channel currently supports versions as old as 1.27.x. 1.27.x was introduced in May 2023, so I am guesstimating that a major version stays in the rapid release channel for about 1.5 years. If a user doesn't upgrade their cluster for that long, this logic may break again.

Hi Victor, thanks for pointing this out. However, given how quickly XPK evolves, I believe we can for now support only the minor versions that are available in the rapid release channel.
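
For readers following the thread, the concern above can be illustrated with a minimal sketch. This is not the code from this PR: the function names `parse_version` and `select_node_version`, the selection rule (newest rapid-channel version whose major.minor does not exceed the cluster's), and the version strings in the usage note are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a string like '1.31.1-gke.1000' into (1, 31, 1).

    The '-gke.N' suffix is ignored in this sketch.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def select_node_version(
    cluster_version: str, rapid_versions: List[str]
) -> Optional[str]:
    """Hypothetical helper: pick a node version from the rapid channel.

    Returns the newest rapid-channel version whose (major, minor) is the
    same as or lower than the cluster's, or None if the channel no longer
    offers such a version (the failure case raised in the review).
    """
    target = parse_version(cluster_version)[:2]
    candidates = [
        v for v in rapid_versions if parse_version(v)[:2] <= target
    ]
    if not candidates:
        return None
    return max(candidates, key=parse_version)
```

With a made-up rapid-channel list of `1.29.8-gke.1000`, `1.30.5-gke.1000`, and `1.31.1-gke.1000`, a cluster on `1.30.2-gke.500` would get `1.30.5-gke.1000`, while a cluster left on `1.26.0-gke.1` would get `None`, which is the case where a long-unupgraded cluster falls outside what the rapid channel still offers.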