AI-Hypercomputer / xpk

xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help Cloud developers to orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0
81 stars 23 forks source link

Don't Kill RM or Proxy on user job failure #146

Open SujeethJinesh opened 6 months ago

SujeethJinesh commented 6 months ago

Fixes / Features

-

Testing / Documentation

Testing details.