AI-Hypercomputer / xpk

xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help Cloud developers to orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0
70 stars 18 forks source link

better core dump for debugging #184

Open ZhiyuLi-goog opened 2 weeks ago

ZhiyuLi-goog commented 2 weeks ago

Features

upload more detailed info to gcs bucket in failures

Switch to gcloud storage cp, since it around twice as fast as gsutil -m cp.

Testing / Documentation