AI-Hypercomputer
/
xpk
xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0
81
stars
23
forks
issues
Add create-ray-cluster subcommand
#202
mbzomowski
opened
1 month ago
0
Job info command implementation
#201
BluValor
opened
1 month ago
2
Implement info command
#200
pawloch00
closed
3 weeks ago
0
Add GCS Fuse Storage to workload creation
#199
PBundyra
closed
3 weeks ago
0
Implement info command
#198
pawloch00
closed
1 month ago
0
Update Pathways-on-Cloud flags
#197
lukebaumann
closed
1 month ago
0
Bump Kueue version to v0.8.1
#196
PBundyra
closed
1 month ago
0
Add v6e for Pathways
#195
guptaaka
closed
1 month ago
0
Update the CloudDNS check.
#194
lukebaumann
closed
1 month ago
0
Add a troubleshooting tip
#193
guptaaka
closed
1 month ago
0
Introduce Storage API
#192
PBundyra
closed
3 weeks ago
1
Consider configuring kueue waitForPodsReady
#191
avrittrohwer
opened
1 month ago
3
Support setting node auto-provisioning cpu and memory parameters
#190
avrittrohwer
opened
1 month ago
0
Fix GKE node version selection
#189
44past4
closed
1 month ago
2
Fix GKE node version selection
#188
44past4
closed
1 month ago
0
Fix autoprovisioning with spot nodes
#187
avrittrohwer
opened
1 month ago
1
Fix autoprovisioning with spot nodes
#186
avrittrohwer
closed
1 month ago
1
Fix GKE node version selection logic
#185
44past4
closed
1 month ago
1
better core dump for debugging
#184
ZhiyuLi-goog
opened
2 months ago
0
Fix debug logging (--enable-debug-logs)
#183
Obliviour
closed
2 months ago
0
Fixes a typo in the base command description
#182
lukebaumann
closed
2 months ago
0
Fix debug logging
#181
Obliviour
closed
2 months ago
1
Add Zarr Flag for Pathways
#180
SujeethJinesh
closed
2 months ago
1
Trillium device support
#179
Obliviour
closed
2 months ago
1
Added advanced usage example for a notebook interacting with a Cloud …
#178
nhira
closed
2 months ago
0
Enabling Workload Identity and GCSFuse driver flags added.
#177
sharabiani
closed
2 months ago
1
Pbundyra refactor commands
#176
PBundyra
closed
2 months ago
0
Create `commands` package and `core/` modules for NAP, Kueue and Pathways
#175
PBundyra
closed
2 months ago
0
Add quotes even to example output to help devs who copy commands from…
#174
nhira
closed
3 months ago
0
Add quotes even in example output to help devs who copy commands from the example output comments
#173
nhira
closed
3 months ago
0
Update RxDM image version from v1.0.8 to v1.0.9.
#172
yangyuwei
closed
3 months ago
0
Update RxDM image version from v1.0.8 to v1.0.9.
#171
yangyuwei
closed
3 months ago
1
Move SystemCharacteristics to a separate module
#170
PBundyra
closed
2 months ago
0
Allow debug_dump_gcs to be specified with other XLA_FLAGS
#169
jonb377
opened
3 months ago
0
Create `parser` package. Move logic from `xpk.py` to `parser` package.
#168
PBundyra
closed
3 months ago
1
Fix issue with device check failure
#167
jonb377
closed
3 months ago
1
Create `xpk` package with `utils` module
#166
PBundyra
closed
3 months ago
3
Create xpk package, utils module and refactor
#165
PBundyra
closed
2 months ago
0
Enabling Workload Identity and GCSFuse driver flags
#164
sharabiani
closed
2 months ago
0
Python3.10 fix - use CSV format for gcloud commands to simplify parsing
#163
nhira
closed
3 months ago
1
Allow SIGTERM error code to be returned from XPK
#162
Obliviour
closed
3 months ago
0
Create cluster from several reservations
#161
DwarKapex
opened
3 months ago
1
Fix non-accelerator pools from being part of accelerator node pool cr…
#160
Obliviour
closed
2 months ago
0
Use csv formatting instead in the gcloud command to split the names o…
#159
Obliviour
closed
3 months ago
1
xpk Cluster Queue resource group "cpu" resource quota incorrect for a CPU-only cluster
#158
bernardhan33
opened
4 months ago
11
Correct Suspend/Resume backoffLimit for Pathways
#157
SujeethJinesh
closed
4 months ago
3
Remove flag `pathways_compilation_mode` from xpk.py
#156
norx1991
closed
4 months ago
0
Remove incorrect plural from filter-by-job
#155
Obliviour
closed
5 months ago
0
Update XPK to support topology-aware scheduler for GPU workloads.
#154
yangyuwei
closed
4 months ago
1
Update the CloudDNS check.
#153
lukebaumann
closed
1 month ago
3