issues
AI-Hypercomputer/xpk
xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0
83 stars, 27 forks
#281 Add lint and format commands on makefile. (mbobrovskyi, opened 5 hours ago, 1 comment)
#280 Warn on deleting cluster with a user prompt. (mbobrovskyi, opened 7 hours ago, 1 comment)
#279 Add required flag --ray-version to README example (mbzomowski, opened 14 hours ago, 1 comment)
#278 Ray-version readme fix (mbzomowski, closed 14 hours ago, 0 comments)
#277 Update kjob installation guide link. (mbobrovskyi, closed 8 hours ago, 1 comment)
#276 Storage delete fs pv pvc (BluValor, opened 1 day ago, 0 comments)
#275 Fix storage example typo (BluValor, closed 1 day ago, 0 comments)
#274 Add troubleshooting info (BluValor, opened 1 day ago, 0 comments)
#273 Update README.md in a fork branch (BluValor-2, opened 2 days ago, 1 comment)
#272 Create new Filestore instance (pawloch00, opened 5 days ago, 0 comments)
#271 Support --time flag on batch command. (mbobrovskyi, closed 5 days ago, 1 comment)
#270 Pr example branch (BluValor, closed 2 days ago, 0 comments)
#269 Add module to support deployment of cluster toolkit module (pawloch00, opened 6 days ago, 0 comments)
#268 Add script name to job info command (BluValor, closed 1 day ago, 0 comments)
#267 Create integration test using Kind (IrvingMg, opened 1 week ago, 0 comments)
#266 Storage management documentation update (BluValor, closed 6 days ago, 0 comments)
#265 Set cluster to apply job command on (IrvingMg, closed 1 week ago, 1 comment)
#264 Fix duplicate definition of JOBSET_NAME (frgossen, closed 2 weeks ago, 0 comments)
#263 Fix tests on development branch (pawloch00, closed 1 week ago, 0 comments)
#262 Remove Slurm term from logs for `job ls` (IrvingMg, closed 2 weeks ago, 2 comments)
#261 Optimize Makefile targets. (mbobrovskyi, closed 2 weeks ago, 0 comments)
#260 Support darwin platform on Makefile. (mbobrovskyi, closed 2 weeks ago, 0 comments)
#259 Disable node autoupgrade (pawloch00, closed 2 weeks ago, 0 comments)
#258 Remove unused download_crd_file_urls function. (mbobrovskyi, closed 2 weeks ago, 0 comments)
#257 Test nightly (mbobrovskyi, closed 2 weeks ago, 0 comments)
#256 Bump Kueue version from 0.8.1 to 0.9.1 (mbobrovskyi, closed 5 days ago, 9 comments)
#255 Fix duplicate definition of JOBSET_NAME (frgossen, closed 2 weeks ago, 2 comments)
#254 Don't retry on fail in wait_for_kueue_available. (mbobrovskyi, closed 2 weeks ago, 3 comments)
#253 Fix nightly tests (pawloch00, closed 1 week ago, 0 comments)
#252 Implement docker module (pawloch00, closed 2 weeks ago, 0 comments)
#251 workflow concurrency change check (BluValor, closed 2 weeks ago, 0 comments)
#250 Concurrent workflows for different PRs (BluValor, closed 2 weeks ago, 0 comments)
#249 Add custom pw proxy args support (sadikneipp, opened 2 weeks ago, 1 comment)
#248 Add job cancel (IrvingMg, closed 2 weeks ago, 2 comments)
#247 Add unit tests to gh pipeline (pawloch00, closed 2 weeks ago, 0 comments)
#246 Implement blueprint module (pawloch00, closed 2 weeks ago, 1 comment)
#245 Add the option to use RAMDisk in workloads (xuefgu, closed 2 weeks ago, 2 comments)
#244 Add job list command (IrvingMg, closed 2 weeks ago, 6 comments)
#243 Add Makefile for installing dependencies (pawloch00, closed 2 weeks ago, 2 comments)
#242 Integrate kind for local testing (IrvingMg, closed 1 week ago, 7 comments)
#241 Disabling CloudDNS upgrades, while ensuring backward compatibility. (RoshaniN, closed 3 weeks ago, 0 comments)
#240 Ppawl test gh pip (pawloch00, closed 3 weeks ago, 0 comments)
#239 Test main (pawloch00, closed 3 weeks ago, 0 comments)
#238 Fix pytype (pawloch00, closed 3 weeks ago, 0 comments)
#237 Ignore .idea paraphernalia. (mbobrovskyi, closed 3 weeks ago, 0 comments)
#236 Add flags on batch command. (mbobrovskyi, closed 6 days ago, 2 comments)
#235 Wait for kueue to be available before install custom resources. (mbobrovskyi, closed 3 weeks ago, 0 comments)
#234 Use kjobctl printcrd command to apply CRDs. (mbobrovskyi, closed 3 weeks ago, 0 comments)
#233 Add Cloud Platform scope to NodePool Creation by default on Pathways + CPU (SujeethJinesh, closed 2 weeks ago, 3 comments)
#232 Support gke version to be independent of release type (rapid) (Obliviour, closed 2 weeks ago, 0 comments)