GoogleCloudPlatform/container-engine-accelerators
Collection of tools and examples for managing Accelerated workloads in Kubernetes Engine
Apache License 2.0
214 stars
151 forks
issues
Using GPUDirect and NCCL with torch 2.5 (nvidia-nccl-cu12 2.21.5)
#414
danielkovtun
opened
1 day ago
0
Add tcpxo daemon anti affinity to avoid duplicate scheduling
#413
thisSIDEofRANDOM
closed
1 day ago
1
Remove grpcServer.Stop() when stream.Send() fails on ListAndWatch
#412
syaganti
opened
2 weeks ago
2
Update guest version of TCPXO
#411
grac3gao
closed
2 weeks ago
0
Bump cos-cloud/cos-gpu-installer from `8d86a65` to `af09af5`
#410
dependabot[bot]
opened
4 weeks ago
0
Add autopilot yamls
#409
jtechapps
closed
3 weeks ago
0
Flip version
#408
grac3gao
closed
1 month ago
0
`nvidia-device-plugin` failed to run on GPU nodes created by Node Auto-Provisioning
#407
hongchaodeng
opened
1 month ago
0
Bump cos-cloud/cos-gpu-installer from `8d86a65` to `4822571`
#406
dependabot[bot]
closed
4 weeks ago
1
Update TCPXO versions
#405
grac3gao
closed
1 month ago
0
Fix two bug related to metrics
#404
grac3gao
closed
2 months ago
0
Update restart logic for kubelet restart
#403
grac3gao
closed
2 months ago
0
Node Auto-Provisioning failing for certain GPU nodes (T4)
#402
agam
opened
2 months ago
3
Bump golang from 1.22-bullseye to 1.23-bullseye in /partition_gpu
#401
dependabot[bot]
opened
2 months ago
0
Bump golang from 1.22-bullseye to 1.23-bullseye
#400
dependabot[bot]
opened
2 months ago
0
Add partition and MIG profiles for H200
#399
aston-github
closed
2 months ago
0
Bump nvidia/cuda from 11.0-devel-ubuntu18.04 to 12.1.0-devel-ubuntu18.04 in /demo/gpu-error/illegal-memory-access
#398
dependabot[bot]
opened
2 months ago
0
Bump cos-cloud/cos-gpu-installer from `8d86a65` to `00bf251`
#397
dependabot[bot]
closed
1 month ago
1
Bump ubuntu from 18.04 to 24.04 in /nvidia-driver-installer/minikube
#396
dependabot[bot]
opened
2 months ago
0
Update manifests with new guest release
#395
grac3gao
closed
3 months ago
0
Update Dependabot to catch vulnerabilities in Dockerfile base images.
#394
aston-github
closed
2 months ago
0
Upgrade fast-socket base image
#393
Insufficient-Charge
closed
3 months ago
0
Upgrade fast-socket base image
#392
Insufficient-Charge
closed
3 months ago
0
Update cap setup
#391
grac3gao
closed
3 months ago
0
Change for proper behavior of cos gpu driver install
#390
Insufficient-Charge
closed
3 months ago
1
Add latest nccl-test manifest
#389
grac3gao
closed
4 months ago
0
Rename TCPXO files
#388
grac3gao
closed
4 months ago
0
Update tcpxo manifest
#387
grac3gao
closed
4 months ago
0
Add aperture setups
#386
grac3gao
closed
4 months ago
0
update pause image
#385
Dragoncell
closed
4 months ago
3
Update label-nodes-daemon.yaml
#384
thisSIDEofRANDOM
closed
4 months ago
0
Update label-nodes-daemon.yaml
#383
thisSIDEofRANDOM
closed
4 months ago
0
Nvidia Driver Public Bucket returning 404 - breaking ALL driver installation
#382
tvildo
opened
5 months ago
0
Update example with new scripts
#381
grac3gao
closed
5 months ago
0
Revert "Update nccl-test.yaml"
#380
thisSIDEofRANDOM
closed
5 months ago
0
Update nccl-test.yaml
#379
thisSIDEofRANDOM
closed
5 months ago
0
Initial topology Scheduling for tcpxo
#378
thisSIDEofRANDOM
closed
5 months ago
0
Update NRI device injector manifest
#377
grac3gao
closed
5 months ago
0
Update NRI device injector manifest
#376
grac3gao
closed
5 months ago
0
Update nccl and rxdm version
#375
grac3gao
closed
5 months ago
0
Update tcpx manifests
#374
grac3gao
closed
5 months ago
0
Update nccl-test-without-hostnetwork.yaml
#373
grac3gao
closed
6 months ago
0
Update nccl-test.yaml
#372
grac3gao
closed
6 months ago
0
Add logs for device injector
#371
Jiaqicao257
closed
5 months ago
0
Add R535 driver installer manifest for Ubuntu OS
#370
Jiaqicao257
closed
6 months ago
0
Add mknod tests for NRI device injector
#369
Jiaqicao257
closed
6 months ago
0
Update device injector manifest to mount dev path
#368
Jiaqicao257
closed
6 months ago
0
Configure Renovate
#367
renovate-bot
opened
6 months ago
0
Update TCPXO manifests
#366
grac3gao
closed
6 months ago
0
Bump golang.org/x/net from 0.17.0 to 0.23.0
#365
dependabot[bot]
closed
5 months ago
0