NVIDIA/gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0
1.77k stars, 286 forks
issues
GET https://helm.ngc.nvidia.com/nvidia returns 401
#921
cwyl02
closed
1 month ago
11
Got nccl error when deploy vllm in k8s with multiple GPUs
#920
ZhaoGuoXin
opened
1 month ago
0
Bump nvidia/cuda from 12.5.1-base-ubi8 to 12.6.0-base-ubi8 in /validator
#919
dependabot[bot]
closed
1 month ago
0
Nvidia driver install fails on pod nvidia-driver-daemonset - OpenShift 4.13
#918
kail-x-y
opened
1 month ago
0
Bump nvidia/cuda from 12.5.1-base-ubi8 to 12.6.0-base-ubi8 in /docker
#917
dependabot[bot]
closed
1 month ago
0
add the v24.6.1 OLM bundle
#916
tariq1890
closed
1 month ago
0
add the v24.6.1 OLM bundle
#915
tariq1890
closed
1 month ago
0
gpu-operator executable is not on $PATH
#914
ashvin-pidaparti
closed
4 days ago
7
Helm release for gpu-operator v24.6.1
#913
tariq1890
closed
1 month ago
0
Helm release for v24.6.1
#912
tariq1890
closed
1 month ago
0
Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.20.0
#911
dependabot[bot]
closed
1 month ago
1
usePrecompiled and new versions
#910
easyrider14
opened
1 month ago
1
Bump NVIDIA/holodeck from 0.2.1 to 0.2.2
#909
dependabot[bot]
closed
1 month ago
1
Update OLM bundle to use staging images built from release branch
#908
cdesiniotis
opened
1 month ago
0
Cherry picks for 24.6.1 release
#907
cdesiniotis
closed
1 month ago
0
Bump project version to v24.6.1
#906
cdesiniotis
closed
1 month ago
0
Bump device-plugin to v0.16.2
#905
cdesiniotis
closed
1 month ago
0
node-feature-discovery of gpu-operator sends excessive LIST requests to the API server
#904
jslouisyou
opened
1 month ago
3
[cherrypick][release-24.6][H100 NVL]update all-balanced MIG config
#903
tariq1890
closed
1 month ago
0
GPU pods end up in CrashLoopBackoff state after eviction
#902
futurwasfree
opened
1 month ago
2
AUTO_UPGRADE_POLICY_ENABLED set to true, but eviction and drain are "disabled by the upgrade policy"
#901
futurwasfree
opened
1 month ago
2
[H100 NVL]update all-balanced MIG config
#900
tariq1890
closed
1 month ago
1
Bump github.com/regclient/regclient from 0.7.0 to 0.7.1
#899
dependabot[bot]
closed
1 month ago
0
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
#898
khaykingleb
closed
1 month ago
2
Question: how to select an specific GPU type? Can I use a different name for the resources, other than `nvidia.com/gpu`?
#897
abravalheri
closed
1 month ago
2
H100 is not supported by nvidia driver?
#896
ch3nku1
opened
1 month ago
1
Bump github.com/onsi/gomega from 1.33.1 to 1.34.1
#895
dependabot[bot]
closed
1 month ago
0
Cherry-pick fixes for OpenShift
#894
cdesiniotis
closed
1 month ago
0
[node-status-exporter] fix bug in retrieving the nvidia-driver-daemonset
#893
tariq1890
closed
1 month ago
0
Alert GPUOperatorNodeDeploymentDriverFailed constantly fires on OpenShift, even when driver deployment appears successful in 24.6.0
#892
benhwebster
closed
1 month ago
3
change mig customer-mig-parted-config configmap but mig-manager use config is not the updated data
#891
lengrongfu
opened
2 months ago
4
controller-runtime cache should only list-watch resources in the operator namespace
#890
tariq1890
closed
2 months ago
0
[release-24.6][cherrypick] Add OLM bundle for 24.6.0
#889
cdesiniotis
closed
2 months ago
0
[release-24.6][cherrypick] add RHOCP certified v24.3.0 OLM bundle
#888
tariq1890
closed
2 months ago
0
[release-24.3][cherrypick] add RHOCP certified v24.3.0 OLM bundle
#887
tariq1890
closed
2 months ago
0
add RHOCP certified v24.6.0 OLM bundle
#886
tariq1890
closed
2 months ago
2
add RHOCP certified v24.3.0 OLM bundle
#885
tariq1890
closed
2 months ago
0
Bump github.com/docker/docker from 25.0.5+incompatible to 25.0.6+incompatible
#884
dependabot[bot]
closed
1 month ago
6
After upgrading from 24.3.0 to 24.6.0 via OLM, the operator appears to be missing expected permissions on configmaps
#883
benhwebster
closed
1 month ago
15
[ci] move IN_REGISTRY definition for validator to a template
#882
cdesiniotis
closed
2 months ago
0
Add OLM bundle for 24.6.0
#881
cdesiniotis
closed
2 months ago
1
Add unit tests for transforms
#880
cdesiniotis
closed
2 months ago
0
Bump github.com/docker/docker from 25.0.5+incompatible to 26.1.4+incompatible
#879
dependabot[bot]
closed
2 months ago
1
update prometheus-operator to version v0.75.2
#878
ajayk
closed
1 month ago
7
Helm release for v24.6.0
#877
tariq1890
closed
2 months ago
0
Fix pull of github staging image for validator
#876
cdesiniotis
closed
2 months ago
0
Use GitHub image as staging image
#875
cdesiniotis
closed
2 months ago
0
Update must-gather.sh URL in github issue template
#874
cdesiniotis
closed
2 months ago
1
gpu-operator with MIG won't work if GPU Node is deleted from cluster, reprovisioned, and then re-joined with the same name
#873
rpardini
opened
2 months ago
1
[question] Is there a reason for the excessive node labelling?
#872
milosgajdos
opened
2 months ago
0