NVIDIA
/
gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0
1.77k
stars
286
forks
issues
Newest
GPU already used, showing up in multiple containers
#1021
astranero
opened
1 day ago
0
403 Unauthorized for helm image
#1020
AriBerisha
closed
1 day ago
1
set RUNTIME_CONFIG and RUNTIME_SOCKET envars to support new toolkit versions
#1019
tariq1890
closed
1 day ago
0
vGPU pods stuck/fail after the installation
#1018
tunahanertekin
opened
4 days ago
0
added runtimeClassName to fix Cuda version error on gpu-pod.yaml test
#1017
armagankaratosun
opened
4 days ago
0
Nvidia-driver-daemonset stuck in CrashLoopBackOff
#1016
CarlGJ
opened
4 days ago
0
failed to create NVIDIA device nodes
#1015
dstrbad
opened
4 days ago
7
Bump github.com/NVIDIA/nvidia-container-toolkit from 1.16.1 to 1.16.2
#1014
dependabot[bot]
closed
5 days ago
0
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring from 0.76.2 to 0.77.1
#1013
dependabot[bot]
opened
5 days ago
0
Helm release for v24.6.2
#1012
cdesiniotis
closed
6 days ago
0
Cherry-picks for 24.6.2
#1011
cdesiniotis
closed
6 days ago
0
Bump github.com/mittwald/go-helm-client from 0.12.13 to 0.12.14
#1010
dependabot[bot]
closed
5 days ago
0
Verification of Kubernetes compatibility
#1009
BrianV801
opened
1 week ago
0
Bump project version to 24.6.2
#1008
cdesiniotis
closed
1 week ago
0
Support the `DevicePluginCDIDevices` feature gate
#1007
jfroy
opened
1 week ago
1
vsphere e2e tests setup
#1006
shivakunv
opened
1 week ago
0
fix govet issues and pin golangci-lint version
#1005
tariq1890
closed
1 week ago
0
[release-24.6] bump cuda base images to fix CVE-2024-6345
#1004
tariq1890
closed
1 week ago
3
bump dcgm and dcgm-exporter to versions 3.3.8 and 3.3.8-3.6.0
#1003
tariq1890
closed
1 week ago
0
Not able to view Gpu utilization metrics in openshift dashboard
#1002
umeshvw
opened
1 week ago
2
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring from 0.76.2 to 0.77.0
#1001
dependabot[bot]
closed
5 days ago
1
DriverToolkit is enabled in the GPU Operator ClusterPolicy, but the NFD version deployed in the cluster is too old to support it.
#1000
CarlGJ
closed
6 days ago
0
drop dist tag suffix when referencing images in scan and sign jobs
#999
tariq1890
closed
1 week ago
0
Bump github.com/prometheus/client_golang from 1.20.3 to 1.20.4
#998
dependabot[bot]
opened
1 week ago
0
add gpu driver container 550.90.12
#997
tariq1890
closed
1 week ago
0
[nvidia-ci] drop dist tag suffix when cloning ghcr.io images
#996
tariq1890
closed
1 week ago
0
Bump nvidia/cuda from 12.6.0-base-ubi9 to 12.6.1-base-ubi9 in /docker
#995
dependabot[bot]
closed
2 weeks ago
0
Bump nvidia/cuda from 12.6.0-base-ubi9 to 12.6.1-base-ubi9 in /validator
#994
dependabot[bot]
closed
2 weeks ago
0
downgrade go from 1.23.0 to 1.22.7
#993
tariq1890
closed
1 week ago
0
Following gpu-operator documentation will break RKE2 cluster after reboot
#992
aiicore
opened
2 weeks ago
4
containerd restart from nvidia-container-toolkit causes other daemonsets to get stuck
#991
chiragjn
opened
2 weeks ago
0
Fatal Error: Openshift 4.16.10 not compatible with Nvidia-GPU-Operator-24.6.1
#990
jayteaftw
closed
1 week ago
12
Bump sigs.k8s.io/controller-tools from 0.16.2 to 0.16.3 in /tools
#989
dependabot[bot]
opened
2 weeks ago
0
Bump k8s.io/code-generator from 0.31.0 to 0.31.1 in /tools
#988
dependabot[bot]
closed
5 days ago
0
Bump the k8sio group with 4 updates
#987
dependabot[bot]
closed
5 days ago
2
[RBAC cleanup] move namespaced resources to Role from ClusterRole
#986
tariq1890
closed
2 weeks ago
1
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring from 0.76.1 to 0.76.2
#985
dependabot[bot]
closed
2 weeks ago
0
configure crun as the low-level runtime to prioritise when using CRI-O
#984
tariq1890
closed
1 week ago
0
DCGM_FI_DEV_GPU_UTIL metric giving empty value from prometheus
#983
Vijaygawate
opened
3 weeks ago
0
nvidia.com/gpu.deploy.driver label is not pre-installed
#982
lengrongfu
opened
3 weeks ago
0
How to use GPU Operator with MIG to configure 2 GPUs on one node separately
#981
marlowsw
opened
3 weeks ago
0
helm install gpu-operator was in Init stage for a long time
#980
JShuang7711
opened
3 weeks ago
1
update K8s version used by holodeck to v1.31
#979
tariq1890
closed
3 weeks ago
0
disable privileged mode for toolkit-validation init containers
#978
tariq1890
closed
3 weeks ago
0
Bump github.com/prometheus/client_golang from 1.20.2 to 1.20.3
#977
dependabot[bot]
closed
3 weeks ago
0
add gpu driver 560.35.03
#976
tariq1890
closed
3 weeks ago
0
Bump golang.org/x/mod from 0.20.0 to 0.21.0
#975
dependabot[bot]
closed
3 weeks ago
0
Add validation of whether nouveau is in the blacklist
#974
lengrongfu
opened
3 weeks ago
0
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring from 0.76.0 to 0.76.1
#973
dependabot[bot]
closed
3 weeks ago
0
Bump NVIDIA/holodeck from 0.2.1 to 0.2.4
#972
dependabot[bot]
opened
4 weeks ago
1